Outlier Risk, Part II

Published 11/11/2021, 12:21 AM
Updated 07/09/2023, 06:31 AM

In the first article in this series, we looked at a basic technique for identifying outliers (extreme data points) in a time series. The tool of choice: slicing the numbers into quartiles and using the interquartile range (IQR) to define “normal.” It’s a useful starting point, but it may not always suffice. Fortunately, there are alternative methodologies that can be run for additional perspective. For comparison, let’s focus on one and review how outlier risk stacks up using Z-scores.

Before we dive in, a quick recap. Detecting outliers is useful in economics and finance, along with many other disciplines, because it indicates when trends are extreme and therefore at higher risk of reversing. In the previous article we used rolling 1-year returns for the US stock market (S&P 500) and so we’ll continue with this example.

First, a brief definition of Z-scores, a.k.a. standard scores. The basic takeaway: converting data into Z-scores is a process of rescaling the numbers. Simply put, a Z-score tells you how far any one data point is from the population mean, with the distance expressed in standard deviations: subtract the mean from the data point and divide the result by the standard deviation.
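For readers who want to see the rescaling in practice, here is a minimal sketch in Python using only the standard library. The function name z_scores and the sample numbers are hypothetical, chosen for illustration; they are not the S&P 500 data used in this article.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Rescale values to Z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD, to match the population mean
    if sigma == 0:
        raise ValueError("standard deviation is zero; Z-scores are undefined")
    return [(x - mu) / sigma for x in values]

# Hypothetical 1-year returns in percent, for illustration only.
sample_returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scores = z_scores(sample_returns)
print(scores)
```

In this made-up sample the mean is 5 and the standard deviation is 2, so the 9% return sits 2 standard deviations above the mean.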

Transforming data into deviations from the mean via standard deviations (SD) provides an objective yardstick to search for outliers. As with any statistical application, there are specific pros and cons. Leaving that aside for now, basic statistics tells us that for normally distributed data roughly 68% of observations fall within 1SD of the mean, about 95% within 2SD and about 99.7% within 3SD. Market returns are not perfectly normal, so these percentages are a guide rather than a guarantee. In many applications 2SD is a common line for deciding if a data point is “normal” or not.
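The percentages quoted above follow from the normal distribution and can be checked in a few lines. This is a sketch that assumes normality; the helper name normal_coverage is hypothetical.

```python
import math

def normal_coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k}SD: {normal_coverage(k):.4f}")
# within 1SD: 0.6827
# within 2SD: 0.9545
# within 3SD: 0.9973
```

Under that assumption, about 4.5% of observations would be expected to land beyond 2SD. Real return series can differ from this benchmark in either direction.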

Analyzing data through the lens of Z-scores takes away quite a lot of the ambiguity and subjectivity in deciding what constitutes “extreme.” Yes, IQR offers this lens too, but that process simply divides up the results, using the full range of the numbers as a guide. Z-scores do the same but add a filter (standard deviations rather than raw numbers) and so this method is arguably superior.

At the very least, Z-scores provide a check on IQR results. If both methods align, that’s a much stronger signal than relying on either one in isolation. On the other hand, if there’s a conflict, that may be an indication that you need to go deeper in your outlier research.
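One way to run that cross-check is to flag each data point under both methods and label where they agree or conflict. The sketch below is an illustration under stated assumptions, not the exact procedure behind this article's numbers: the function names, the sample data and the parameter k are hypothetical. With k=0, anything outside the quartile band (Q1 to Q3) is flagged, which is one reading of the IQR approach; k=1.5 gives the wider fences that are a common convention elsewhere.

```python
from statistics import mean, pstdev, quantiles

def z_flags(values, threshold=2.0):
    """True where a value lies more than `threshold` SDs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(x - mu) > threshold * sigma for x in values]

def iqr_flags(values, k=0.0):
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR] (k is an assumption)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x < low or x > high for x in values]

def compare(values, threshold=2.0, k=0.0):
    """Label each value 'both', 'conflict' or 'neither' across the two methods."""
    labels = []
    for z_hit, iqr_hit in zip(z_flags(values, threshold), iqr_flags(values, k)):
        if z_hit and iqr_hit:
            labels.append("both")
        elif z_hit or iqr_hit:
            labels.append("conflict")
        else:
            labels.append("neither")
    return labels

# Hypothetical 1-year returns in percent, for illustration only.
sample_returns = [-5, 0, 3, 6, 8, 10, 12, 15, 19, 60]
labels = compare(sample_returns)
for value, label in zip(sample_returns, labels):
    print(value, label)
```

In this made-up sample only the 60% return is flagged by both methods, while several milder returns are flagged by the quartile band alone, which is the kind of conflict that calls for a closer look.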

As a simple example, let’s review S&P 500’s rolling one-year returns in Z-scores.

[Chart: S&P 500 Rolling 1-Yr Return In Z-Scores]

As the chart above shows, there are several instances where returns exceed 2SD, on the upside and downside. If we use 2SD as an indication of “extreme,” the Z-score history shows that about 1.5% of the return population (calculated daily) since 1961 are outliers. On the upside, those extreme returns range from about 40% to 75%.
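The steps behind a figure like that (compute rolling 1-year returns, rescale to Z-scores, count the share beyond 2SD) can be sketched as follows. The price series here is synthetic and randomly generated, not S&P 500 data, and the 252-day window is an assumption about the number of trading days in a year; the function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def rolling_returns(prices, window):
    """Percent change over `window` periods, for each point with enough history."""
    return [(prices[i] / prices[i - window] - 1.0) * 100.0
            for i in range(window, len(prices))]

def share_beyond(values, threshold=2.0):
    """Fraction of values lying more than `threshold` SDs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    hits = sum(1 for x in values if abs(x - mu) > threshold * sigma)
    return hits / len(values)

# Synthetic daily prices, for illustration only (not market data).
random.seed(0)
prices = [100.0]
for _ in range(2000):
    prices.append(prices[-1] * (1.0 + random.gauss(0.0003, 0.01)))

returns_1y = rolling_returns(prices, window=252)  # 252 trading days assumed
outlier_share = share_beyond(returns_1y)
print(f"share of rolling returns beyond 2SD: {outlier_share:.2%}")
```

Swapping in an actual daily price history for the synthetic series would give the real-world counterpart of this calculation.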

By comparison, the IQR results reflect a range of 0% to 19%, which means that “extreme” via IQR on the upside is defined as returns above 19%—a much lower bar vs. the Z-score results we’re using. (Note that the current 1-year return is roughly +32%, as of Nov. 9, 2021.)

Which method is right? Or wrong? Neither. We’re simply running statistical tests and beauty (and statistical truth) is in the eye of the beholder. With two sets of results in hand, we need to step back and review our assumptions and parameters. Is 2SD appropriate for this Z-score analysis? Is IQR inappropriate? What, exactly, are we trying to analyze/anticipate when looking at 1-year stock market performance?

The point is that there are no simple rules for defining outliers. Much depends on what we’re trying to achieve. This process isn’t just numbers—it’s about us. Two investors running the same analysis may come to two different conclusions due to different expectations, objectives, risk tolerance, etc.

It’s clear that even for this simple example there’s more work to be done. We’ve made a good start with developing basic results, but additional testing is needed to define “outlier” with a higher degree of confidence.

In the next installment we’ll go deeper. As part of the process we’ll also lay out some expectations for what we’re trying to achieve. Statistical tools are a means to an end. To avoid spinning our wheels and going down rabbit holes, it’s essential to clearly lay out why we’re running the numbers.

Statistics without guidance is like driving without a destination: you may see some amazing scenery, but if you don’t know where you’re going the trip may bring more confusion than clarity.
