About Volatility
Volatility is one of the most important attributes of any tradeable instrument. It indicates how much the price moves around, or how ‘risky’ it is, if you will. But how do you measure it? As a matter of fact there are several ways of doing it, and sometimes people (including myself) confuse them. In this article I will present a few simple functions, as well as a way of verifying the result.
From Investopedia: “Volatility is a statistical measure of the dispersion of returns for a given security or market index. In most cases, the higher the volatility, the riskier the security. Volatility is often measured as either the standard deviation or variance between returns from that same security or market index.”
So, to reiterate: “Volatility is OFTEN measured as either the standard deviation of, or variance between returns” (my emphasis). There are other ways but the standard deviation of returns is the most common one. And standard deviation is the square root of the variance. So from now on in this article: volatility = stddev(log returns).
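Written out, and assuming a series of closing prices $P_0, \dots, P_N$ (the symbols are my own shorthand, not part of the quoted definition), the working definition looks roughly like this. The variance divides by $N$, the number of returns, which is what the calculations further down in this article do; dividing by $N-1$ instead would give the sample estimate.

```latex
r_t = \ln\frac{P_t}{P_{t-1}}, \qquad
\bar{r} = \frac{1}{N}\sum_{t=1}^{N} r_t, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(r_t - \bar{r}\right)^2}
```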
Calculating this is easy using numerical libraries like numpy, but I will also show a manual way of doing the same.
The first solution uses Python and pandas, so a basic understanding of this library is required.
import pandas as pd
import numpy as np
Assume data in a pd.DataFrame(), formatted like this:
Date | Symbol | Date | Time | Open | High | Low | Close | Volume |
---|---|---|---|---|---|---|---|---|
2021-05-12 15:30:00 | AAPL | 20210512 | 15:30:00 | 123.30 | 124.64 | 123.30 | 123.93 | 51864 |
2021-05-12 15:40:00 | AAPL | 20210512 | 15:40:00 | 123.94 | 124.47 | 123.63 | 124.30 | 32709 |
2021-05-12 15:50:00 | AAPL | 20210512 | 15:50:00 | 124.30 | 124.48 | 123.86 | 123.98 | 25105 |
2021-05-12 16:00:00 | AAPL | 20210512 | 16:00:00 | 123.98 | 124.24 | 123.63 | 124.13 | 25553 |
2021-05-12 16:10:00 | AAPL | 20210512 | 16:10:00 | 124.12 | 124.13 | 123.58 | 123.72 | 22831 |
… | … | … | … | … | … | … | … | … |
2022-05-11 21:10:00 | AAPL | 20220511 | 21:10:00 | 146.60 | 146.92 | 146.06 | 146.50 | 30300 |
2022-05-11 21:20:00 | AAPL | 20220511 | 21:20:00 | 146.50 | 146.77 | 146.13 | 146.56 | 27135 |
2022-05-11 21:30:00 | AAPL | 20220511 | 21:30:00 | 146.57 | 146.94 | 146.14 | 146.29 | 32458 |
2022-05-11 21:40:00 | AAPL | 20220511 | 21:40:00 | 146.28 | 146.63 | 145.92 | 145.95 | 45101 |
2022-05-11 21:50:00 | AAPL | 20220511 | 21:50:00 | 145.94 | 146.65 | 145.81 | 146.59 | 71507 |
As you can see, the data is in 10-minute bars. This adds an extra level of complexity, but as I will explain later, it doesn’t change the calculations, just the scaling.
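To follow along without the original data file, a small frame with the same layout can be typed in by hand. The sketch below uses the first five bars from the table above and leaves out the redundant Date and Time columns; the variable names `sample` and `bar_size` are my own. It also shows one way of confirming the bar size from the index.

```python
import pandas as pd

# First five bars from the table above, indexed by bar timestamp
sample = pd.DataFrame(
    {
        "Symbol": ["AAPL"] * 5,
        "Open": [123.30, 123.94, 124.30, 123.98, 124.12],
        "High": [124.64, 124.47, 124.48, 124.24, 124.13],
        "Low": [123.30, 123.63, 123.86, 123.63, 123.58],
        "Close": [123.93, 124.30, 123.98, 124.13, 123.72],
        "Volume": [51864, 32709, 25105, 25553, 22831],
    },
    index=pd.date_range("2021-05-12 15:30:00", periods=5, freq="10min"),
)
sample.index.name = "Date"

# The spacing between consecutive timestamps gives the bar size
bar_size = sample.index.to_series().diff().dropna().unique()
print(bar_size)
```

Five bars are of course far too few for a meaningful volatility estimate; the frame is only there to make the later steps easy to try out.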
However a number of questions arise. In this case, we have 450 rows of 10-minute bars, representing 75 hours of price movements. That is just over 9 trading days. Is this enough to calculate the volatility? What can we do with the answer? What if we are interested in daily volatility, can we use this data then?
Well, yes and no. The rule of thumb is that volatility is proportional to the square root of time. You can calculate the daily volatility by multiplying the 10-minute volatility by the square root of 48 (8 hours of 6 bars). You could also calculate the monthly volatility by multiplying by the square root of 1440 (30 days of 8 hours of 6 bars). It also works the other way around, by dividing by the same constant. This is explained well in this article: HOW TO CONVERT VOLATILITY FROM ANNUAL TO DAILY, WEEKLY OR MONTHLY?
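As a minimal sketch of that scaling, assuming the 8-hour trading day used in this article: the helper name `scale_volatility` and the 0.1% input figure are made up for illustration.

```python
import math

BARS_PER_HOUR = 6   # 10-minute bars
HOURS_PER_DAY = 8   # trading hours per day, as assumed in the text


def scale_volatility(vol: float, periods: float) -> float:
    """Scale a per-period volatility up to a horizon of `periods` periods."""
    return vol * math.sqrt(periods)


bars_per_day = BARS_PER_HOUR * HOURS_PER_DAY  # 48 bars in a trading day

ten_min_vol = 0.001  # made-up 10-minute volatility (0.1%), illustration only
daily_vol = scale_volatility(ten_min_vol, bars_per_day)

# Going the other way: divide by the same constant
back_to_ten_min = daily_vol / math.sqrt(bars_per_day)

print(daily_vol, back_to_ten_min)
```

If your market is open for a different number of hours, only `HOURS_PER_DAY` needs to change.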
However, this calculation comes with its own problems. The square-root rule assumes that returns are uncorrelated and equally variable from one period to the next, which intraday bars rarely are. If you want daily volatility, I recommend using daily bars.
Annual volatility may be the most common figure for comparisons, but what about weekly and monthly volatility? Well, I suggest you “compare apples to apples and oranges to oranges”. Keep them apart.
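If intraday data is all you have, one way of getting daily bars out of it is to resample. The sketch below assumes the frame has a DatetimeIndex and a 'Close' column, as in the table above; `daily_volatility` is a hypothetical helper name, not something from pandas.

```python
import numpy as np
import pandas as pd


def daily_volatility(df: pd.DataFrame) -> float:
    """Standard deviation of daily log returns, built from intraday closes."""
    # Last close of each calendar day; days without any bars
    # (weekends, holidays) come out as NaN and are dropped
    daily_close = df["Close"].resample("D").last().dropna()
    daily_log_rtn = np.log(daily_close).diff().dropna()
    # np.std divides by the number of returns (ddof=0)
    return float(np.std(daily_log_rtn))
```

Note that this needs a reasonable number of trading days to say anything useful; a handful of days gives a very noisy figure.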
The manual method
We will use the ‘Close’ column, and the first step is to calculate the 1-period log return, and then the mean of that.
df['log_rtn'] = np.log(df['Close']).diff()
mean_log_rtn = df['log_rtn'].mean()
Now create another column, with the “squared deviation from mean” of each return:
df['squared_dev'] = np.square(df['log_rtn'] - mean_log_rtn)
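Then you calculate the variance. The first row has no return (it is NaN and is skipped by sum()), so len(df)-1 is the number of returns we actually have: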
variance = df['squared_dev'].sum() / (len(df)-1)
And the standard deviation
stddev = np.sqrt(variance)
And then convert it to whichever period you want. Normally you talk about annual volatility, but as I said earlier, this is just the scale you use for comparison.
# annualized 10-minute standard deviation
annual_std = stddev * (252 * 6 * 8) ** 0.5
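For reference, the same steps can be collected into one function that uses nothing but the standard library, which makes the arithmetic easy to follow. This is a sketch, not a replacement for the pandas version; `manual_volatility` is a name I made up, and the closes are the first five from the table above.

```python
import math


def manual_volatility(closes):
    """Standard deviation of 1-period log returns (divides by the number of returns)."""
    # Step 1: log return between each pair of consecutive closes
    log_rtns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    n = len(log_rtns)  # one fewer than the number of prices
    # Step 2: mean return
    mean = sum(log_rtns) / n
    # Step 3: squared deviations from the mean, averaged over the returns
    variance = sum((r - mean) ** 2 for r in log_rtns) / n
    # Step 4: the square root of the variance
    return math.sqrt(variance)


closes = [123.93, 124.30, 123.98, 124.13, 123.72]  # first five closes from the table
vol_10min = manual_volatility(closes)

# Annualize, assuming 252 trading days of 8 hours with 6 bars per hour
annual_vol = vol_10min * math.sqrt(252 * 6 * 8)
print(vol_10min, annual_vol)
```

Five prices is only enough to check the mechanics; a real estimate needs far more bars.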
The numpy method
There is however a much easier way of calculating exactly the same, namely using:
stddev = np.std(df['log_rtn'])
Now we have the 1-period volatility. If we want to annualize it, we do this:
# annualized 10-minute standard deviation
annual_std = stddev * (252 * 6 * 8) ** 0.5
The latter needs some explanation: we are using 10-minute bars, there are 6 of those per hour and 8 trading hours a day, and we assume 252 trading days per year. Hence the 252 * 6 * 8.
And ** 0.5 (meaning ^0.5) is just a fancy way of saying SQRT()
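As for verifying the result, one simple check is to run both methods on the same data and compare. The sketch below uses made-up prices from a seeded random walk, purely so that it runs on its own; with your own data you would skip that part. It also shows a common source of confusion: pandas' own .std() divides by one less than the number of returns (ddof=1) by default, so it comes out slightly higher than np.std, which divides by the number of returns (ddof=0).

```python
import numpy as np
import pandas as pd

# Made-up closes from a seeded random walk; illustration data only
rng = np.random.default_rng(42)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.001, size=450)))
df = pd.DataFrame({"Close": close})

# The manual method
df["log_rtn"] = np.log(df["Close"]).diff()
mean_log_rtn = df["log_rtn"].mean()
df["squared_dev"] = np.square(df["log_rtn"] - mean_log_rtn)
manual_std = np.sqrt(df["squared_dev"].sum() / (len(df) - 1))

# The numpy method; the NaN in the first row is dropped explicitly here
numpy_std = np.std(df["log_rtn"].dropna())

# pandas' default uses ddof=1, so this one differs slightly
pandas_std = df["log_rtn"].std()

print(np.isclose(manual_std, numpy_std))
print(manual_std, numpy_std, pandas_std)
```

Whichever convention you pick, the difference shrinks as the number of bars grows, so the main thing is to use the same one on both sides of a comparison.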