Monday, 27 August 2018

Autocorrelation 4 – The Autocorrelation function and partial autocorrelation function



In time series analysis, the time spacing or distance is usually measured in equal steps. The values of rk are then plotted against the lag k, which gives an idea of how the autocorrelation pattern varies with the lag, or the ‘distances’, in the time series. This plot is called a correlogram and is very helpful in understanding the time series patterns at different lags or distances. In the case of a random series the correlation is almost non-existent, so the values of rk will all be close to zero and are approximately distributed as N(0, 1/n), where 0 is the mean, 1/n is the variance and n is the number of observations.
If, however, there is a short-term correlation, the values of rk will start close to 1 and decrease to approximately 0 once the lag exceeds the length or range of the correlation. With periodic data, the frequency and the strength of the periodic components can be detected.
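As a rough illustration of the random-series case, the sketch below generates white noise and checks how many of its first 25 lag correlations fall inside ±2/√n, the approximate 95% band implied by N(0, 1/n). The series, the seed and the helper name `lagged_corr` are assumptions made for this example, and the plain Pearson correlation between the series and its lagged copy is used here as a close stand-in for rk.

```python
import numpy as np

def lagged_corr(x, k):
    """Pearson correlation between the series and a copy of itself shifted by k steps."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-k], x[k:])[0, 1]

rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
n = 1000
noise = rng.normal(size=n)              # hypothetical random series

r = np.array([lagged_corr(noise, k) for k in range(1, 26)])
band = 2 / np.sqrt(n)                   # about two standard deviations under N(0, 1/n)
inside = np.mean(np.abs(r) <= band)     # share of lags inside the band

print("band: +/-", round(band, 4))
print("share of lags inside band:", inside)
```

For a purely random series, most of the 25 values should sit inside the band; values well outside it at many lags would suggest that the series is not random.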
As an example, five years of daily closing share prices for Apple (AAPL), from 30th July 2012 to 28th July 2017, were extracted and plotted as shown in Figure 1 below.
 
Figure 1: Five year data for AAPL share price close 


Then we calculated the ACF values in Excel as follows:
The auto-covariance at lag k, sk, was calculated using the numerator of equation (4) in Autocorrelation - 3,
while that at lag 0, s0, was calculated using the denominator of the same equation.
The ACF was then calculated as

rk = sk / s0

(For a detailed description of the ACF calculation in Excel, refer to this post.)
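The same calculation can be sketched in Python. Equation (4) of the earlier post is not reproduced here, so this example assumes the usual sample auto-covariance, sk = (1/n) Σ (x_t − x̄)(x_{t+k} − x̄); the function name `acf` and the synthetic random-walk series are illustrative only and are not the AAPL data.

```python
import numpy as np

def acf(x, max_lag):
    """Return r_k = s_k / s_0 for k = 1..max_lag.

    Assumes s_k = (1/n) * sum_{t=1}^{n-k} (x_t - mean) * (x_{t+k} - mean).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()                      # deviations from the overall mean
    s0 = np.sum(d * d) / n                # auto-covariance at lag 0
    r = []
    for k in range(1, max_lag + 1):
        sk = np.sum(d[:n - k] * d[k:]) / n   # auto-covariance at lag k
        r.append(sk / s0)
    return np.array(r)

# Hypothetical stand-in for a price series: a random walk of about five years of trading days
rng = np.random.default_rng(42)
walk = 100 + np.cumsum(rng.normal(size=1258))

r = acf(walk, 25)
for k, rk in enumerate(r, start=1):
    print(k, round(rk, 6))
```

A trending or random-walk-like series of this kind generally gives ACF values that start near 1 and fall away slowly.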
The values are tabulated below, and a corresponding correlogram has been plotted of the autocorrelation values against the lag values from 1 through 25. As can be seen from the plot, the ACF values descend slowly towards zero as the lag k increases, which is typical of an autoregressive process.


Table 1: AAPL ACF values for lags 1 to 25

Lag (k)    ACF (rk)
1          0.991462
2          0.98206
3          0.973712
4          0.965045
5          0.954493
6          0.944653
7          0.934148
8          0.923317
9          0.913601
10         0.904024
11         0.893899
12         0.885732
13         0.877238
14         0.866827
15         0.855342
16         0.844666
17         0.834739
18         0.824261
19         0.814092
20         0.802955
21         0.790812
22         0.778749
23         0.766661
24         0.755101
25         0.743932




Figure 2: Plot of the ACF of the AAPL data in Figure 1

It should be noted that real-world problems are more complex than those encountered in theory, and they often involve interactions between many variables. It is therefore worthwhile to observe the relationship between two variables while keeping a third variable constant.
The formula for simple correlation is

$$r_{XY} = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} \qquad (5)$$

Where SXY = Cov(X,Y), SXX = Cov(X,X) = Var(X)
and SYY = Cov(Y,Y) = Var(Y)
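If there is a third variable Z that we want to control for, that is, one that we want to hold constant, then (5) can be adjusted as follows:

$$r_{XY.Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}} \qquad (6)$$

where rXY, rXZ and rYZ are the simple correlations between the respective pairs of variables.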

For partial autocorrelation with lags 1 and 3 and the effect of autocorrelation at lag 2 held constant, an equivalent formula is:

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \qquad (7)$$
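To make the partial correlation idea concrete, here is a minimal sketch that applies the standard three-variable partial correlation formula to the lag-1 and lag-2 ACF values from Table 1. It assumes that the indices 1, 2 and 3 refer to three consecutive observations and that the series is stationary, so that neighbouring observations share the lag-1 correlation r1 and observations two steps apart share r2; the function name `partial_corr` is hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z held constant:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# ACF values at lags 1 and 2, taken from Table 1
r1, r2 = 0.991462, 0.98206

# Assumption: X = x_t, Y = x_{t+2}, Z = x_{t+1}; under stationarity
# corr(X, Z) = corr(Z, Y) = r1 and corr(X, Y) = r2
pacf_lag2 = partial_corr(r2, r1, r1)
print(round(pacf_lag2, 4))
```

Under these assumptions the result is a small negative number of about -0.055, the same order of magnitude as the lag-2 value obtained by the regression method, although the two methods need not agree exactly.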

In autocorrelation analysis, calculating the partial correlation enables us to remove, or control for, the effects of all the intermediate lags, so that we can focus on the relationship at an individual lag. For the same AAPL share price data as above, we have calculated the partial autocorrelations for lags 1 through 10 using a regression method (for a detailed description of the calculation, refer to this post); the values are presented below.

Table 2: PACF for AAPL data for lags from 1 to 10

Lag    PACF
1      0.967818
2      -0.05854
3      0.14758
4      0.121329
5      -0.11486
6      0.040423
7      -0.27554
8      0.172902
9      -0.03787
10     0.035132
 
 

The above values are plotted on a correlogram as follows:
 


As we can see from the above figure, the PACF value is greatest at the first lag, drops sharply at the second lag and stays subdued from then on. This is because at k=1 there are no intermediate lags, so the ACF and PACF are almost the same; as k increases, the autocorrelation between two sets of values k lags apart includes the effect of the intermediate lags. When that effect is controlled for, or removed, the result is a low level of correlation, as is evident from the successive PACF values at lags 2 to 10. This pattern is typical of AR(1) processes (visit this link for a description of AR(p) processes).
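One common way to obtain partial autocorrelations by regression is to regress x_t on its previous k values and take the coefficient on x_{t-k} as the PACF at lag k. The sketch below does this with ordinary least squares on a simulated AR(1) series; it is only an assumed version of the regression method, not necessarily the exact spreadsheet procedure used for Table 2, and the function name `pacf_regression`, the coefficient 0.7 and the seed are all illustrative.

```python
import numpy as np

def pacf_regression(x, max_lag):
    """PACF at lag k = last coefficient when x_t is regressed on x_{t-1}, ..., x_{t-k}."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # centre the series so no intercept is needed
    n = len(x)
    out = []
    for k in range(1, max_lag + 1):
        y = x[k:]                          # response: x_t for t = k, ..., n-1
        # column j holds x_{t-j} for the same t, j = 1..k
        X = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(coef[-1])               # coefficient on x_{t-k}
    return np.array(out)

# Simulated AR(1) series x_t = phi * x_{t-1} + e_t (hypothetical data)
rng = np.random.default_rng(1)
phi, n = 0.7, 2000
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

p = pacf_regression(x, 10)
print(np.round(p, 3))
```

For an AR(1) series the first value should be close to the coefficient phi and the remaining values close to zero, which is the cut-off pattern described above.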
 

It is also possible that the overall pattern of stock prices shows a steady increase over time, in which case the correlograms will not tend to zero as depicted above. In this case the series is described as non-stationary. Before analysing such data, an attempt should be made to remove the trend effect. To do this, a trend curve such as a best-fit straight line is fitted to the original data, and the values of this trend curve are then subtracted from the original data at each time step 1, 2, 3, … prior to analysing the data. Outliers, if any, are also removed. Once these changes have been made, the data refined and the correlograms plotted, the emergent patterns can be interpreted. It may be that more than one process is responsible for a given observed pattern; even so, the analysis may still be useful for estimating missing data or predicting data beyond the observed range. This application is further discussed in the blog on Time Series Analysis.
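The trend-removal step can be sketched as follows, assuming a straight-line trend fitted by least squares with `numpy.polyfit`. The series here is synthetic (a linear trend plus random noise), and the helper names `detrend_linear` and `lag1_autocorr` are made up for the example.

```python
import numpy as np

def detrend_linear(x):
    """Fit a best-fit straight line against time and subtract it from the series."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)   # coefficients, highest degree first
    return x - (slope * t + intercept)

def lag1_autocorr(x):
    """Lag-1 autocorrelation r_1 = s_1 / s_0."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.sum(d[:-1] * d[1:]) / np.sum(d * d)

# Hypothetical non-stationary series: steady upward trend plus random noise
rng = np.random.default_rng(7)
t = np.arange(500)
series = 0.05 * t + rng.normal(size=500)

residual = detrend_linear(series)
raw_r1 = lag1_autocorr(series)
detrended_r1 = lag1_autocorr(residual)

print("lag-1 ACF before detrending:", round(raw_r1, 3))
print("lag-1 ACF after detrending: ", round(detrended_r1, 3))
```

In this constructed case the high autocorrelation of the raw series comes entirely from the trend, so it largely disappears after detrending. Real price data would usually keep some structure, which is what the subsequent analysis is meant to reveal.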