In time series analysis, the spacing between observations is usually measured in equal time steps. The values of $r_k$ are plotted against the lag k, which shows how the autocorrelation varies with the lag, that is, with the 'distance' between observations in the series. This plot is called a correlogram and is very helpful in understanding the patterns in a time series at different lags. For a random series the correlation is almost non-existent, so the values of $r_k$ are all close to zero and are approximately distributed as N(0, 1/n), where 0 is the mean, 1/n is the variance and n is the number of observations.
If, however, there is short-term correlation, the values of $r_k$ start close to 1 and decrease to approximately 0 once the lag exceeds the range over which the correlation persists. With periodic data, the correlogram also reveals the frequency and the strength of the periodic components.
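The random-series case can be checked with a quick simulation. The sketch below is only an illustration: the helper name `acf_values`, the seed and the series length are assumptions made here, and the data are simulated noise rather than market prices. It compares the sample autocorrelations with the approximate 95% band of ±1.96/√n implied by N(0, 1/n).

```python
import numpy as np

def acf_values(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag via r_k = s_k / s_0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    # Correlate the deviations with themselves; index n-1 of the full output is lag 0.
    acov = np.correlate(d, d, mode="full")[n - 1:] / n
    return acov[1:max_lag + 1] / acov[0]

rng = np.random.default_rng(0)      # arbitrary seed, for repeatability only
n = 1000
noise = rng.standard_normal(n)      # simulated random series, not market data

r = acf_values(noise, 25)
bound = 1.96 / np.sqrt(n)           # approximate 95% band implied by N(0, 1/n)
share_inside = np.mean(np.abs(r) <= bound)
print(f"95% band: +/-{bound:.4f}; share of lags inside: {share_inside:.2f}")
```

For a genuinely random series, most of the 25 values should fall inside the band, with the occasional lag outside it purely by chance.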
As an example, five years of daily closing share prices for Apple (AAPL), from 30 July 2012 to 28 July 2017, were extracted and plotted, as shown in Figure 1 below.
Figure 1: Five year data for AAPL share price close
We then calculated the ACF values in Excel as follows. The autocovariance at lag k, $s_k$, was calculated using the numerator of equation (4, Autocorrelation - 3), and the autocovariance at lag 0, $s_0$, using the denominator of the same equation. The ACF was then calculated as

$$r_k = \frac{s_k}{s_0}$$

(For a detailed description of the ACF calculation in Excel, refer to this post.)
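The same ratio can be sketched outside Excel. The code below assumes the usual definition of the autocovariance, the sum of products of deviations from the mean divided by n (the 1/n factor cancels in the ratio). The function names and the five-value series are made up for illustration and are not the AAPL data.

```python
import numpy as np

def autocovariance(x, k):
    """s_k: mean product of deviations from the mean, k steps apart (1/n scaling)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    return np.sum(d[:n - k] * d[k:]) / n

def acf(x, max_lag):
    """r_k = s_k / s_0 for k = 1..max_lag."""
    s0 = autocovariance(x, 0)
    return [autocovariance(x, k) / s0 for k in range(1, max_lag + 1)]

prices = [10.0, 11.0, 12.0, 13.0, 14.0]   # illustrative values only
print(acf(prices, 2))
```

Applied to a column of closing prices, `acf(prices, 25)` would give values of the kind listed in Table 1, provided the spreadsheet used the same definition.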
The values are tabulated below, and the corresponding correlogram plots the autocorrelation values against lags 1 through 25. As the plot shows, the ACF values decline slowly and steadily as the lag k increases (from 0.99 at lag 1 to 0.74 at lag 25), which is typical of an autoregressive process.
Table 1: AAPL ACF values for lags 1 to 25
Lag (k) | ACF ($r_k$)
1 | 0.991462
2 | 0.98206
3 | 0.973712
4 | 0.965045
5 | 0.954493
6 | 0.944653
7 | 0.934148
8 | 0.923317
9 | 0.913601
10 | 0.904024
11 | 0.893899
12 | 0.885732
13 | 0.877238
14 | 0.866827
15 | 0.855342
16 | 0.844666
17 | 0.834739
18 | 0.824261
19 | 0.814092
20 | 0.802955
21 | 0.790812
22 | 0.778749
23 | 0.766661
24 | 0.755101
25 | 0.743932
Figure 2: Plot of the ACF function of AAPL data in figure 1
It should be noted that real-world problems are more complex than textbook cases and often involve interactions between many variables. It is therefore worthwhile to examine the relationship between two variables while holding a third variable constant.
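The formula for simple correlation is

$$r_{XY} = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} \qquad (5)$$

where $S_{XY} = \mathrm{Cov}(X,Y)$, $S_{XX} = \mathrm{Cov}(X,X) = \mathrm{Var}(X)$ and $S_{YY} = \mathrm{Cov}(Y,Y) = \mathrm{Var}(Y)$.

If there is a third variable Z that we want to control for, that is, hold constant, then we can adjust (5) as follows:

$$r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$

For the partial autocorrelation between lags 1 and 3, with the effect of the autocorrelation at lag 2 held constant, the equivalent formula is:

$$r_{13 \cdot 2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}}$$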
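The standard first-order partial correlation, (r_XY − r_XZ·r_YZ) / √((1 − r_XZ²)(1 − r_YZ²)), can be sketched in a few lines. The function name `partial_corr` is made up for illustration. The usage line assumes a stationary series, so that the correlation between positions 1 and 3 equals the lag-2 autocorrelation and the correlations between adjacent positions both equal the lag-1 autocorrelation; the two inputs are the lag-1 and lag-2 ACF values from Table 1.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z held constant (first-order partial correlation)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Assumption: stationarity, so r_13 = r_2 and r_12 = r_23 = r_1.
r1, r2 = 0.991462, 0.98206          # lag-1 and lag-2 ACF values from Table 1
lag2_partial = partial_corr(r2, r1, r1)
print(lag2_partial)
```

With these inputs the result is small and negative: once the lag-1 dependence is accounted for, very little direct lag-2 dependence remains.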
In autocorrelation analysis, calculating the partial correlation lets us control for, and so remove, the effects of all the intermediate lags, so that we can focus on the relationship at each individual lag. For the same AAPL share price data, we calculated the partial autocorrelations for lags 1 through 10 using a regression method (for a detailed description of the calculation, refer to this post); the values are presented below.
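One common form of the regression method takes the PACF at lag k to be the last coefficient in an ordinary least-squares regression of x_t on x_{t-1}, …, x_{t-k}. The sketch below assumes that form, since the exact spreadsheet setup is not shown here. The function name `pacf_regression` is hypothetical, and the test series is a simulated AR(1) process, not the AAPL data.

```python
import numpy as np

def pacf_regression(x, max_lag):
    """PACF at lag k = coefficient on x_{t-k} when x_t is regressed on lags 1..k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = []
    for k in range(1, max_lag + 1):
        y = x[k:]                                   # x_t for t = k .. n-1
        # Design matrix: intercept, then x_{t-1}, ..., x_{t-k}
        cols = [np.ones(n - k)] + [x[k - j:n - j] for j in range(1, k + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(beta[-1])                        # coefficient on the longest lag
    return np.array(out)

# Simulated AR(1) series with coefficient 0.7 (an assumption for the demo).
rng = np.random.default_rng(1)
n, phi = 5000, 0.7
e = rng.standard_normal(n)
sim = np.zeros(n)
for t in range(1, n):
    sim[t] = phi * sim[t - 1] + e[t]

p = pacf_regression(sim, 5)
print(np.round(p, 3))
```

For an AR(1) series the first value should sit near the AR coefficient and the rest near zero, which is the cut-off pattern to look for in a PACF table.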
Table 2: PACF for AAPL data for lags from 1 to 10
Lag | PACF
1 | 0.967818
2 | -0.05854
3 | 0.14758
4 | 0.121329
5 | -0.11486
6 | 0.040423
7 | -0.27554
8 | 0.172902
9 | -0.03787
10 | 0.035132
The above values are plotted on a correlogram as follows:
As we can see from the above figure, the PACF is greatest at the first lag, drops sharply at the second lag and remains subdued from then on. At k=1 there are no intermediate lags to control for, so the ACF and PACF are almost the same. As k increases, the autocorrelation between values k lags apart increasingly reflects the effect of the intermediate lags. When that effect is controlled for, the remaining correlation is low, as is evident from the PACF values at lags 2 to 10. This pattern is typical of AR(1) processes (visit this link for a description of AR(p) processes).
It is also possible that the stock price shows a steady increase over time, in which case the correlograms will not tend to zero. Such a series is described as non-stationary. Before analysing such data, an attempt should be made to remove the trend. To do this, a trend curve such as a best-fit straight line is fitted to the original data, and the values of this trend curve are subtracted from the original data at each time step (1, 2, 3, …) prior to analysis. Any outliers are also removed. Once the data have been refined in this way and the correlograms plotted, the emergent patterns can be interpreted. More than one process may be responsible for a given observed pattern; even so, the analysis can still be useful for estimating missing data or predicting values beyond the observed range. This application is further discussed in the blog on Time Series Analysis.
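The straight-line detrending step can be sketched as follows. This is a minimal illustration, assuming a least-squares line fitted against the time index 0, 1, 2, …; the function name `detrend_linear` and the simulated series are assumptions made here, not part of the original analysis.

```python
import numpy as np

def detrend_linear(x):
    """Fit a least-squares straight line against the time index and subtract it."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)    # coefficients, highest degree first
    trend = intercept + slope * t
    return x - trend, trend

# Simulated upward-trending series (illustrative, not market data).
rng = np.random.default_rng(2)
t = np.arange(500)
series = 100.0 + 0.3 * t + rng.standard_normal(500)

residuals, trend = detrend_linear(series)
print(f"mean of detrended series: {residuals.mean():.6f}")
```

The correlograms would then be computed on `residuals` rather than on the raw series. A straight line is only one choice of trend curve; differencing the series is a common alternative.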