Hi folks, I am back with more on time series! In the last post we got a feel for what a time series is. In the process we discovered that time series applications rest on certain underlying constructs and concepts. One of the most basic of these is correlation. We shall introduce correlation in this post; however, from a time series viewpoint we are more interested in how correlation translates and transforms into auto-correlation.
Correlation as a term has statistical connotations and is usually understood to mean association between variables. In specific terms, it is a measure of similarity between two or more paired sets of data or variables. Correlation does not necessarily imply causation, though it might suggest the possibility of a causal relationship. Of the two variables, one is usually termed the independent variable and the other the dependent variable; however, this does not imply causation either. It is just the way the co-variation is being examined.
A measure of the degree to which two (or more) variables are correlated is termed a ‘correlation coefficient’. This is a statistic computed from the data that typically ranges from -1 to +1, where 0 indicates no correlation, +1 indicates perfect positive correlation, and -1 indicates perfect negative or inverse correlation.
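As a quick illustration of those three extremes, here is a toy sketch in Python using NumPy’s corrcoef (the data are made up for this post):

```python
# Illustrating the [-1, +1] range of the correlation coefficient
# on invented toy data.
import numpy as np

x = np.arange(10, dtype=float)

print(np.corrcoef(x, 2 * x + 1)[0, 1])   # +1.0: perfect positive correlation
print(np.corrcoef(x, -3 * x + 5)[0, 1])  # -1.0: perfect negative correlation

rng = np.random.default_rng(42)
print(np.corrcoef(x, rng.normal(size=10))[0, 1])  # small: no real association
```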
The most commonly used correlation coefficient is Karl Pearson’s product moment correlation coefficient. This is a measure of linear association that is based on the assumption that the data are drawn from a bi-variate Normal population, here taken to be one in which the two variables are independently distributed with the same mean (usually 0) and standard deviations σx and σy.
The joint probability density of x and y is then the product of their Normal probability density functions and is given by:

f(x, y) = f(x)·f(y) = (1/2πσxσy)·e^(-t/2)

where t = (x²/σx²) + (y²/σy²)
We will cover this in greater detail in subsequent posts; for now, let us focus on correlation itself.
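To make the coefficient concrete, here is a minimal sketch that computes Pearson’s r directly from its definition, r = cov(x, y)/(σx·σy), on some invented numbers:

```python
# A minimal sketch of Pearson's product moment coefficient,
# computed directly from its definition on made-up data.
import numpy as np

def pearson_r(x, y):
    """Pearson's r for two equal-length 1-D arrays."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()   # deviations from the means
    return (xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(pearson_r(x, y))          # close to +1: y is nearly a linear function of x
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in agrees
```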
Pearson’s coefficient can give misleading results depending on the actual nature of the association, especially if it is non-linear, and also if the data include outliers. There are certain measures more robust than Pearson’s, in which the data are either measured or treated as ordinal and ranked. Two widely used coefficients of rank correlation are Spearman’s and Kendall’s.
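As a rough illustration (invented data, using scipy.stats), a single outlier can drag Pearson’s r well away from 1, while the two rank coefficients, which see only the ordering, are unaffected:

```python
# Sketch: one outlier distorts Pearson's r, while the rank-based
# Spearman and Kendall coefficients remain robust.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100.0])  # last point is an outlier
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10.0])

print(stats.pearsonr(x, y)[0])    # about 0.59: distorted by the outlier
print(stats.spearmanr(x, y)[0])   # 1.0: the ranks are perfectly monotone
print(stats.kendalltau(x, y)[0])  # 1.0: every pair of points is concordant
```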
In this context, a major extension of correlation techniques is their application to data recorded in series, especially time series and spatial series (sorted by distance band). Unlike standard correlation between two variables, only a single variable is analysed; in this case we compare pairs of its values separated by an interval of time or a distance band, also known as a lag. This lets us study patterns of dependency in time and/or space and helps us develop models where the common assumption of independence of the observations does not hold.
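Here is a tiny sketch of the lag idea on an invented series, pairing each value with the value k steps before it:

```python
# Pairing each value of a single series with the value k steps
# earlier, i.e. (x_t, x_{t-k}); series values are invented.
import numpy as np

x = np.array([3.0, 5.0, 4.0, 6.0, 7.0, 5.0, 8.0])
k = 2  # the lag

pairs = list(zip(x[k:], x[:-k]))  # (x_t, x_{t-k}) for t = k .. n-1
print(pairs)  # [(4.0, 3.0), (6.0, 5.0), (7.0, 4.0), (5.0, 6.0), (8.0, 7.0)]
```

It is the correlation computed over exactly such pairs that we call auto-correlation.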
The population auto-correlation coefficient at lag k, ρk, is then calculated as the ratio of the auto-covariance at lag k to the auto-covariance at lag 0 (the variance), as follows:

ρk = cov(xt, xt-k)/var(xt) = γ(k)/γ(0)

where γ(0) is the auto-covariance at lag 0. If we have sufficient data, the calculation is symmetric for the series, such that γ(k) = γ(-k), and therefore ρk = ρ-k.
As with the product moment correlation coefficient (r), ρk has a range of [-1, 1], with the mid value 0 indicating the absence of auto-correlation.
If the lagged values are independent then ρk = 0, but a zero value computed from sample data does not guarantee that the variables are independent.
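Here is a minimal sketch of the sample version of ρk = γ(k)/γ(0), applied to a made-up noisy periodic series:

```python
# Sample autocorrelation at lag k: rho_k = gamma(k) / gamma(0),
# following the formula above; the series is invented.
import numpy as np

def acf(x, k):
    """Sample autocorrelation of series x at lag k."""
    x = np.asarray(x, float)
    xd = x - x.mean()  # deviations from the series mean
    gamma_k = (xd[k:] * xd[:len(x) - k]).sum() / len(x)  # autocovariance, lag k
    gamma_0 = (xd ** 2).sum() / len(x)                   # autocovariance, lag 0
    return gamma_k / gamma_0

rng = np.random.default_rng(0)
t = np.arange(200)
x = np.sin(t / 5) + rng.normal(scale=0.3, size=200)  # noisy periodic series

print(acf(x, 0))   # 1.0 by construction
print(acf(x, 1))   # strongly positive: neighbouring values move together
print(acf(x, 16))  # near the half-period, so strongly negative
```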
In order to further understand the concept of autocorrelation, we need to study it in its two major forms: 1) temporal (time series) autocorrelation and 2) spatial (distance band) autocorrelation.
In the next post we will discuss temporal auto-correlation, that is, correlation based on time series data.
Till then happy STAT-ing😊