Monday, 7 August 2017

Time Series Analysis 2 - Underlying concepts: From Correlation to Auto-correlation

     Hi folks, I am back with more on time series! In the last post we got a feel for what a time series is. In the process we discovered that time series applications are based on certain underlying constructs and concepts. One of the basic underlying concepts is correlation. We shall introduce correlation in this post; however, from a time series viewpoint we are more interested in how correlation translates and transforms into auto-correlation.
     Correlation as a term has statistical connotations and is usually understood to mean association between variables. In specific terms it is a measure of similarity between two or more paired sets of data or variables. Correlation does not necessarily imply causation, though it may suggest the possibility of a causal relationship. Of the two variables, one is usually termed the independent variable and the other the dependent variable; however, this labelling does not imply causation either. It is just the way the co-variation is being examined.
    A measure of the degree to which two (or more) variables are correlated is termed the ‘correlation coefficient’. It is a statistic computed from the data and typically ranges from -1 to +1, where 0 indicates no correlation, +1 indicates perfect positive correlation, and -1 indicates perfect negative or inverse correlation.
    The most commonly used correlation coefficient is Karl Pearson’s product-moment correlation coefficient. This is a measure of linear association based on the assumption that the data are drawn from a bivariate Normal population, i.e. a joint distribution in which the two variables are independently Normally distributed with the same mean (usually 0) and standard deviations σx and σy.

The joint probability density of x and y is then the product of their Normal probability density functions and is given by:

f(x, y) = f(x) f(y) = (1/(2πσxσy)) · e^(−t/2)

where t = (x²/σx²) + (y²/σy²)
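
     As a quick sanity check of this factorisation, here is a minimal Python sketch (using NumPy and SciPy, with purely illustrative values for σx, σy, x and y) that evaluates the formula above and compares it with the product of the two marginal Normal densities:

import numpy as np
from scipy.stats import norm, multivariate_normal

# Illustrative standard deviations and point of evaluation (assumed values)
sigma_x, sigma_y = 1.0, 2.0
x, y = 0.5, -1.0

# Joint density of two independent zero-mean Normals, via the formula above
t = (x**2 / sigma_x**2) + (y**2 / sigma_y**2)
f_xy = (1.0 / (2 * np.pi * sigma_x * sigma_y)) * np.exp(-t / 2)

# Product of the marginal densities f(x)·f(y)
f_product = norm.pdf(x, scale=sigma_x) * norm.pdf(y, scale=sigma_y)

# Same thing via a bivariate Normal with a diagonal covariance matrix
f_mvn = multivariate_normal(mean=[0, 0],
                            cov=[[sigma_x**2, 0], [0, sigma_y**2]]).pdf([x, y])

print(f_xy, f_product, f_mvn)   # all three agree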

     We will cover this in greater detail in subsequent posts; for now, however, let us focus on correlation itself.
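
     To make the idea concrete, here is a small Python sketch (the data are made up purely for illustration) that computes Pearson’s r with SciPy and, equivalently, by hand as the covariance divided by the product of the standard deviations:

import numpy as np
from scipy.stats import pearsonr

# Two illustrative paired variables: y is roughly a linear function of x plus noise
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

r, p_value = pearsonr(x, y)          # Pearson product-moment correlation
print(round(r, 3))                   # close to +1: strong positive linear association

# Equivalent computation by hand: covariance over the product of standard deviations
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(round(r_manual, 3))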
     Pearson’s coefficient can give misleading results depending on the actual nature of the association, especially if it is non-linear, and also if the data include outliers. There are certain measures that are more robust than Pearson’s, in which the data are either measured on, or treated as, an ordinal scale and ranked. Two widely used coefficients of rank correlation are Spearman’s and Kendall’s.
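
     The sketch below (again with made-up data) illustrates the point: for a perfectly monotonic but non-linear relationship, Pearson’s r falls short of 1, while Spearman’s rho and Kendall’s tau, being rank based, both reach 1:

import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# A monotonic but non-linear relationship (illustrative data)
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 4.0)

print(round(pearsonr(x, y)[0], 3))    # Pearson understates the strength of the association
print(round(spearmanr(x, y)[0], 3))   # Spearman's rho = 1.0: the ranks agree perfectly
print(round(kendalltau(x, y)[0], 3))  # Kendall's tau = 1.0: all pairs are concordant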

     In this context, a major extension of the correlation technique is its application to data recorded in series, especially time series and spatial series (sorted by distance band). Unlike standard correlation between two variables, only a single variable is analysed; in this case we compare pairs of values of that variable separated by an interval of time or a distance band, also known as a lag (a small sketch follows below). This allows patterns of dependency in time and/or space to be studied, and helps us develop models where the common assumption of independence of the observations does not hold.
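     A minimal Python sketch of this pairing idea, using pandas and a short made-up series: shifting the series by a lag k lines up each value with the value k steps earlier, and correlating the two columns of lagged pairs is exactly the intuition behind auto-correlation.

import pandas as pd

# A short illustrative series; shift() lines up each value with the value k steps earlier
x = pd.Series([2.0, 3.0, 5.0, 4.0, 6.0, 7.0, 6.5, 8.0])
k = 1
pairs = pd.DataFrame({"x_t": x, "x_t_minus_k": x.shift(k)}).dropna()
print(pairs)

# Correlating the two columns of lagged pairs is the essence of auto-correlation
print(round(pairs["x_t"].corr(pairs["x_t_minus_k"]), 3))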

     The population auto-correlation coefficient at lag k, ρk, is then calculated as the ratio of the auto-covariance at lag k to the variance (the auto-covariance at lag 0), as follows:

ρk = cov(xt, xt−k) / var(xt)
   = γ(k) / γ(0)

     where γ(0) is the auto-covariance at lag 0. Given sufficient data, the calculation is symmetric for the series, such that γ(k) = γ(−k), and therefore ρk = ρ−k.
     As with the product-moment correlation coefficient (r), ρk has a range of [-1, 1], with the mid value, 0, indicating the absence of auto-correlation.

     If the lagged variables are independent then ρk =0, but a zero value from sample data does not guarantee that the variables are independent. 
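
     Here is a minimal Python sketch (NumPy only, with a made-up trending series) of the sample version of this ratio; it simply follows the γ(k)/γ(0) form above, and note that other estimators differ slightly in how they divide by n:

import numpy as np

def sample_autocorrelation(x, k):
    """Sample auto-correlation at lag k: gamma(k) / gamma(0), with gamma the
    sample auto-covariance computed about the overall mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x_bar = x.mean()
    gamma_k = np.sum((x[k:] - x_bar) * (x[:n - k] - x_bar)) / n   # auto-covariance at lag k
    gamma_0 = np.sum((x - x_bar) ** 2) / n                        # variance (lag 0)
    return gamma_k / gamma_0

# Illustrative series with an upward trend, so neighbouring values are similar
x = [1.0, 2.0, 2.5, 3.5, 4.0, 5.2, 6.1, 7.0, 7.8, 9.0]
for k in range(4):
    print(k, round(sample_autocorrelation(x, k), 3))   # rho at lag 0 is 1 by definition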

In order to understand the concept of auto-correlation further, we need to study it in its two major forms: 1) temporal (time series) auto-correlation, and 2) spatial (distance band) auto-correlation.

In the next post we will discuss temporal auto-correlation, that is, correlation based on time series data.

Till then happy STAT-ing😊



