Wednesday, 5 September 2018

Time Series Analysis 3 - Applications


A time series can be considered as a set of values {Xt} representing measurements taken at successive points in time t = 1, 2, 3, …, n. Time itself is a continuous variable; in many cases, however, the measurements are made at specific intervals or points in time and therefore appear as discrete observations. Usually a single variable is analysed at each point in time, which is known as univariate analysis. A single variable may be analysed even when multiple variables are recorded at each point: for instance, the variables might be daily weather conditions at a meteorological station, or hourly stock prices and traded volumes. The data are mostly, though not always, recorded at equal time intervals, for instance hourly, daily, weekly or monthly. There is also bivariate or multivariate time series analysis.
In the previous post on temporal autocorrelation we discussed an example of a time series, namely the price of AAPL stock over a period of time. We showed that such data can be analysed to identify patterns using the ACF and PACF functions, represented as correlograms. Here we take the analysis further and study more ways of examining and analysing patterns in this kind of data. From this analysis we arrive at predictive models based on a predictive or explanatory function, and devise forecasting techniques built on them. First we examine the statistical methods that can be applied to time series data by studying their behaviour over a period of time. We then look at time series with varying periodicities or frequencies; analysis of such series is particularly useful for predicting phenomena that exhibit multiple periodicities over time.

There are many different types of time series: 1) economic series such as share prices, GDP, inflation or income data; 2) physical series such as river flow, meteorological or pollution monitoring data; 3) marketing series such as sales figures and advertising response data; 4) demographic series such as population levels over time; 5) manufacturing data such as process output and control charts; 6) binary processes such as digital data sequences in switching and data transmission systems; and 7) temporal point processes such as 1-D point processes, 2-D point processes and spatial point datasets. For some of these, measurement is continuous, as with a barograph measuring air pressure or a data communications channel whose traffic flows are recorded continuously. Other data are measured or recorded only at specific points in time, such as the daily closing price of a stock on the stock exchange. A considerable proportion of time series techniques address the latter type, termed discrete time series, which is usually recorded at fixed intervals.
There are many reasons why an analyst would analyse time series data. One is simple description: identifying the main features of the data, such as the mean, peaks and troughs, periodicity, or critical points of change in trends. Another is prediction: producing estimates of future values of the measured quantity or phenomenon based on historic data. Prediction used to be a static procedure; increasingly, however, real-time predictive models are used to assess future trends continuously from the data currently being generated. Such modelling is especially applicable in high-pressure or emergency control situations, such as disaster management, earthquake or tsunami prediction, and infrastructure management (communications, power and so on), as well as in the financial markets. Forecasts made from raw data alone are useful for shorter-term forecasting. For long-term forecasting, or for data with complex behaviour, an explanatory analysis is used to arrive at accurate forecasts, and data on many variables may be needed to provide future values of the time series underlying a given construct: GDP, for instance, requires the measurement of many different variables to arrive at accurate estimates.
Another example is the UK Treasury model used for econometric forecasting, which currently uses 30 main equations and about 100 independent variables as inputs to arrive at its predictions.

Forecasting depends on what can be termed well-behaved data: historic data and related information are used to predict future values of the data variables. In many cases this is very effective, but it often fails to account for very unexpected and sudden changes. We need to be cautious of such changes and allow for them in our predictive models.
Almost any time series has some degree of autocorrelation, and the analysis of autocorrelation is usually one of the first tasks carried out after data cleansing and basic visual inspection. Beyond this, data sometimes also exhibit some level of periodicity, and the length and magnitude of such patterns require closer examination.
This is because no matter how many instances of a predictable pattern one has observed, there may still be sudden and unexpected changes. Such scenarios are valuable paradigms for analysing vulnerable situations in which sudden and drastic changes, such as major wars, famines or banking crises, occur almost out of the blue. It should be noted that if the specific point at which a series is examined does not affect the results, the series is said to be stationary. More formally, a stationary time series is one whose joint probability distribution is not affected by a shift in time (or in space); this implies that the mean and variance of the data are constant across time and/or space. This condition is seldom fully achieved. Where a series includes trend and/or periodic behaviour, it is common for these components to be identified and accounted for, or decomposed, prior to further analysis. Many models are used in time series analysis, including simple autoregressive (AR) models, moving average (MA) models and combined ARMA models; these models assume stationarity. More complex models, such as ARCH and GARCH, allow for heteroskedasticity and are supported in specialised software packages, for instance for econometric modelling.
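As a minimal illustration of what (weak) stationarity means in practice, one can compare the mean and variance of the first and second halves of a series; large differences suggest non-stationarity, for instance a trend. This is only an informal eyeball check, and the function below is our own sketch, not from any library:

```python
import statistics

def halves_summary(series):
    """Split a series in two and return (mean, variance) for each half.

    Large differences between the halves suggest the series is not
    (weakly) stationary, e.g. because of a trend.
    """
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return ((statistics.mean(first), statistics.pvariance(first)),
            (statistics.mean(second), statistics.pvariance(second)))

# A trending series: the second half has a clearly higher mean,
# so the series is not stationary in the mean.
trending = [0.1 * t for t in range(100)]
(first_mean, _), (second_mean, _) = halves_summary(trending)
```

A formal alternative is a unit-root test such as the augmented Dickey-Fuller test (available in statistical packages, e.g. statsmodels' `adfuller`).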

Autoregressive - AR(p) processes



An AR or autoregressive model is a model of a stochastic (random) process in which future values of a data series are determined by a weighted sum of past values.
The weights are coefficients on the successive past values, reflecting the correlation between the present value and each particular past value. For this reason an autoregression model is also sometimes referred to as an autocorrelation model.
An auto-regression model is expressed in the form: 


xt = α1 xt-1 + … + αp xt-p + Zt


or
xt = Σ (i = 1 to p) αi xt-i + Zt
where the terms α1…αp are the autocorrelation coefficients at lags 1, 2, …, p and Zt is the residual error term. This error term relates specifically to the current time t.
Thus for a first-order process, p = 1, and the model obtained is

xt = α xt-1 + Zt        (1)
xt-1 = α xt-2 + Zt-1    (2)

This expression implies that the estimated value of x at time t is determined by the immediately previous value of x (the value at time t-1) multiplied by a correlation coefficient, which measures the extent to which all pairs of values in the series one lag apart are correlated (autocorrelated), plus a residual error term at time t.
This is also known as a Markov process: a Markov process is a first-order AR process and can be written as AR(1).
If α = 1, the model states that the next value of x is simply the previous value plus a random error term, i.e. a simple one-dimensional random walk.
If more terms are introduced, the model predicts the value of x at time t from a weighted sum of several past values plus a random error component.
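The difference between a stable AR(1) process and the α = 1 random walk can be sketched in a few lines of pure Python. The shocks below are fixed values chosen for illustration rather than random draws:

```python
def ar1(alpha, shocks, x0=0.0):
    """Generate an AR(1) series x_t = alpha * x_{t-1} + Z_t."""
    xs = [x0]
    for z in shocks:
        xs.append(alpha * xs[-1] + z)
    return xs

shocks = [1.0, -0.5, 0.25, 0.5]
stable = ar1(0.5, shocks)   # alpha < 1: the effect of each shock decays
walk = ar1(1.0, shocks)     # alpha = 1: shocks accumulate (random walk)
```

With α = 1 the final value is simply the starting value plus the sum of all shocks, whereas with α < 1 older shocks are progressively discounted.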

In general an AR(p) process can be expressed as

xt = c + Σ (i = 1 to p) αi xt-i + Zt

where α1…αp are the model parameters or coefficients, c is a constant and Zt is the error term or white noise.
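As a sketch of how the coefficients of an AR(p) model can be estimated, one option is ordinary least squares: regress xt on a constant and the p lagged values. The function below is our own illustration using numpy (real analyses would typically use a dedicated package such as statsmodels), and the toy data are noise-free so that exact recovery of the coefficients is expected:

```python
import numpy as np

def fit_ar(x, p):
    """Estimate AR(p) coefficients (and constant c) by least squares.

    Regresses x_t on [1, x_{t-1}, ..., x_{t-p}].
    Returns (c, array of [alpha_1, ..., alpha_p]).
    """
    x = np.asarray(x, dtype=float)
    # Design matrix: a constant column plus the p lagged columns.
    cols = [np.ones(len(x) - p)]
    cols += [x[p - i:len(x) - i] for i in range(1, p + 1)]
    X = np.column_stack(cols)
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1:]

# Noise-free data generated from a known AR(2) recursion.
a1, a2 = 0.5, 0.3
series = [1.0, 2.0]
for _ in range(50):
    series.append(a1 * series[-1] + a2 * series[-2])

c, alphas = fit_ar(series, p=2)  # recovers c ~ 0, alphas ~ (0.5, 0.3)
```

With real, noisy data the estimates would only approximate the true coefficients, and methods such as Yule-Walker or maximum likelihood are commonly used instead.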
Applications of AR models
AR models can be used to help predict earthquakes, by analysing pre-earthquake ionospheric anomalies, or to study volcanic tremor.

They can also be used in audio and speech recognition, based on AR modelling of amplitude modulations.