Time series analysis and VAR models

Lecture 6

6. Time series analysis: Multivariate models

6.1 Learning outcomes

• Vector autoregression (VAR)

• Cointegration

• Vector error correction model (VECM)

• Application: pairs trading

6.2 Vector autoregression (VAR)

The classical linear regression model assumes strict exogeneity; hence, the error terms are uncorrelated with every realisation of every independent variable (lead or lag). As we discovered, serial correlation (or autocorrelation) is very common in financial time series and panel data. Furthermore, we assumed a predefined direction of causality: the explanatory variables affect the dependent variable.

We now relax both assumptions using a VAR model. VAR models can be regarded as a generalisation of AR(p) processes obtained by adding further time series. Hence, we enter the field of multivariate time series analysis.

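Let's look at a standard AR(p) process for two variables ($y_t$ and $x_t$).

(1) $y_t = \alpha_1 + \sum_{i=1}^{p} \beta_{11,i}\, y_{t-i} + \varepsilon_{1t}$

(2) $x_t = \alpha_2 + \sum_{i=1}^{p} \beta_{22,i}\, x_{t-i} + \varepsilon_{2t}$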

The next step is to allow lagged values of $x_t$ to affect $y_t$ and vice versa. This means that we obtain a system of equations for two dependent variables ($y_t$ and $x_t$). Both dependent variables are influenced by past realisations of $y_t$ and $x_t$. By doing that, we violate strict exogeneity (see Lecture 2); however, we can use a more relaxed concept, namely weak exogeneity. As we use lagged values of both dependent variables, we can argue that these lagged values are known to us, as we observed them in previous periods. We call these variables predetermined. Predetermined (lagged) variables fulfil weak exogeneity in the sense that they have to be uncorrelated with the contemporaneous error term in t. We can still use OLS to estimate the following system of equations, which is called a VAR in reduced form.

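(3) $y_t = \alpha_1 + \sum_{i=1}^{p} \beta_{11,i}\, y_{t-i} + \sum_{i=1}^{p} \beta_{12,i}\, x_{t-i} + \varepsilon_{1t}$

(4) $x_t = \alpha_2 + \sum_{i=1}^{p} \beta_{21,i}\, y_{t-i} + \sum_{i=1}^{p} \beta_{22,i}\, x_{t-i} + \varepsilon_{2t}$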
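The two-equation system can also be written compactly in matrix notation. The following is a sketch of that form: the symbols $z_t$, $a$ and $A_i$ are introduced here for illustration only, and $\beta_{jk,i}$ denotes the coefficient on lag $i$ of variable $k$ in equation $j$ (with $y$ as variable 1 and $x$ as variable 2).

```latex
z_t = a + \sum_{i=1}^{p} A_i\, z_{t-i} + \varepsilon_t,
\qquad
z_t = \begin{pmatrix} y_t \\ x_t \end{pmatrix},\quad
a = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix},\quad
A_i = \begin{pmatrix} \beta_{11,i} & \beta_{12,i} \\ \beta_{21,i} & \beta_{22,i} \end{pmatrix},\quad
\varepsilon_t = \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}
```

The off-diagonal elements of each $A_i$ carry the cross effects between the two series. If they are all zero, the system collapses into two separate AR(p) processes.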

The beauty of this model is that we don't need to predefine whether x or y is endogenous (the dependent variable). In fact, we can test whether x (y) is endogenous or exogenous using Granger causality tests. The idea of Granger causality is that past observations (lagged dependent variables) can influence current observations, but not vice versa. So the idea is rather simple: the past affects the present, and the present does not affect the past. STATA provides Granger causality tests after conducting a VAR analysis. They test the joint null hypothesis that the lagged values of one variable do not help to explain the current value of the other, that is, that the one variable does not Granger cause the other.
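As a sketch, the hypothesis behind a Granger causality test for the $y$ equation can be written as follows. The notation is an assumption made for this illustration: $\beta_{12,i}$ is the coefficient on lag $i$ of $x$ in the equation for $y$.

```latex
y_t = \alpha_1 + \sum_{i=1}^{p} \beta_{11,i}\, y_{t-i} + \sum_{i=1}^{p} \beta_{12,i}\, x_{t-i} + \varepsilon_{1t}
\\[4pt]
H_0:\ \beta_{12,1} = \beta_{12,2} = \dots = \beta_{12,p} = 0
\quad \text{($x$ does not Granger-cause $y$)}
\\[4pt]
H_1:\ \beta_{12,i} \neq 0 \ \text{for at least one } i
```

The restriction is tested jointly, typically with a Wald test on $p$ restrictions. Rejecting $H_0$ means that lagged values of $x$ improve the prediction of $y$ beyond what lagged values of $y$ already deliver. The test for the opposite direction restricts the coefficients on lagged $y$ in the equation for $x$.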

In many applications, VAR models make a lot of sense, as a clear direction of causality cannot be predefined. For instance, there is a substantial literature on the benefits of internationalisation (e.g. entering foreign markets through cross-border M&A). There is evidence that multinationals outperform local peers due to the benefits of operating in many countries. At the same time, we know that high-performing companies are more likely to enter foreign markets due to their ownership-specific advantages. This argument is based on the Resource-based View and the OLI framework developed by Dunning and Rugman (Reading School of International Business). The VAR model allows you to incorporate both effects: in fact, you can test whether performance drives internationalisation or internationalisation drives performance.

Before you start using a VAR model, you have to make sure that the time series are stationary. So the first step is to check whether each time series is stationary using Dickey-Fuller and KPSS tests. The second step is to specify the optimal lag length (p) of the model. This is done by comparing different model specifications using information criteria. Apart from the Akaike (AIC) and Schwarz Bayesian (SBIC) criteria, the Hannan-Quinn criterion (HQIC) is commonly used. Most applied econometricians favour the HQIC. STATA will help you to make a good choice. After specifying your model, you need to check the stability conditions. The coefficient matrix of the reduced-form VAR has to ensure that the iteration sequence converges to a long-term value. STATA will help you in checking stability.

To be precise, you need to show that all eigenvalues of the coefficient (companion) matrix lie strictly inside the unit circle. The reason can only be understood once you are familiar with matrix diagonalisation.
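A brief sketch of the argument, under the assumption that the VAR(p) is stacked into first-order (companion) form and that the companion matrix is diagonalisable. The symbols $A_i$ (the coefficient matrix on lag $i$), $F$, $P$ and $\Lambda$ are introduced here for illustration.

```latex
F =
\begin{pmatrix}
A_1 & A_2 & \cdots & A_{p-1} & A_p \\
I   & 0   & \cdots & 0       & 0   \\
0   & I   & \cdots & 0       & 0   \\
\vdots &  & \ddots &         & \vdots \\
0   & 0   & \cdots & I       & 0
\end{pmatrix},
\qquad
|\lambda_j(F)| < 1 \ \text{for all } j
\\[6pt]
F = P \Lambda P^{-1}
\;\Rightarrow\;
F^{h} = P \Lambda^{h} P^{-1} \to 0 \ \text{as } h \to \infty
```

The effect of a shock after $h$ periods is governed by $F^{h}$. Since $\Lambda^{h}$ contains the eigenvalues raised to the power $h$, the effect dies out exactly when every eigenvalue has modulus below one.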

VAR models offer another nice feature: impulse response functions. VAR models capture the dynamics of two (or more) stationary time series; hence, we can assess the dynamic impact of a marginal change of one variable on another. The standard OLS regression provides coefficients, and coefficients refer to the partial impact of an explanatory variable on the dependent variable. In the case of VAR models, the relationship becomes dynamic, as a change of one variable (say x) in t can affect x and y in t+1. The impact on x and y in t+1 in turn affects x and y in t+2 and so on until the impact dies out. Impulse response functions are very useful in illustrating the short-term dynamics in a model.

Let's look at an example to see how VAR modelling works. In Lecture 5, we tried very hard to understand gold prices. We extend our univariate model by exploring the relationships between gold and silver prices. Linking two (similar) assets or securities is a very common trading strategy, which is called pairs trading.

Before we do any sophisticated modelling, it is always beneficial to look at some line charts. Figure 1 shows the indexed time series of nominal gold and silver prices from 1900 to 2010.

Figure 1: Nominal gold and silver prices, indexed, 1900-2010

We can see that there is a certain degree of co-movement, which we might be able to exploit for our trading strategy. Before we can use a VAR, we need to ensure that both time series are stationary. It is obvious from Figure 1 that gold and silver prices are not stationary. However, after taking first differences we can show that price changes are stationary. So both time series are I(1).

The next step is to determine the optimal lag length using information criteria. Table 1 shows different specifications using the varsoc command.

Table 1: Determining the optimal lag length using information criteria

Selection-order criteria

Sample: 1906 - 2010    Number of obs = 105

Endogenous: return_g return_s

Exogenous: _cons

Based on the AIC and HQIC, two lags are optimal; however, the SBIC prefers only one lag. I would prefer the HQIC and try two lags first. If the second lag does not exhibit significant coefficients, we could try to reduce the lag length in line with the SBIC.

We run a VAR with two lags to explain current price changes in gold and silver. Table 2 provides the OLS estimates.

Table 2: VAR model with two lags

Vector autoregression

Sample: 1903 - 2010                    No. of obs    = 108
Log likelihood = 126.0166              AIC           = -2.148455
FPE            = .0004                 HQIC          = -2.04776
Det(Sigma_ml)  = .0003323              SBIC          = -1.90011

Equation     Parms   RMSE      R-sq     chi2       P>chi2
return_g     5       .126927   0.2425   34.5786    0.0000
return_s     5       .196569   0.1306   16.22763   0.0027

                  Coef.       Std. Err.   z       P>|z|    [95% Conf. Interval]
return_g
  return_g
    L1.           .4864107    .1229856    3.96    0.000    .2453633    .7274581
    L2.          -.0139809    .122817    -0.11    0.909   -.2546979    .2267361
  return_s
    L1.          -.0068126    .0805903   -0.08    0.933   -.1647668    .1511415
    L2.          -.207786     .0807151   -2.57    0.010   -.3659847   -.0495874
  _cons           .0277213    .0124857    2.22    0.026    .0032497    .0521929
return_s
  return_g
    L1.           .3143786    .1904648    1.65    0.099   -.0589257    .6876828
    L2.           .1085011    .1902038    0.57    0.568   -.2642915    .4812937
  return_s
    L1.           .1094293    .1248083    0.88    0.381   -.1351905    .354049
    L2.          -.3201805    .1250015   -2.56    0.010   -.5651789   -.0751821
  _cons           .024511     .0193363    1.27    0.205   -.0133875    .0624095
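As a worked check, the information criteria reported for the two-lag VAR can be reproduced from its log likelihood. This sketch assumes the likelihood-based, per-observation definitions of the criteria and takes $LL = 126.0166$, $T = 108$ observations and $k = 10$ estimated parameters (two equations with five parameters each) from the Table 2 output. The symbols $LL$, $T$ and $k$ are introduced here for illustration.

```latex
\mathrm{AIC}  = -\frac{2\,LL}{T} + \frac{2k}{T}
             = -\frac{2(126.0166)}{108} + \frac{2(10)}{108}
             \approx -2.334 + 0.185 = -2.148
\\[4pt]
\mathrm{HQIC} = -\frac{2\,LL}{T} + \frac{2\ln(\ln T)\,k}{T}
             \approx -2.334 + 0.286 = -2.048
\\[4pt]
\mathrm{SBIC} = -\frac{2\,LL}{T} + \frac{\ln(T)\,k}{T}
             \approx -2.334 + 0.434 = -1.900
```

All three criteria share the same fit term and differ only in the penalty per parameter. The SBIC penalty is the largest here, which is why it tends to select shorter lag lengths than the AIC and HQIC.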

We see that silver price changes (lag 2) affect current gold price changes, and we can establish autocorrelation in both time series. To test whether gold Granger causes silver or vice versa, we run Granger causality tests reported in Table 3.

Table 3: Granger causality tests

Hence, we confirm that past changes in silver prices can predict future gold price changes. This is very interesting, as it can be used to develop a trading strategy. Finally, we need to show that the VAR is stable (see Table 4).

Table 4: Stability condition of the VAR

We can then illustrate the impact of silver price changes on future gold price changes using an impulse response function. Figure 2 shows the impulse response function and confidence intervals derived from bootstrapping. If silver prices increase today by 1%, we should expect a significant decline of about 0.2% in gold prices two years later.

Figure 2: Impulse response function (irfname: r; impulse: return_s; response: return_g; horizontal axis: step, 0 to 8; the line shows the impulse response function (irf) with its 95% CI)
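The mechanics behind the impulse response function can be sketched with a short recursion. The example below is an illustration under stated assumptions: it uses the point estimates transcribed from Table 2 (variable order: gold, silver), ignores the constants, and computes simple, non-orthogonalised responses to a one-unit shock, with no confidence intervals. The function names are hypothetical helpers, not STATA commands.

```python
# Point estimates transcribed from Table 2; variable order is (gold, silver).
# A1[i][j] is the effect of variable j at lag 1 on variable i; A2 is lag 2.
A1 = [[0.4864107, -0.0068126],
      [0.3143786, 0.1094293]]
A2 = [[-0.0139809, -0.2077860],
      [0.1085011, -0.3201805]]


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Add two square matrices element by element."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]


def impulse_responses(a1, a2, horizon):
    """Return [Phi_0, ..., Phi_horizon] for a VAR(2).

    Phi_h[i][j] is the response of variable i, h steps after a one-unit
    shock to variable j. The recursion is
    Phi_h = A1 * Phi_{h-1} + A2 * Phi_{h-2}, with Phi_0 = I.
    """
    n = len(a1)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    zero = [[0.0] * n for _ in range(n)]
    phis = [identity]
    for h in range(1, horizon + 1):
        lag1 = phis[h - 1]
        lag2 = phis[h - 2] if h >= 2 else zero
        phis.append(matadd(matmul(a1, lag1), matmul(a2, lag2)))
    return phis


GOLD, SILVER = 0, 1
phis = impulse_responses(A1, A2, horizon=8)
# Response of gold price changes to a unit shock in silver price changes.
gold_response = [phi[GOLD][SILVER] for phi in phis]
for step, value in enumerate(gold_response):
    print(step, round(value, 4))
```

At step 2 the response of gold is roughly -0.21, which is in line with the decline of about 0.2% described in the text. The responses shrink towards zero at longer horizons, as expected for a stable VAR.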

6.3 Cointegration

When we explore Figure 1 a bit more carefully, we can see that silver and gold prices exhibit a certain degree of co-movement. We could almost argue that they share a common stochastic trend. The limitation of ARIMA and VAR models is that they can only be used if the time series are stationary. In our case, we had to first-difference our time series to ensure stationarity. First-differencing eliminates a lot of information in the time series. Is there no better way to analyse gold and silver prices?

Long before the development of multivariate time series econometrics, people realised that gold and silver seem to have a common movement around a long-term equilibrium (the gold-silver price ratio). Moreover, the idea of equilibrium conditions in economics and the availability of macroeconomic time series led to the development of cointegration analysis.

The idea is very simple. Even if two (or more) time series are non-stationary and hence have stochastic trends, they might still be driven by the same underlying factors that lead to their stochastic behaviour. Therefore, we analyse the time series in levels and see whether we can find a long-term equilibrium, a so-called cointegrating vector.
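In symbols, the idea can be sketched as follows for two series. The notation ($\mu$, $\beta$, $u_t$, and $G_t$, $S_t$ for the gold and silver price) is an assumption made for this illustration.

```latex
y_t \sim I(1),\quad x_t \sim I(1),
\qquad
u_t = y_t - \mu - \beta x_t \sim I(0)
\\[4pt]
\ln\!\left(\frac{G_t}{S_t}\right) = \ln G_t - \ln S_t
\quad\Longleftrightarrow\quad
\text{cointegrating vector } (1,\,-1) \text{ on log prices}
```

The first line says that a linear combination of two I(1) series is stationary, and $(1, -\beta)$ is then the cointegrating vector. The second line shows why a stationary log price ratio is a special case: it imposes a one-to-one relationship rather than estimating $\beta$.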

Before we explore the Johansen procedure, let's look at the gold-silver ratio over time shown in Figure 3.

Figure 3: The gold-silver ratio, 1900-2010

The ratio looks like a mean-reverting process; thus, in the long run it tends to go back to its long-term equilibrium (mean). Based on the ratio, we could argue that gold seems to be overvalued compared to silver at the moment.

Of course, taking the ratio suggests a very simple cointegrating vector: in fact, we assume a one-to-one relationship. Before we can use the Johansen procedure, we have to make sure that the time series have the same order of integration. We already know that gold and silver prices are both I(1) time series. Table 5 shows the results of the Johansen test for cointegration. In line with the VAR model, we use two lags.

Table 5: Johansen test

. johans price_g price_s, lags(2)

Johansen-Juselius cointegration rank test
Sample: 1901 to 2010                 Number of obs = 109

                         H1:
              H0:        Max-lambda       Trace
Eigenvalues   rank<=(r)  statistics       statistics
(lambda)      r          (rank<=(r+1))    (rank<=(p=2))
.11763126     0          13.640831        22.240287
.07586221     1          8.5994565        8.5994565

Osterwald-Lenum Critical values (95% interval):

Table/Case: 1* (assumption: intercept in CE)
H0:   Max-lambda   Trace
0     15.67        19.96
1     9.24         9.24

Table/Case: 1 (assumption: intercept in VAR)
H0:   Max-lambda   Trace
0     14.07        15.41
1     3.76         3.76

Normalized Beta'
        price_g       price_s
vec1    -1.549e-07    .0000176
vec2    4.183e-07     -.00001694

Normalized Alpha
          vec1          vec2
price_g   69248.321     447701.12
price_s   -5468.4015    14662.554

The null hypothesis that there is no cointegration (r=0) can be rejected if we use the trace statistic. However, the null hypothesis that we have at most one cointegrating vector (r=1) cannot be rejected. The problem is that the max-lambda statistic does not support cointegration. I also tried log-prices instead, which is common in analysing gold-silver ratios; however, I don't obtain clear results.

Given the extreme increase in price volatility, it is likely that there are structural breaks in an alleged cointegration vector. Structural breaks are difficult to handle. Another way to look at this problem is to test whether price ratios or log-price ratios are stationary time series. If they are stationary, then the two underlying time series are cointegrated and the ratio indicates the cointegration vector. Again, Dickey-Fuller tests cannot reject the null hypothesis of a unit root; hence, neither ratio seems to be stationary.

6.4 Vector error-correction model (VECM)

The VECM combines VAR and cointegration into one framework. The VAR is extended by including deviations from the long-term equilibrium defined by the cointegration vector. The coefficient of the deviation from the long-term equilibrium indicates the speed of adjustment back into equilibrium.
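For two series, the structure can be sketched as follows. The notation is an assumption made for this illustration: $\alpha_1$ and $\alpha_2$ are the speed-of-adjustment coefficients, $(y_{t-1} - \mu - \beta x_{t-1})$ is last period's deviation from the long-term equilibrium, and $k$ is the number of lagged differences.

```latex
\Delta y_t = c_1 + \alpha_1\,(y_{t-1} - \mu - \beta x_{t-1})
  + \sum_{i=1}^{k} \gamma_{11,i}\,\Delta y_{t-i}
  + \sum_{i=1}^{k} \gamma_{12,i}\,\Delta x_{t-i} + \varepsilon_{1t}
\\[4pt]
\Delta x_t = c_2 + \alpha_2\,(y_{t-1} - \mu - \beta x_{t-1})
  + \sum_{i=1}^{k} \gamma_{21,i}\,\Delta y_{t-i}
  + \sum_{i=1}^{k} \gamma_{22,i}\,\Delta x_{t-i} + \varepsilon_{2t}
```

The sums capture the short-term dynamics, as in a VAR in first differences. With $\beta > 0$ and this normalisation, adjustment back towards equilibrium requires $\alpha_1 < 0$ or $\alpha_2 > 0$ (or both). If neither coefficient is significant, there is no evidence of error correction.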

The VECM captures the long-term relationship and the short-term dynamics of two or more time series. Let's see how it works in the case of gold and silver prices. Table 6 reports the VECM specification, which resembles the VAR with two lags. It also contains the CE component, the so-called error-correction component, which captures the deviation from the long-term equilibrium in the previous period. So the CE is a lagged and hence predetermined variable, as required by OLS and the VAR framework.

Table 6: VECM based on gold and silver prices

Vector error-correction model

Sample: 1903 - 2010                    No. of obs    = 108
                                       AIC           = 54.15731
Log likelihood = -2911.494             HQIC          = 54.28821
Det(Sigma_ml)  = 8.93e+20              SBIC          = 54.48015

Equation      Parms   RMSE      R-sq     chi2       P>chi2
D_price_g     6       5e+06     0.2949   42.66373   0.0000
D_price_s     6       49748.2   0.2598   35.79424   0.0000

                  Coef.       Std. Err.   z       P>|z|    [95% Conf. Interval]
D_price_g
  _ce1
    L1.          -.0048672    .0135324   -0.36    0.719   -.0313903    .0216559
  price_g
    LD.           .2714343    .2384682    1.14    0.255   -.1959548    .7388233
    L2D.          .8565381    .2351755    3.64    0.000    .3956026    1.317474
  price_s
    LD.           .5637784    6.586781    0.09    0.932   -12.34607    13.47363
    L2D.         -26.23471    5.670113   -4.63    0.000   -37.34793   -15.1215
  _cons           2813.244    472046.2    0.01    0.995   -922380.4    928006.9
D_price_s
  _ce1
    L1.           .0005916    .0004392    1.35    0.178   -.0002693    .0014524
  price_g
    LD.          -.0076263    .0077399   -0.99    0.324   -.0227962    .0075436
    L2D.          .0357519    .007633     4.68    0.000    .0207914    .0507123
  price_s
    LD.           .2898047    .2137855    1.36    0.175   -.1292072    .7088166
    L2D.         -1.064965    .1840335   -5.79    0.000   -1.425664   -.7042665
  _cons           23146.29    15321.09    1.51    0.131   -6882.487    53175.08

The speed of adjustment is not significant, which undermines the idea of a long-run equilibrium in gold and silver prices as suggested by the literature.

We can explore structural changes when we plot the predicted long-term equilibrium over time.

Figure 4: Predicted long-term equilibrium based on VECM

It is very obvious that the long-term equilibrium undergoes structural breaks. We could split the time period into a stable and an unstable period. Yet the main issue with structural breaks is that they appear to be obvious ex post, but are nearly impossible to predict ex ante.

For instance, if we focus on 1900-1980, we obtain a very strong result that underlines cointegration and adjustments into the long-term equilibrium. Over the full sample up to 2010, however, this result does not hold. Hence, we conclude that the gold-silver ratio is no longer a reliable phenomenon that we could rely on. Nevertheless, we can use the short-term dynamics captured in the VAR to do short-term trading.

6.5 Pairs trading: application

The idea of pairs trading is that we trade two similar shares that are driven by similar macroeconomic factors. In our example, we focus on the US steel industry and try to identify a trading strategy based on United States Steel Corporation and Titan International.

First we need to modify the time dimension, as NASDAQ reports the latest share prices first. We selected five years of daily closing prices for our analysis.

* Time dimension needs to be modified
gen t=_n
replace t=1264-t
tsset t

* Line chart
twoway (line us_steel t) (line titan t)
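The same data preparation can be sketched outside STATA. The example below is a minimal illustration in Python, assuming the closing prices arrive as a list with the latest observation first. The function names and the three sample prices are hypothetical. It reverses the order and computes log returns, the transformation used for the return series in this section.

```python
import math


def to_chronological(prices_latest_first):
    """Reverse a price list so that the oldest observation comes first."""
    return list(reversed(prices_latest_first))


def log_returns(prices):
    """Log returns ln(p_t) - ln(p_{t-1}) for chronologically ordered prices."""
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices, latest first (as in the downloaded data).
latest_first = [12.0, 11.0, 10.0]
chronological = to_chronological(latest_first)
returns = log_returns(chronological)
print(chronological)
print([round(r, 4) for r in returns])
```

Reversing the order first matters: computing returns on the unreversed series would flip the sign of every return and shift its timing.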

Let's have a quick look at a line chart combining both share prices. Both share prices appear to be non-stationary, which we should confirm first using Dickey-Fuller tests.

We run Dickey-Fuller tests based on share prices and first-differenced time series. The tests confirm that both time series are I(1). Hence, we can try to find a cointegration relation following the Johansen procedure. Before we do that, I suggest that we explore a VAR model and determine the optimal lag structure.

* Returns
gen r_us=ln(us_steel)-ln(L.us_steel)
gen r_titan=ln(titan)-ln(L.titan)

* Dickey-Fuller
dfuller us_steel
dfuller titan
dfuller r_us
dfuller r_titan

. varsoc r_us r_titan

Selection-order criteria

We cannot establish any VAR lag structure, which shows that there is hardly any serial correlation of returns. This is great news for the Efficient Market Hypothesis, but bad news for us. When you explore the AC and PAC functions, you will also discover that univariate analysis won't go anywhere, as autocorrelation in returns is not present.

Luckily, we can find cointegration, with Titan adjusting back to long-term equilibrium conditions. Based on the VECM, we can predict the levels (share prices) of both stocks. These are one-step-ahead forecasts.

When should you buy and sell? This also depends on transaction costs. In pairs trading, you go long in one stock and short in the second stock to hedge your position. To keep transaction costs low, we need a trading rule that does not require us to trade too frequently.
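One common way to limit trading frequency is a threshold rule on the deviation from the long-term equilibrium. The sketch below is a hypothetical illustration, not the rule used in these notes: the function names, the entry and exit thresholds, and the sample deviations are all assumptions. It also standardises with the full-sample mean and standard deviation, which is only acceptable for an in-sample illustration.

```python
from statistics import mean, pstdev


def threshold_positions(spread, entry=2.0, exit_=0.5):
    """Return one position per observation for a pair spread.

    +1 means long the spread (long the first stock, short the second),
    -1 means short the spread, 0 means no position. A position is opened
    when the standardised spread moves beyond `entry` and closed once it
    falls back inside `exit_`. Assumes the spread is not constant.
    """
    mu = mean(spread)
    sigma = pstdev(spread)
    positions = []
    current = 0
    for value in spread:
        z = (value - mu) / sigma
        if current == 0:
            if z > entry:
                current = -1   # spread unusually high: short it
            elif z < -entry:
                current = 1    # spread unusually low: go long
        elif abs(z) < exit_:
            current = 0        # spread back near equilibrium: close
        positions.append(current)
    return positions


def count_trades(positions):
    """Count position changes, each of which incurs transaction costs."""
    trades = 0
    previous = 0
    for position in positions:
        if position != previous:
            trades += 1
        previous = position
    return trades


# Hypothetical deviations from the long-term equilibrium.
deviations = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0, -3.0, -3.0, 0.0, 0.0]
positions = threshold_positions(deviations, entry=1.5, exit_=0.5)
print(positions, count_trades(positions))
```

Widening the entry threshold reduces the number of trades and hence transaction costs, at the price of missing smaller deviations. Comparing profits net of costs across thresholds is one way to run the scenarios mentioned in the text.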

The following figure cumulates the deviations from the long-term equilibrium and shows periods of over- and undervaluation.

[Figure: cumulated deviations from the long-term equilibrium for both stocks; legend: sum_us, sum_titan]

In practice, you would run different scenarios and determine losses and profits from following different trading rules. Moreover, you need to test your model out-of-sample to ensure that it holds. Currently, we assume that we update the model every day, which is not necessarily the case.

6.6.1 Interpretation of VECM

Interpret the following VECM result. In particular, discuss whether there is a long-term equilibrium between gold and silver prices and highlight the short-term dynamics.

. vec price_g price_s if year

Vector error-correction model

Sample: 1903 - 1979                    No. of obs    = 77
                                       AIC           = 48.53184
Log likelihood = -1855.476             HQIC          = 48.69011
Det(Sigma_ml)  = 2.92e+18              SBIC          = 48.92754

Equation      Parms   RMSE      R-sq     chi2       P>chi2
D_price_g     6       261962    0.7932   272.3557   0.0000
D_price_s     6       14526.7   0.6356   123.8518   0.0000

                  Coef.       Std. Err.   z       P>|z|    [95% Conf. Interval]
D_price_g
  _ce1
    L1.           .1035065    .010347     10.00   0.000    .0832269    .1237862
  price_g
    LD.           1.091987    .1449103    7.54    0.000    .807968     1.376006
    L2D.          .2523371    .1618       1.56    0.119   -.064785     .5694592
  price_s
    LD.          -25.73918    5.458489   -4.72    0.000   -36.43762   -15.04074
    L2D.         -46.29332    5.561025   -8.32    0.000   -57.19272   -35.39391
  _cons           20.61365    32031.29    0.00    0.999   -62759.56    62800.79
D_price_s
  _ce1
    L1.           .0037341    .0005738    6.51    0.000    .0026095    .0048587
  price_g
    LD.           .0485741    .0080358    6.04    0.000    .0328242    .064324
    L2D.          .0282756    .0089724    3.15    0.002    .0106901    .0458612
  price_s
    LD.          -1.444845    .3026928   -4.77    0.000   -2.038112   -.851578
    L2D.         -1.690503    .3083788   -5.48    0.000   -2.294914   -1.086092
  _cons          -571.3938    1776.25    -0.32    0.748   -4052.78     2909.992

Cointegrating equations

Equation      Parms   chi2       P>chi2
_ce1          1       66.84584   0.0000

Identification: beta is exactly identified

Identification: beta is exactly identified

Enders, W. (2004) Applied econometric time series. 2nd edition, John Wiley & Sons.