R言語による多変量時系列分析−Panel Linear Model

R言語による多変量時系列分析。
複数グループ・複数項目の時系列データで時系列間の関係性・影響を明らかにする。

以下、実行可能なR言語ソースコードも用い紹介する。

例：複数時系列間の関係性・影響

農家ごとの農地の肥沃度・人の各仕事の労働量・各肥料の投入量の各時系列、および、生産量の時系列から、肥沃度・各労働量・各肥料量が生産量に与える影響を明らかにする。
国ごとの各政策種類への投資額の時系列、および、GDP から、各政策投資がGDPに与える影響を明らかにする
地域ごとの各キャンペーン種類の投入量の各時系列、および、商品売上の時系列から、各地域の各キャンペーンが売上に与える影響を明らかにする。等。

R Library

複数時系列の時系列分析として、R言語では plm 、 ccgarch 等のパッケージがあるが、今回は plm (Panel Linear Model) を紹介する。
※参考：Panel Data Econometrics in R: The plm Package (PDF)

Panel Data(交差時系列)

計量経済学の領域では、交差系列 (Cross section Data) で時系列 (Time series Data) である系列を
『交差時系列(Panel Data)』と呼ぶ。　
例）47都道府県の人口を時間を追って調べたもの。

交差系列 (Cross section Data)：同一時点での様々なData。例) ある時点で47都道府県の人口、人口密度、男女比などを調べたもの。
時系列 (Time series Data)：同一種類のDataを様々な時点で取ったもの。例) ある都道府県の人口を時間を追って調べたもの。

※参考：Wikipedia: 計量経済学 - Wikipedia、パネルデータの意義とその活用(PDF)

関数

plm package には複数時系列の評価関数として次の4つが用意されている。

Functions:

plm: estimation of the basic panel models, i.e. within, between and random effect models. Models are estimated using the lm function to transformed data,
pvcm: estimation of models with variable coecients,
pgmm: estimation of generalized method of moments models,
pggls: estimation of general feasible generalized least squares models

これらはRの線形回帰関数 lm() と同じ引数の形をとり、1引数目に formula、2引数目にdata をとる。また、Option として、次の３つを指定できる。

Options:

index: this argument enables the estimation functions to identify the structure of the data, i.e. the individual and the time period for each observation,
effect: the kind of effects to include in the model, i.e. individual effects, time effects or both,
model: the kind of model to be estimated, most of the time a model with xed effects or a model with random effects.

Results:

上記各関数の結果は関数名と同じクラスのオブジェクトに保存され、これらのクラスは panelmodel オブジェクトを継承している。これらのオブジェクトは以下を含んでいる。

coefficients
residuals
fitted.values
vcov
df.residual
call

データ

Rの組み込みデータ「Produc：米国の48州の1970-1986のパネルデータ」を用いる

library("plm")
data("Produc", package = "plm")
head(Produc, 10)

Produc データ (10行目まで抜粋)：

> head(Produc, 10)
     state year     pcap     hwy   water    util       pc   gsp    emp unemp
1  ALABAMA 1970 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5   4.7
2  ALABAMA 1971 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9   5.2
3  ALABAMA 1972 15972.41 7765.42 1764.75 6442.23 38670.30 31303 1072.3   4.7
4  ALABAMA 1973 16406.26 7907.66 1742.41 6756.19 40084.01 33430 1135.5   3.9
5  ALABAMA 1974 16762.67 8025.52 1734.85 7002.29 42057.31 33749 1169.8   5.5
6  ALABAMA 1975 17316.26 8158.23 1752.27 7405.76 43971.71 33604 1155.4   7.7
7  ALABAMA 1976 17732.86 8228.19 1799.74 7704.93 50221.57 35764 1207.0   6.8
8  ALABAMA 1977 18111.93 8365.67 1845.11 7901.15 51084.99 37463 1269.2   7.4
9  ALABAMA 1978 18479.74 8510.64 1960.51 8008.59 52604.05 39964 1336.5   6.3
10 ALABAMA 1979 18881.49 8640.61 2081.91 8158.97 54525.86 40979 1362.0   7.1

state（州）、year(年）、gsp (州内総生産）、pcap (民間資本ストック）、pc(公的資本）、emp (労働投入量）
※参考：Rパッケージガイドブック蓮見亮さん記事。

Rでの実装例

library("plm")

# Producデータ
data("Produc", package = "plm")
data <- Produc

# index に用いる グループ列名、時間列名
grpCName <- "state"
timeCName <- "year"

# pdata.frame (panel data frame) 作成。グループ列, 時間列で index 作成　 
idx <- c(grpCName, timeCName)
pdata <- pdata.frame(data, index = idx, drop.index = TRUE, row.names = TRUE)

# 目的変数列 指定
tidx <- 6

###################################
# Panel Linear Model 作成
###################################
# plm(formula, data, subset, na.action, effect = c("individual","time","twoways"),
#    model = c("within","random","ht","between","pooling","fd"),
#    random.method = c("swar","walhus","amemiya","nerlove"),
#    inst.method = c("bvk","baltagi"), index = NULL, ...)
#
#[model 指定]：
# within: Fixed Effect Model [Default] 
#   (固定効果モデル。個別効果を完全除去。
#    目的変数、説明変数、残差のそれぞれをグループごとの平均を回帰式から引き算)
# pooling: Pooling Model     (グループ、時間を気にせず回帰）
# fd: First-Difference Model (グループごとに回帰。変数の時間差分を用いる）
# between: Between Model　　　　（グループごとに回帰。変数の時間平均を用いる）
# random: error components model
#   (変量効果モデル。残差をグループ間のものとグループ内のものに分割）。
#    ramdom.method 変動性の指定ができ swar [Default], walhus, amemiya, nerlove を指定可能。
#[effect 指定]:
# effect = "time" : 上記処理をグループ毎の処理に置き換える (グループと時間を入れ替え）

# Formula作成：目的変数以外を説明変数とする
cn <- colnames(pdata)
fmla <- as.formula(paste(cn[tidx], " ~ ", paste(cn[-tidx], collapse = " + ")))

# 固定効果モデル
res.fe <-  plm(formula = fmla, data = pdata, model = "within")
#(res.fe <-  plm(formula = fmla, data = data, index = idx, model = "within"))　と同じ
summary(res.fe)

# 変動効果モデル
res.re <-  plm(formula = fmla, data = pdata, model = "random")
summary(res.re)

###################################
# 各種検定
###################################

# 各種豊富な検定関数が用意されている。

library(car)
# Tests for individual and time effect
plmtest(fmla, data = pdata, effect = "twoways", type = "ghm")
# Tests of poolability
pooltest(fmla, data = pdata, model = "within")
# F tests of e^Kffects based on the comparison of the within and the pooling models
pFtest(fmla, data = pdata, effect = "twoways")
# Hausman test based on the comparison of two sets of estimates
phtest(res.fe, res.re)
# Tests of serial correlatio
pwtest(fmla, data = pdata)
# Locally robust tests for serial correlation or random effects
pbsytest(fmla, data = pdata, test = "j")
#Conditional LM test for AR(1) or MA(1) errors under random effect
pbltest(fmla, data = data, alternative = "onesided") #pdata.frame ではなく元 data.frame を用いる

library(lmtest)
#General serial correlation test
pbgtest(res.fe, order = 2)
#Wooldridge's test for serial correlation in short FE panels
pwartest(fmla, data = pdata)
#Wooldridge's first-difference-based test
pwfdtest(fmla, data = pdata)

# 他にも Cross-sectional dependence の検定も用意されている。

実行結果

残差、回帰係数、R^2、等、各種回帰結果が得られている。

> # 固定効果モデル
> res.fe <-  plm(formula = fmla, data = pdata, model = "within")
> summary(res.fe)
Oneway (individual) effect Within Model

Call:
plm(formula = fmla, data = pdata, model = "within")

Balanced Panel: n=48, T=17, N=816

Residuals :
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-1.54e+04 -7.66e+02 -7.04e+01  1.64e-09  7.71e+02  1.68e+04 

Coefficients :
         Estimate  Std. Error t-value  Pr(>|t|)    
pcap   1.1832e+04  1.5029e+04  0.7873   0.43136    
hwy   -1.1833e+04  1.5029e+04 -0.7873   0.43134    
water -1.1831e+04  1.5029e+04 -0.7872   0.43141    
util  -1.1832e+04  1.5029e+04 -0.7873   0.43134    
pc     1.2001e-01  2.1647e-02  5.5437 4.086e-08 ***
emp    3.4932e+01  8.8757e-01 39.3571 < 2.2e-16 ***
unemp -1.6813e+02  6.0568e+01 -2.7759   0.00564 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Total Sum of Squares:    1.3289e+11
Residual Sum of Squares: 4.978e+09
R-Squared      :  0.96254 
      Adj. R-Squared :  0.89766 
F-statistic: 2793.44 on 7 and 761 DF, p-value: < 2.22e-16

> # 変動効果モデル
> res.re <-  plm(formula = fmla, data = pdata, model = "random")
summary(res.re)
> Oneway (individual) effect Random Effect Model 
   (Swamy-Arora's transformation)

Call:
plm(formula = fmla, data = pdata, model = "random")

Balanced Panel: n=48, T=17, N=816

Effects:
                   var  std.dev share
idiosyncratic  6541433     2558 0.185
individual    28839985     5370 0.815
theta:  0.8853  

Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max. 
-11500.0   -847.0     61.7    849.0  19800.0 

Coefficients :
               Estimate  Std. Error t-value  Pr(>|t|)    
(Intercept) -1.7738e+03  1.1556e+03 -1.5350    0.1252    
pcap         1.4200e+04  1.5615e+04  0.9094    0.3634    
hwy         -1.4201e+04  1.5615e+04 -0.9094    0.3634    
water       -1.4199e+04  1.5615e+04 -0.9093    0.3634    
util        -1.4201e+04  1.5615e+04 -0.9094    0.3634    
pc           1.3118e-01  1.7636e-02  7.4380 2.613e-13 ***
emp          3.4150e+01  7.6819e-01 44.4552 < 2.2e-16 ***
unemp       -2.7338e+02  5.9394e+01 -4.6028 4.840e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Total Sum of Squares:    1.8368e+11
Residual Sum of Squares: 5743500000
R-Squared      :  0.96873 
      Adj. R-Squared :  0.95923 
F-statistic: 3576.08 on 7 and 808 DF, p-value: < 2.22e-16

検定結果例

> # Tests for individual and time effect
> plmtest(fmla, data = pdata, effect = "twoways", type = "ghm")

	Lagrange Multiplier Test - two-ways effects (Gourieroux, Holly and
	Monfort)

data:  fmla 
chisq = 3156.201, df = 2, p-value < 2.2e-16
alternative hypothesis: significant effects

参考資料：時系列分析入門資料

Tokyo.R で講師をした時系列分析の入門資料です。[Ｒ][データマイニング] R言語による時系列分析

[Ｒ勉強会][データマイニング] R言語による時系列分析

View more presentations from Koichi Hamada.

参考文献

Rパッケージガイドブック

作者: 岡田昌史,荒木孝治,伊藤康広,里洋平,高柳慎一,棚瀬貴紀,谷村晋,中谷朋昭,蓮見亮,林真広,樋口千洋,福島真太朗,牧山文彦,横山貴央,akira,mickey24
出版社/メーカー: 東京図書
発売日: 2011/04/09
メディア: 単行本
購入: 2人クリック: 1,298回
この商品を含むブログ (16件) を見る