Hausman and Ramsey tests, overfitting, weak instruments and overidentification, model selection, and resampling



1.Hausman specification test

The test evaluates the consistency of an estimator when compared to an alternative, less efficient estimator which is already known to be consistent. It helps one evaluate if a statistical model corresponds to the data.

Use 1: testing for the endogeneity of a variable. This test can be used to check for the endogeneity of a variable by comparing instrumental variable (IV) estimates to ordinary least squares (OLS) estimates.

Use 2: testing the validity of additional instruments. It can also be used to check the validity of extra instruments by comparing IV estimates using a full set of instruments Z to IV estimates that use a proper subset of Z. Note that for the test to work in this case, we must be certain of the validity of the subset of Z, and that subset must contain enough instruments to identify the parameters of the equation.

Use 3: choosing between fixed effects and random effects in panel data. The Hausman test can also be used to differentiate between a fixed effects model and a random effects model in panel data. In this case, random effects (RE) is preferred under the null hypothesis because it is more efficient, while under the alternative fixed effects (FE) is at least consistent and thus preferred.
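The comparison behind all three uses can be written as one statistic, H = d' (V_c - V_e)^(-1) d, where d is the difference between the consistent and the efficient estimates and V_c, V_e are their covariance matrices. The following is a minimal sketch with numpy only; the function name `hausman` and the coefficient and covariance values are made up for illustration and do not come from any real data set.

```python
import numpy as np

def hausman(b_consistent, b_efficient, V_consistent, V_efficient):
    """Return the Hausman statistic and its degrees of freedom.

    b_consistent: estimates that are consistent under both hypotheses (e.g. FE, IV)
    b_efficient:  estimates that are efficient under the null (e.g. RE, OLS)
    """
    d = np.asarray(b_consistent, float) - np.asarray(b_efficient, float)
    V_diff = np.asarray(V_consistent, float) - np.asarray(V_efficient, float)
    # The pseudo-inverse guards against a singular difference matrix.
    H = float(d @ np.linalg.pinv(V_diff) @ d)
    df = int(np.linalg.matrix_rank(V_diff))
    return H, df

# Hypothetical FE and RE estimates of two slope coefficients
b_fe = [1.0, 2.0]
b_re = [0.9, 2.1]
V_fe = np.diag([0.02, 0.02])
V_re = np.diag([0.01, 0.01])

H, df = hausman(b_fe, b_re, V_fe, V_re)
print(H, df)
```

Under the null hypothesis H is asymptotically chi-squared with `df` degrees of freedom. With these illustrative numbers H is far below 5.991, the 5% critical value for 2 degrees of freedom, so the efficient estimator (RE) would be retained.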

2.Ramsey RESET test

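Specification error occurs when an independent variable is correlated with the error term. The RESET test checks for it by adding powers of the fitted values to the regression and testing whether they are jointly significant. There are several different causes of specification error: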

Use 1: detecting an incorrect functional form. An incorrect functional form could be employed;

Use 2: detecting an omitted important variable. A variable omitted from the model may have a relationship with both the dependent variable and one or more of the independent variables (omitted-variable bias);

Use 3: detecting an irrelevant included variable. An irrelevant variable may be included in the model;

Use 4: detecting simultaneity bias. The dependent variable may be part of a system of simultaneous equations (simultaneity bias);

Use 5: detecting measurement error. Measurement errors may affect the independent variables.
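To make the mechanics of RESET concrete, here is a rough sketch: fit the model by OLS, add the squared and cubed fitted values as extra regressors, and compute the F statistic for their joint significance. The helper names (`ols_rss`, `reset_f`) and the simulated data are assumptions for illustration. The example covers only the functional-form case, where a quadratic term is deliberately left out.

```python
import numpy as np

def ols_rss(X, y):
    # OLS coefficients and residual sum of squares
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def reset_f(X, y, max_power=3):
    """F statistic for adding yhat**2 ... yhat**max_power to the regression."""
    n, k = X.shape
    beta, rss_restricted = ols_rss(X, y)
    yhat = X @ beta
    extra = np.column_stack([yhat ** p for p in range(2, max_power + 1)])
    _, rss_unrestricted = ols_rss(np.column_stack([X, extra]), y)
    q = extra.shape[1]            # number of added terms
    df2 = n - k - q               # residual degrees of freedom
    F = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df2)
    return F, q, df2

# Simulated data: the true relationship is quadratic, the fitted model is linear
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + x + 0.5 * x ** 2 + rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])

F_reset, q, df2 = reset_f(X, y)
print(F_reset, q, df2)
```

A large F relative to the F(q, df2) distribution suggests that the linear specification is missing something. The 5% critical value for F(2, 196) is roughly 3.0.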





3.Overfitting


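  • When n > p, with n the number of observations and p the number of parameters, least-squares regression has comparatively low variance

  • When n = p, overfitting occurs easily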
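A small simulation can illustrate the two bullets above. It is only a sketch: the data-generating process, the choice p = 20, and the function names are assumptions. With n = p the least-squares fit reproduces the training sample almost exactly, noise included, and predicts fresh data poorly. With n much larger than p the two errors are close.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 20
beta_true = np.zeros(p)
beta_true[0] = 1.0                      # only the first regressor matters

def simulate(n):
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(size=n)
    return X, y

def train_test_mse(n):
    X, y = simulate(n)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    X_new, y_new = simulate(1000)       # fresh data from the same process
    train = float(np.mean((y - X @ beta) ** 2))
    test = float(np.mean((y_new - X_new @ beta) ** 2))
    return train, test

train_eq, test_eq = train_test_mse(20)      # n = p
train_big, test_big = train_test_mse(2000)  # n much larger than p
print(train_eq, test_eq)
print(train_big, test_big)
```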



In order to avoid overfitting, it is necessary to use additional techniques (e.g. cross-validation, regularization, early stopping, pruning, Bayesian priors on parameters, model comparison or dropout) that can indicate when further training is not resulting in better generalization. For more on these remedies for overfitting, see: http://dwz.cn/6uAcog

The basis of some techniques is either (1) to explicitly penalize overly complex models, or (2) to test the model's ability to generalize by evaluating its performance on a set of data not used for training, which is assumed to approximate the typical unseen data that a model will encounter.
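Both ideas can be sketched in a few lines: a ridge penalty stands in for (1), and k-fold cross-validation for (2). Everything in this example is illustrative. The data are simulated, the polynomial degrees and the penalty 0.1 are arbitrary choices, and the ridge penalty here also shrinks the intercept for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = np.linspace(-1.0, 1.0, n)
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=n)  # hypothetical noisy data

def design(x, degree):
    # Polynomial design matrix with columns 1, x, ..., x**degree
    return np.vander(x, degree + 1, increasing=True)

def fit(X, y, ridge=0.0):
    # Ridge regression as an augmented least-squares problem; ridge=0 is plain OLS
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(ridge) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

def train_mse(degree, ridge=0.0):
    X = design(x, degree)
    return float(np.mean((y - X @ fit(X, y, ridge)) ** 2))

def cv_mse(degree, ridge=0.0, k=5, seed=0):
    # k-fold cross-validation: each fold is predicted by a model that never saw it
    idx = np.random.default_rng(seed).permutation(n)
    sq_errors = []
    for hold in np.array_split(idx, k):
        train = np.setdiff1d(idx, hold)
        beta = fit(design(x[train], degree), y[train], ridge)
        sq_errors.append((y[hold] - design(x[hold], degree) @ beta) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))

for degree in (1, 3, 10):
    print(degree, train_mse(degree), cv_mse(degree), cv_mse(degree, ridge=0.1))
```

Training error can only fall as the degree rises, so it cannot flag overfitting on its own. The cross-validated error is the quantity to compare across degrees and penalty values.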


4.Weak instruments and overidentification test

4.1.Weak instruments (weak instruments harm the efficiency, and even the consistency, of the regression)

• If cov(z, x) is weak, IV no longer has such desirable asymptotic properties

• IV estimates are not unbiased, and the bias tends to be larger when instruments are weak (even with very large datasets)

• Weak instruments tend to bias the results towards the OLS estimates

• Adding more and more instruments to improve asymptotic efficiency does not solve the problem. Recommendation: always test the strength of your instrument(s) by reporting the F-test on the instruments in the first-stage regression (as a rule of thumb, if the F statistic from the first-stage regression of the endogenous variable X on the instruments Z exceeds 10, the instruments are not considered weak).
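The first-stage check recommended above can be sketched as follows. The sketch assumes a single endogenous regressor and no additional exogenous controls. If the model had controls, they would belong in both the restricted and the unrestricted regression. The function name `first_stage_f` and the simulated data are illustrative.

```python
import numpy as np

def first_stage_f(x, Z):
    """F statistic for H0: all instrument coefficients are zero in x ~ 1 + Z."""
    n = x.shape[0]
    Z = np.asarray(Z, float).reshape(n, -1)
    q = Z.shape[1]                                  # number of instruments
    X_u = np.column_stack([np.ones(n), Z])          # unrestricted: constant + instruments
    beta, *_ = np.linalg.lstsq(X_u, x, rcond=None)
    rss_u = float(np.sum((x - X_u @ beta) ** 2))
    rss_r = float(np.sum((x - x.mean()) ** 2))      # restricted: constant only
    return ((rss_r - rss_u) / q) / (rss_u / (n - q - 1))

rng = np.random.default_rng(3)
z = rng.normal(size=500)
v = rng.normal(size=500)
x_strong = 1.0 * z + v      # instrument strongly related to x
x_weak = 0.01 * z + v       # instrument barely related to x

F_strong = first_stage_f(x_strong, z)
F_weak = first_stage_f(x_weak, z)
print(F_strong, F_weak)
```

Only the first case should clear the threshold of 10 comfortably. In the second, IV estimates based on `z` would be unreliable even though the sample is fairly large.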

4.2.Overidentification test (when there are more instruments than endogenous variables, tests whether the instruments are exogenous)


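The Sargan test constructs an approximately chi-squared statistic under the null hypothesis that all instruments are exogenous. If the null is violated, 2SLS is biased, the estimated disturbances are biased as well, and the statistic no longer follows a chi-squared distribution. The test therefore only considers the significance of the statistic under the null: if the chi-squared statistic is large, we reject the null and conclude that some instrument is endogenous; otherwise we cannot conclude that the instruments are endogenous (though we also cannot be sure they are exogenous). Because exogeneity is the null hypothesis, the test cannot confirm that the instruments are exogenous.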
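One common way to compute the Sargan statistic is n times the R-squared from regressing the 2SLS residuals on the full instrument set, with degrees of freedom equal to the number of instruments minus the number of regressors. The sketch below assumes homoskedastic errors. The function names and the simulated design (one endogenous regressor, two instruments) are illustrative.

```python
import numpy as np

def tsls(y, X, Z):
    # Two-stage least squares: project X on Z, then regress y on the projection
    Pi, *_ = np.linalg.lstsq(Z, X, rcond=None)
    beta, *_ = np.linalg.lstsq(Z @ Pi, y, rcond=None)
    return beta

def sargan(y, X, Z):
    n = y.shape[0]
    u = y - X @ tsls(y, X, Z)              # residuals use the actual X
    gamma, *_ = np.linalg.lstsq(Z, u, rcond=None)
    u_fit = Z @ gamma
    r2 = float(u_fit @ u_fit) / float(u @ u)
    df = Z.shape[1] - X.shape[1]           # number of overidentifying restrictions
    return n * r2, df

rng = np.random.default_rng(4)
n = 2000
z1, z2, v, e = (rng.normal(size=n) for _ in range(4))
x = z1 + z2 + v
y_valid = 1.0 + 2.0 * x + e + 0.5 * v      # x is endogenous, both instruments valid
y_bad = y_valid + z2                       # z2 now affects y directly: invalid

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z1, z2])

stat_valid, df = sargan(y_valid, X, Z)
stat_bad, _ = sargan(y_bad, X, Z)
print(stat_valid, stat_bad, df)
```

With one overidentifying restriction the 5% critical value is 3.841. The invalid design should exceed it by a wide margin. A small statistic in the valid design does not prove exogeneity. It only fails to reject it.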

5.Criteria for model selection

Akaike information criterion

Bayes factor

Bayesian information criterion


Deviance information criterion

False discovery rate

Focused information criterion

Likelihood-ratio test

Mallows's Cp

Minimum description length (Algorithmic information theory)

Minimum message length (Algorithmic information theory)

Structural Risk Minimization

Stepwise regression

The most commonly used criteria are (i) the Akaike information criterion and (ii) the Bayes factor and/or the Bayesian information criterion (which to some extent approximates the Bayes factor).
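For a linear regression with Gaussian errors, the two information criteria follow directly from the maximized log-likelihood: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k counts the estimated parameters. A minimal sketch, in which the function name `gaussian_ic` and the simulated data are assumptions:

```python
import numpy as np

def gaussian_ic(X, y):
    """AIC and BIC of an OLS model under Gaussian errors."""
    n, k_beta = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n                                  # maximum-likelihood variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = k_beta + 1                                    # coefficients plus the variance
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return float(aic), float(bic)

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                # irrelevant regressor
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

ones = np.ones(n)
aic_null, bic_null = gaussian_ic(np.column_stack([ones]), y)
aic_small, bic_small = gaussian_ic(np.column_stack([ones, x1]), y)
aic_big, bic_big = gaussian_ic(np.column_stack([ones, x1, x2]), y)
print(aic_null, aic_small, aic_big)
print(bic_null, bic_small, bic_big)
```

Lower values are better under both criteria. BIC charges ln(n) per parameter instead of 2, so for n above about 7 it penalizes extra regressors more heavily than AIC.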

6.Bootstrap, Jackknife and Permutation test
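As a rough illustration of the first two resampling methods, the sketch below estimates the standard error of a sample mean by the bootstrap (resampling with replacement) and by the jackknife (leaving one observation out at a time). The data are simulated, and the function names and the number of bootstrap replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
sample = rng.exponential(scale=2.0, size=50)   # hypothetical data

def bootstrap_se(data, stat, n_boot=2000):
    # Recompute the statistic on samples drawn with replacement from the data
    n = data.size
    reps = np.array([stat(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    return float(reps.std(ddof=1))

def jackknife_se(data, stat):
    # Recompute the statistic n times, each time leaving one observation out
    n = data.size
    loo = np.array([stat(np.delete(data, i)) for i in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))

se_boot = bootstrap_se(sample, np.mean)
se_jack = jackknife_se(sample, np.mean)
se_formula = float(sample.std(ddof=1) / np.sqrt(sample.size))
print(se_boot, se_jack, se_formula)
```

For the mean, the jackknife reproduces the textbook formula s / sqrt(n) exactly, and the bootstrap comes close. The methods are most useful for statistics that have no simple formula, such as the median.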







Permutation test (a nonparametric test)
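A permutation test for a difference in means can be sketched as follows: pool the two samples, reshuffle the group labels many times, and count how often the reshuffled difference is at least as large as the observed one. The function name, the number of permutations, and the simulated data are assumptions for illustration.

```python
import numpy as np

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)           # random relabeling of the groups
        diff = abs(perm[:a.size].mean() - perm[a.size:].mean())
        count += int(diff >= observed)
    # Adding 1 to both counts keeps the p-value strictly positive
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(7)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=2.0, scale=1.0, size=30)   # shifted by two standard deviations

p_value = permutation_test(group_a, group_b)
print(p_value)
```

No distributional assumption is needed beyond exchangeability of the observations under the null, which is why the test is classed as nonparametric.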


In detail:






