* Declare the data as time-series using the variable "year"
tsset year

* Estimate an OLS regression of y on x1 and x2
reg y x1 x2

* Compute the Durbin–Watson statistic for autocorrelation
estat dwatson

* Breusch–Godfrey test for autocorrelation up to lag 2 (large-sample version)
estat bgodfrey, lags(2)

* Obtain residuals from the regression and save them as "uhat"
predict uhat, resid

* Perform the Ljung–Box Q test for autocorrelation up to lag 2
wntestq uhat, lags(2)

* Display the autocorrelation and partial autocorrelation of the residuals up to lag 2 
* (this output also includes Ljung–Box Q statistics for each lag)
corrgram uhat, lags(2)

* Estimating a linear regression model when the errors exhibit AR(1) serial correlation using the Cochrane–Orcutt (corc) transformation. The disturbances are assumed to follow a first-order autoregressive process.
prais y x1 x2, corc twostep 
* twostep: stop after the first iteration

* Estimating a linear regression model when the errors exhibit AR(1) serial correlation using the Prais-Winsten transformation. The disturbances are assumed to follow a first-order autoregressive process.
prais y x1 x2, twostep 
* twostep: stop after the first iteration
**************************************************************************************************************
*-----------------------------------------------------------------------
* A) DURBIN–WATSON TEST FOR FIRST-ORDER AUTOCORRELATION
*-----------------------------------------------------------------------

* Estimate the main OLS regression model: Y_t = β0 + β1 X1_t + β2 X2_t + u_t
reg y x1 x2

* Compute the Durbin–Watson statistic for testing first-order autocorrelation:
*   H0: ρ = 0 (no autocorrelation)
*   H1: ρ ≠ 0 (autocorrelation exists)

*-----------------------------------------------------------------------
* Comparison: Correlation between uhat and uhat_lag1 vs. Estimated ρ_hat from the residual regression uhat_t = ρ_hat * uhat_(t-1) + v_t (no constant term)
*-----------------------------------------------------------------------

* Obtain residuals from the regression and save them as "uhat"
predict uhat, resid

* Generate lagged residuals (uhat_t-1)
gen uhat_lag1 = L.uhat

* Compute the sample correlation between uhat and uhat_lag1
corr uhat uhat_lag1

* This correlation provides an approximate estimate of ρ_hat obtained from the regression: uhat_t = ρ_hat * uhat_(t-1) + v_t   (no constant term)
reg uhat uhat_lag1, noconstant

* The estimated coefficient _b[uhat_lag1] should closely match the sample correlation reported above.
*-----------------------------------------------------------------------
* Compare the two methods of calculating Durbin–Watson statistic
*-----------------------------------------------------------------------

* METHOD 1: Direct Durbin–Watson test (built-in)
*-----------------------------------------------------------------------
* d = Σ(uhat_t - uhat_(t-1))^2   /   Σ(uhat_t^2)
*-----------------------------------------------------------------------

* Compute squared differences (uhat_t − uhat_t-1)^2
gen diff_sq = (uhat - uhat_lag1)^2
replace diff_sq = . if missing(uhat_lag1)

* Compute squared residuals uhat_t^2
gen uhat_sq = uhat^2

* Sum numerator: Σ(diff_sq)
quietly summarize diff_sq
scalar num = r(sum)

* Sum denominator: Σ(uhat_sq)
quietly summarize uhat_sq
scalar den = r(sum)

* Compute Durbin–Watson statistic
scalar d_manual = num / den

display "--------------------------------------------------------------"
display " Manual Durbin–Watson statistic (computed from residuals):"
display " d = " %9.6f d_manual
display "--------------------------------------------------------------"

* METHOD 2: Using the correlation between residuals and their lag
* Regression of residuals on their lag (uhat_t = ρ_hat * uhat_(t-1) + v_t)
reg uhat uhat_lag1, noconstant
* Compute the approximate DW statistic using d ≈ 2*(1 - ρ_hat)
display "Approximate DW from residual-lag correlation: " 2*(1 - _b[uhat_lag1])

* Compare the DW statistic from Method 1 with the approximate DW from Method 2
* They should be very close, demonstrating equivalence of the two approaches
**************************************************************************************************************
*-----------------------------------------------------------------------
* B.1) BREUSCH–GODFREY TEST FOR AR(p) AUTOCORRELATION (manual calculation)
*-----------------------------------------------------------------------

* STEP 1: Estimate the main OLS regression model: Y_t = β0 + β1 X1_t + β2 X2_t + u_t
reg y x1 x2

* STEP 2: Obtain residuals from the regression and save them as "uhat"
predict uhat, resid

* STEP 3: Generate lagged residuals up to lag p = 2 (example)
gen uhat_lag1 = L.uhat
gen uhat_lag2 = L2.uhat

* STEP 4: Auxiliary regression of residuals on original regressors + lagged residuals
*   uhat_t = α0 + α1 X1_t + α2 X2_t + ρ1 uhat_(t-1) + ρ2 uhat_(t-2) + v_t
reg uhat x1 x2 uhat_lag1 uhat_lag2

* STEP 5: Record R-squared from the auxiliary regression
display "R-squared from auxiliary regression: " e(r2)

*-----------------------------------------------------------------------
* STEP 6: Compute LM statistic manually
*   LM = (T - p) * R^2_aux
*   Under H0, LM ~ χ² with p degrees of freedom
*-----------------------------------------------------------------------
local p = 2
local T = e(N)
local LM_stat = (`T' - `p') * e(r2)
display "LM statistic: " `LM_stat'

* STEP 7: Compute p-value for LM statistic
*   p-value = P(χ²_p > LM_stat)
display "p-value for LM test: " chi2tail(`p', `LM_stat')

*-----------------------------------------------------------------------
* STEP 8: Compute F statistic manually
*   F = (R^2_aux / p) / ((1 - R^2_aux) / (T - k - 2p - 1))
*   Under H0, F ~ F(p, T - k - 2p - 1)
*-----------------------------------------------------------------------
local k = 2
local F_stat = (e(r2)/`p') / ((1 - e(r2))/(`T' - `k' - 2*`p' - 1))
display "F statistic: " `F_stat'

* STEP 9: Compute p-value for F statistic
display "p-value for F test: " Ftail(`p', `T' - `k' - 2*`p' - 1, `F_stat')

*-----------------------------------------------------------------------
* Notes:
* - This manual procedure replicates the Breusch–Godfrey test.
* - LM statistic follows chi-square distribution with p degrees of freedom.
* - F statistic follows F-distribution with p and T-k-2p-1 degrees of freedom.
* - Reject H0 if p-value < significance level α, indicating evidence of autocorrelation.
*-----------------------------------------------------------------------
**************************************************************************************************************
*-----------------------------------------------------------------------
* B.2) BREUSCH–GODFREY TEST FOR AR(p) AUTOCORRELATION 
*     (manual calculation WITH ZERO-FILLED LAGGED RESIDUALS)
*-----------------------------------------------------------------------

* STEP 1: Estimate the main OLS regression model
*   Y_t = β0 + β1 X1_t + β2 X2_t + u_t
reg y x1 x2

*-----------------------------------------------------------------------
* STEP 2: Obtain residuals from the regression and save them as "uhat"
*-----------------------------------------------------------------------
predict uhat, resid

*-----------------------------------------------------------------------
* STEP 3: Generate lagged residuals up to lag p = 2 
*         Replace missing lag values with ZERO (no observations dropped)
*-----------------------------------------------------------------------
local p = 2

gen uhat_lag1 = L.uhat
replace uhat_lag1 = 0 if missing(uhat_lag1)

gen uhat_lag2 = L2.uhat
replace uhat_lag2 = 0 if missing(uhat_lag2)

*-----------------------------------------------------------------------
* STEP 4: Auxiliary regression:
*   uhat_t = α0 + α1 X1_t + α2 X2_t + ρ1 uhat_(t-1) + ρ2 uhat_(t-2) + v_t
*-----------------------------------------------------------------------
reg uhat x1 x2 uhat_lag1 uhat_lag2

*-----------------------------------------------------------------------
* STEP 5: Extract R-squared from the auxiliary regression
*-----------------------------------------------------------------------
display "R-squared from auxiliary regression: " e(r2)

*-----------------------------------------------------------------------
* STEP 6: Compute LM statistic (zero-filled version)
*   LM = T * R^2_aux       (not T - p)
*   LM ~ χ²(p)
*-----------------------------------------------------------------------
local T = e(N)
local LM_stat = `T' * e(r2)

display "LM statistic (zero-filled version): " `LM_stat'
display "LM p-value: " chi2tail(`p', `LM_stat')

*-----------------------------------------------------------------------
* STEP 7: Compute F statistic (zero-filled version)
*   F = (R^2_aux / p) / ((1 − R^2_aux) / (T − k − p − 1))
*   F ~ F(p, T − k − p − 1)
*-----------------------------------------------------------------------
local k = 2 

local df_denom = `T' - `k' - `p' - 1
local F_stat = (e(r2)/`p') / ((1 - e(r2)) / `df_denom')

display "F statistic (zero-filled version): " `F_stat'
display "F-test p-value: " Ftail(`p', `df_denom', `F_stat')

*-----------------------------------------------------------------------
* Notes:
* - This version uses zero-filling instead of dropping the first p observations.
* - Therefore:
*       LM = T * R^2_aux
*       F = (R^2_aux / p) / ((1 - R^2_aux) / (T - k - p - 1))
* - LM ~ chi-square(p), F ~ F(p, T-k-p-1)
* - Reject H0 if p-value < α → evidence of AR(p) autocorrelation.
*-----------------------------------------------------------------------
**************************************************************************************************************
*-----------------------------------------------------------------------*
* C) LJUNG–BOX TEST FOR AUTOCORRELATION (manual calculation, p = 2)
*-----------------------------------------------------------------------*

* Step 1: Run the OLS regression
* Regress y on x1 and x2
regress y x1 x2

* Step 2: Obtain residuals
* Store residuals in variable 'uhat'
predict uhat, resid

* Step 3: Compute denominator for autocorrelations
* Denominator = sum of squared residuals
gen uhat2 = uhat^2
quietly summarize uhat2 if !missing(uhat2)
local denom = r(sum)

* Step 4: Generate lagged residuals (lag 1 and 2)
gen uhat_lag1 = L1.uhat
gen uhat_lag2 = L2.uhat

* Step 5: Compute numerators for autocorrelations
gen num1 = uhat * uhat_lag1
quietly summarize num1 if !missing(num1)
local rho1 = r(sum)/`denom'

gen num2 = uhat * uhat_lag2
quietly summarize num2 if !missing(num2)
local rho2 = r(sum)/`denom'

* Step 6: Calculate Ljung–Box statistic (order 2)
local T = _N
scalar LB = `T'*(`T'+2)*((`rho1'*`rho1')/(`T'-1) + (`rho2'*`rho2')/(`T'-2))
display "Ljung–Box statistic (order 2): " %9.6f LB

* Step 7: Calculate p-value using correct upper-tail function
scalar pvalue = chi2tail(2, LB)
display "p-value: " %9.6f pvalue

* Step 8: Decision
* If p-value <= 0.05 -> reject H0 (autocorrelation exists)
* If p-value > 0.05 -> fail to reject H0 (no autocorrelation)
**************************************************************************************************************
*-----------------------------------------------------------------------
* D.1. COCHRANE–ORCUTT TRANSFORMATION (manual implementation)
*-----------------------------------------------------------------------

* STEP 1: Estimate the original OLS model
*   Y_t = β0 + β1 X1_t + β2 X2_t + ... + βk Xk_t + u_t
reg y x1 x2

*-----------------------------------------------------------------------
* STEP 2: Obtain OLS residuals u_hat
*-----------------------------------------------------------------------
predict uhat, resid

*-----------------------------------------------------------------------
* STEP 3: Regress residuals on their first lag to estimate ρ
*   u_hat_t = ρ * u_hat_(t-1) + v_t
*-----------------------------------------------------------------------
gen uhat_lag = L.uhat
reg uhat uhat_lag, noconstant

* Store the estimated autocorrelation coefficient
scalar rho = _b[uhat_lag]
display "Estimated rho (Cochrane–Orcutt): " rho

*-----------------------------------------------------------------------
* STEP 4: Generate transformed dependent variable
*   Y_t* = Y_t – ρ_hat * Y_(t-1)
*-----------------------------------------------------------------------
gen y_lag = L.y
gen y_star = y - rho * y_lag

*-----------------------------------------------------------------------
* STEP 5: Generate transformed regressors
*   X_it* = X_it – ρ_hat * X_i,(t-1)
*-----------------------------------------------------------------------
gen x1_lag = L.x1
gen x1_star = x1 - rho * x1_lag

gen x2_lag = L.x2
gen x2_star = x2 - rho * x2_lag

* (Add more regressors similarly if k > 2)

gen x0_star = 1 - rho

*-----------------------------------------------------------------------
* STEP 6: Estimate the transformed model:
*   y*_t = β0 x0*_t + β1 x1*_t + β2 x2*_t + ... + v_t
*-----------------------------------------------------------------------
reg y_star x0_star x1_star x2_star, noconstant

* Store estimated coefficients
scalar b0_hat = _b[x0_star]
scalar b1_hat  = _b[x1_star]
scalar b2_hat  = _b[x2_star]

*-----------------------------------------------------------------------
display "Estimated beta_0 (original intercept): " b0_hat
display "Estimated beta_1: " b1_hat
display "Estimated beta_2: " b2_hat

*-----------------------------------------------------------------------
* Interpretation Notes:
* - Cochrane–Orcutt corrects for AR(1) autocorrelation.
* - If desired, iterations can be applied, but the above code implements the pure one-step Cochrane–Orcutt procedure.
**************************************************************************************************************
*-----------------------------------------------------------------------
* D.2. PRAIS–WINSTEN TRANSFORMATION (manual implementation)
*-----------------------------------------------------------------------

*-----------------------------------------------------------------------
* STEP 1: Estimate the original OLS model
*-----------------------------------------------------------------------
reg y x1 x2

*-----------------------------------------------------------------------
* STEP 2: Obtain OLS residuals
*-----------------------------------------------------------------------
predict uhat, resid

*-----------------------------------------------------------------------
* STEP 3: Estimate rho by regressing residuals on lagged residuals
*-----------------------------------------------------------------------
gen uhat_lag = L.uhat
reg uhat uhat_lag, noconstant

scalar rho = _b[uhat_lag]
display "Estimated rho (Prais–Winsten): " rho

*-----------------------------------------------------------------------
* STEP 4: Construct the transformation multiplier for t=1
*-----------------------------------------------------------------------
scalar pw = sqrt(1 - rho^2)

*-----------------------------------------------------------------------
* STEP 5: Generate lags
*-----------------------------------------------------------------------
gen y_lag  = L.y
gen x1_lag = L.x1
gen x2_lag = L.x2
* (Add further regressors similarly if needed)

*-----------------------------------------------------------------------
* STEP 6: Generate transformed dependent variable Y*
*-----------------------------------------------------------------------
gen y_star = .
replace y_star = pw * y if _n == 1
replace y_star = y - rho * y_lag if _n > 1

*-----------------------------------------------------------------------
* STEP 7: Generate transformed regressors X*
*-----------------------------------------------------------------------
gen x1_star = .
replace x1_star = pw * x1 if _n == 1
replace x1_star = x1 - rho * x1_lag if _n > 1

gen x2_star = .
replace x2_star = pw * x2 if _n == 1
replace x2_star = x2 - rho * x2_lag if _n > 1

* (Add more regressors as needed)

*-----------------------------------------------------------------------
* STEP 8: Transformed intercept regressor X0*
*-----------------------------------------------------------------------
gen x0_star = .
replace x0_star = pw     if _n == 1
replace x0_star = 1-rho  if _n > 1

*-----------------------------------------------------------------------
* STEP 9: Estimate the Prais–Winsten transformed model (with no constant)
*-----------------------------------------------------------------------
reg y_star x0_star x1_star x2_star, noconstant

*-----------------------------------------------------------------------
display "Estimated beta_0: " _b[x0_star]
display "Estimated beta_1: " _b[x1_star]
display "Estimated beta_2: " _b[x2_star]

*-----------------------------------------------------------------------
* This implements the single-iteration Prais–Winsten estimator.
* Iterative Prais–Winsten can be programmed but is typically handled automatically by Stata's built-in command.
*-----------------------------------------------------------------------