Manual Calculation of SEs from Optimization - Pt.2
Notes
Optimx
Standard Errors
Robust
Gradient
Sandwich
Clustered
Random Weights
SEs after Optimx
Author
C. Luke Watson
Published
September 12, 2024
This is a sequel to my last post. Turns out I had a ‘think-o’: “if not independent, then we could use the fully dense [variance-covariance matrix].” This was in the context of robust standard errors using diag(resid^2) as the error variance-covariance matrix, \(\Omega\), within the ‘meat’ of the ‘sandwich.’ Let’s say I was …uhh… thinking how in clustered standard errors we allow for \(\Omega\) to have a block-diagonal structure, where each block is a conditional variance-covariance sub-matrix, \(\Omega_g\), is fully dense … like me!
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 615747 32.9 1253420 67 NA 967890 51.7
Vcells 1112885 8.5 8388608 64 102400 1839961 14.1
# librarieslibrary(tidyverse) # general data manipulation
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom) # model results to data.frame for compact viewinglibrary(optimx) # optim + extraslibrary(sandwich) # Adjust standard errorslibrary(lmtest) # Recalculate model errors with sandwich functions with coeftest()
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
# Set random seed for reproducibilityset.seed(101)
Create Data
I am going to create data where clustering should matter.
# obs - high so that coeficients are well estimatedn <-10000# clustersnum.clusters <-50grps <-sort(floor(runif(n)*num.clusters)+1)# cluster effectscluster.y.effect <-rnorm(num.clusters,0,2)cluster.u.effect <-rnorm(num.clusters,0,2)cluster.x1.effect <-rnorm(num.clusters,0,0.2)cluster.x2.effect <-rnorm(num.clusters,0,0.2)cluster.x1.x2.effect <-rnorm(num.clusters,0,0.2)# two x vars (will add a constant)x1=rnorm(n,1,1+ cluster.x1.effect/3)x2=rnorm(n,1,2)# clustered errorssd <-runif(n, min=0.5, max=4)s.true <- sd*(x1/5+1) # need to ensure that all s's are positiveerror <-rnorm(n,0,s.true) + cluster.u.effect[grps] + x1*cluster.x1.effect[grps] + x2*cluster.x2.effect[grps] + x1*x2*cluster.x1.x2.effect[grps]error <- error -mean(error)# true coefficentscoef.true <-c(1,-4,2)#Generate the dependent variabley1 <- coef.true[1] + (coef.true[2]*x1) + (coef.true[3]*x2) + error
Canned SEs
Here, I’ll quickly show how to get non-robust, robust, and clustered SEs. Note, this approach uses lmtest and sandwich. If I had a lot of FEs, then I would likely use feols package.
# Model, non robustm1.nonrobust <-lm(y1 ~ x1 + x2)# Robust standard errorsm1.robust <-coeftest(m1.nonrobust,vcov = vcovHC)# clusteredm1.clu <-coeftest(m1.nonrobust, vcov = vcovCL, cluster =~grps)# Function to extract standard errorsget_se <-function(model) {if (class(model)[1] =="lm") {return(sqrt(diag(vcov(model)))) } else {return(model[, "Std. Error"]) }}# Create a data frame with all standard errorsse_comparison <-data.frame( Coef = m1.nonrobust$coefficients,Non_Robust =get_se(m1.nonrobust),Robust =get_se(m1.robust),Clustered =get_se(m1.clu))# Round all numeric columns to 4 decimal placesse_comparison <- se_comparison %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese_comparison
As before, my goal is to write the functions as generally as possible. I do this by first writing a residual function, then a SSR function that calls the residual function.
# We are going to code of our sum of squared residuals function# first, write the residual function f.resid <-function(coef.arg){ b0 <- coef.arg[1] b1 <- coef.arg[2] b2 <- coef.arg[3] u.i <- (y1 - (b0 + b1*x1 + b2*x2))return(u.i)}# second, write the SSR function# note: that we divide by 2! # if we do not, then coefs are true but SEs are wrong!f.ssr <-function(coef.arg){# coef.arg <- coef.init u2 <- (f.resid(coef.arg))^2 ssr <-sum(u2)/2if(is.na(ssr) |is.nan(ssr)) return(Inf)return(ssr)}# initial coef valuescoef.init <-c(1,1,1)# run optimx# note: (1) use BFGS since no constraints, (2) request hessian out.reg <-optimx(coef.init, f.ssr,method =c("BFGS"),control =list(maxit=500000,trace=0),hessian=T)# final coefscoef.fin <-as.numeric(out.reg[1,1:3])coef.fin
[1] 0.9444571 -3.9659487 2.0215917
Manual Non-robust SEs
As before, non-robust SEs are the degrees-of-freedom adjusted residual SD scaled squared root of the inverse hessian.
# calculate residuals at final estimateresid <-f.resid(coef.fin)# get DoF adjusted SD of resid(b.star)s2.hat <-sd(resid)*sqrt(((n-1)/(n-3)))# get hessian from detailshessian <-attributes(out.reg)$details["BFGS", "nhatend"][[1]]# calculate the non-robust SEs se.non.robust <-sqrt(diag(base::solve(hessian)))*s2.hat
Manual Robust SEs
Note, for the robust SEs we need a more intensive estimator. It is called the sandwich estimator. First, we calculate the jacobian, \(\nabla g\), (vector valued gradient) of the residual function, \(u\), which is in n-by-k. The ‘meat’ of the sandwich by calculating product of the score with itself, under the assumption of heteroskedasticity but independent draws. So, second, we calculate \(m = \nabla g^\mathsf{T} \times \mathsf{diag}(u^2)\times \nabla g\). The ‘bread’ is the the inverse of the covariance of the jacobian. So, third, we calculate \(b = \nabla g^\mathsf{T} \nabla g\). Finally, we make a sandwich: \(\mathsf{vcov}_{\mathsf{robust}}(\hat{\beta}) = b\times m\times b\). And that’s it.
# Robust SEs manual need the jacobian (multivalued function gradient) wrt the parameters# we will do this numerically (although it is analytic for OLS)gradient.of.residual <- numDeriv::jacobian(func = f.resid, x = coef.fin)# robust meat assumes independent but not identically distributed# the meat is the cross product of score = gT*u, which is the FOCmeat.robust <-t(gradient.of.residual) %*%diag(resid^2) %*% gradient.of.residual# E[gTg] is the covariance of gbread.robust <-solve(t(gradient.of.residual) %*% gradient.of.residual)# Calculate the sandwichsandwich.robust <- bread.robust %*% meat.robust %*% bread.robust# Calculate robust standard errorsse.robust <-sqrt(diag(sandwich.robust))
Manual Cluster Robust SEs
The cluster robust SEs use the same sandwich principal as the robust; however, we allow the variance-covariance matrix of the errors, \(\Omega\), to have a block diagonal structure such that each cluster-group has a dense conditional variance-covariance matrix \(\Omega_{c}\). Calculating the block diagonal matrix could be a little complicated, but it turns out that, assuming \(\Omega\) has the block diagonal structure, \(g^{\mathsf{T}}\Omega g\) is \(k\times k\) and can be calculated by \(\sum_{c\in C} \nabla g_{c}^{\mathsf{T}}\Omega_{c} \nabla g_{c}\) (which are also \(k\times k\).
# Cluster Robust estimate of V[\hat{\bbeta}]# in Robust, just use diag# for cluster, theory says use block diagonal# however, that is a big matrix (tho we do not need to invert)# rather, can sum cluster group specific score matrices# bread is the same# bread.robust <- solve(t(gradient.of.residual) %*% gradient.of.residual)# also has an adjustment factor since summing over clustersmeat.cl <-matrix(0, nrow =ncol(gradient.of.residual), ncol =ncol(gradient.of.residual))for (g in1:num.clusters) {# g <- 1 meat.cl <- meat.cl + ( (t(gradient.of.residual[grps == g, ]) %*% resid[grps == g]) %*% (t(resid[grps == g]) %*% gradient.of.residual[grps == g, ]) )}#adj.factor <- ((num.clusters/(num.clusters-1)) * ((n -1)/(n -3)))sandwich.clu <- (adj.factor * (bread.robust %*% (meat.cl %*% bread.robust)))se.clustered <-sqrt(diag(sandwich.clu))# Create a data frame with all standard errorsse_comparison$Non_Robust.2<- se.non.robustse_comparison$Robust.2<- se.robustse_comparison$Clustered.2<- se.clustered# Round all numeric columns to 4 decimal placesse_comparison <- se_comparison %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese_comparison
# arbitrary weights for now to highlight differencew <- ((1:n/n)-0.5)^2+0.001# Modelm1.nonrobust.w <-lm(y1 ~ x1 + x2, weights = w)# Robust standard errors m1.robust.w <-coeftest(m1.nonrobust.w,vcov = vcovHC)m1.clu.w <-coeftest(m1.nonrobust.w, vcov = vcovCL, cluster =~grps, type ="HC1")# Create a data frame with all standard errorsse.comparison.w <-data.frame( Coef = m1.nonrobust.w$coefficients,Non_Robust =get_se(m1.nonrobust.w),Robust =get_se(m1.robust.w),Clustered =get_se(m1.clu.w))# Round all numeric columns to 4 decimal placesse.comparison.w <- se.comparison.w %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese.comparison.w
Adding weights to our estimation set up is trivial for coefficients.
f.ssr.w <-function(coef.arg){# coef.arg <- coef.init u2 <- w*((f.resid(coef.arg))^2) ssr <-sum(u2)/2if(is.na(ssr) |is.nan(ssr)) return(Inf)return(ssr)}# initial coef valuescoef.init <-c(1,1,1)# run optimx# note: (1) use BFGS since no constraints, (2) request hessian out.reg.w <-optimx(coef.init, f.ssr.w,method =c("BFGS"),control =list(maxit=500000,trace=0),hessian=T)# final coefscoef.fin.w <-as.numeric(out.reg.w[1,1:3])coef.fin.w
[1] 1.231613 -3.887036 2.029959
Non-Robust SEs of WLS
Non-robust weighed SEs only require adjusting the scaling factor for the weights.
# the residual function does not change but the estimates do!resid.w <-f.resid(coef.fin.w)# new DoF adjusted SD of resid(b.star)s2.hat.w <-sd(resid.w)*sqrt(((n-1)/(n-3)))# get hessian from detailshessian.w <-attributes(out.reg)$details["BFGS", "nhatend"][[1]]# calculate the non-robust SEs se.non.robust.w <-sqrt(diag(base::solve(hessian.w)))*s2.hat.w
Robust SEs of WLS
Robust weighed SEs require a weighting matrix be added to the sandwich formula.
# can use the same gradient function, but new coef! gradient.of.residual.w <- numDeriv::jacobian(func = f.resid, x = coef.fin.w)# weight matrixW <-diag(w)# bread.w <-solve(t(gradient.of.residual.w) %*% W %*% gradient.of.residual.w)meat.w <- (t(gradient.of.residual.w) %*% W %*%diag((resid.w)^2) %*%t(W) %*% gradient.of.residual.w)vcov.w <- bread.w %*% meat.w %*%t(bread.w)se.robust.w <-sqrt(diag(vcov.w))
Robust Clustered SEs of WLS
Cluster robust weighed SEs require a weighting matrix be added to the sandwich formula.
# can use the same gradient# gradient.of.residual.w <- numDeriv::jacobian(func = f.resid, x = coef.fin.w)# weight matrix# W <- diag(w)# # bread.w <- solve(t(gradient.of.residual.w) %*% W %*% gradient.of.residual.w)# meat.cl.w <-matrix(0, nrow =ncol(gradient.of.residual.w), ncol =ncol(gradient.of.residual.w))for (g in1:num.clusters) {# g <- 1 meat.cl.w <- meat.cl.w + ( (t(gradient.of.residual.w[grps == g, ]) %*%diag(w[grps == g]) %*% resid.w[grps == g]) %*% (t(resid.w[grps == g]) %*%t(diag(w[grps == g])) %*% gradient.of.residual.w[grps == g, ]) )}#adj.factor <- ((num.clusters/(num.clusters-1)) * ((n -1)/(n -3)))sandwich.clu.w <- (adj.factor * (bread.w %*% (meat.cl.w %*% bread.w)))se.clustered.w <-sqrt(diag(sandwich.clu.w))# Create a data frame with all standard errorsse.comparison.w$Non_Robust.2<- se.non.robust.wse.comparison.w$Robust.2<- se.robust.wse.comparison.w$Clustered.2<- se.clustered.w# Round all numeric columns to 4 decimal placesse.comparison.w <- se.comparison.w %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese.comparison.w