Manual Calculation of SEs from Optimization - Pt.2
Notes
Optimx
Standard Errors
Robust
Gradient
Sandwich
Clustered
Random Weights
Extending the sandwich estimator to clustered and weighted standard errors after numerical optimization.
Author
C. Luke Watson
Published
September 12, 2024
This is a sequel to my last post. Turns out I had a ‘think-o’: “if not independent, then we could use the fully dense [variance-covariance matrix].” This was in the context of robust standard errors using diag(resid^2) as the error variance-covariance matrix, \(\Omega\), within the ‘meat’ of the ‘sandwich.’ Let’s say I was …uhh… thinking how in clustered standard errors we allow for \(\Omega\) to have a block-diagonal structure, where each block is a conditional variance-covariance sub-matrix, \(\Omega_g\), is fully dense … like me!
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 1075512 57.5 1668088 89.1 NA 1668088 89.1
Vcells 1888504 14.5 8388608 64.0 49152 3392227 25.9
# librarieslibrary(tidyverse) # general data manipulation
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.1 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom) # model results to data.frame for compact viewinglibrary(optimx) # optim + extraslibrary(sandwich) # Adjust standard errorslibrary(lmtest) # Recalculate model errors with sandwich functions with coeftest()
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
# Set random seed for reproducibilityset.seed(101)
Create Data
I am going to create data where clustering should matter.
# obs - high so that coeficients are well estimatedn <-10000# clustersnum.clusters <-50grps <-sort(floor(runif(n)*num.clusters)+1)# cluster effectscluster.y.effect <-rnorm(num.clusters,0,2)cluster.u.effect <-rnorm(num.clusters,0,2)cluster.x1.effect <-rnorm(num.clusters,0,0.2)cluster.x2.effect <-rnorm(num.clusters,0,0.2)cluster.x1.x2.effect <-rnorm(num.clusters,0,0.2)# two x vars (will add a constant)x1=rnorm(n,1,1+ cluster.x1.effect/3)x2=rnorm(n,1,2)# clustered errorssd <-runif(n, min=0.5, max=4)s.true <- sd*(x1/5+1) # need to ensure that all s's are positiveerror <-rnorm(n,0,s.true) + cluster.u.effect[grps] + x1*cluster.x1.effect[grps] + x2*cluster.x2.effect[grps] + x1*x2*cluster.x1.x2.effect[grps]error <- error -mean(error)# true coefficentscoef.true <-c(1,-4,2)#Generate the dependent variabley1 <- coef.true[1] + (coef.true[2]*x1) + (coef.true[3]*x2) + error
Canned SEs
Here, I’ll quickly show how to get non-robust, robust, and clustered SEs. Note, this approach uses lmtest and sandwich. If I had a lot of FEs, then I would likely use feols package.
# Model, non robustm1.nonrobust <-lm(y1 ~ x1 + x2)# Robust standard errorsm1.robust <-coeftest(m1.nonrobust,vcov = vcovHC)# clusteredm1.clu <-coeftest(m1.nonrobust, vcov = vcovCL, cluster =~grps)# Function to extract standard errorsget_se <-function(model) {if (class(model)[1] =="lm") {return(sqrt(diag(vcov(model)))) } else {return(model[, "Std. Error"]) }}# Create a data frame with all standard errorsse_comparison <-data.frame( Coef = m1.nonrobust$coefficients,Non_Robust =get_se(m1.nonrobust),Robust =get_se(m1.robust),Clustered =get_se(m1.clu))# Round all numeric columns to 4 decimal placesse_comparison <- se_comparison %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese_comparison
As before, my goal is to write the functions as generally as possible. I do this by first writing a residual function, then a SSR function that calls the residual function.
# We are going to code of our sum of squared residuals function# first, write the residual function f.resid <-function(coef.arg){ b0 <- coef.arg[1] b1 <- coef.arg[2] b2 <- coef.arg[3] u.i <- (y1 - (b0 + b1*x1 + b2*x2))return(u.i)}# second, write the SSR function# note: that we divide by 2! # if we do not, then coefs are true but SEs are wrong!f.ssr <-function(coef.arg){# coef.arg <- coef.init u2 <- (f.resid(coef.arg))^2 ssr <-sum(u2)/2if(is.na(ssr) |is.nan(ssr)) return(Inf)return(ssr)}# initial coef valuescoef.init <-c(1,1,1)# run optimx# note: (1) use BFGS since no constraints, (2) request hessian out.reg <-optimx(coef.init, f.ssr,method =c("BFGS"),control =list(maxit=500000,trace=0),hessian=T)# final coefscoef.fin <-as.numeric(out.reg[1,1:3])coef.fin
[1] 0.9445 -3.9659 2.0216
Manual Non-robust SEs
As before, non-robust SEs are the degrees-of-freedom adjusted residual SD scaled squared root of the inverse hessian.
# calculate residuals at final estimateresid <-f.resid(coef.fin)# get DoF adjusted SD of resid(b.star)s2.hat <-sd(resid)*sqrt(((n-1)/(n-3)))# get hessian from detailshessian <-attributes(out.reg)$details["BFGS", "nhatend"][[1]]# calculate the non-robust SEs se.non.robust <-sqrt(diag(base::solve(hessian)))*s2.hat
Manual Robust SEs
Note, for the robust SEs we need a more intensive estimator. It is called the sandwich estimator. First, we calculate the jacobian, \(\nabla g\), (vector valued gradient) of the residual function, \(u\), which is in n-by-k. The ‘meat’ of the sandwich by calculating product of the score with itself, under the assumption of heteroskedasticity but independent draws. So, second, we calculate \(m = \nabla g^\mathsf{T} \times \mathsf{diag}(u^2)\times \nabla g\). The ‘bread’ is the the inverse of the covariance of the jacobian. So, third, we calculate \(b = \nabla g^\mathsf{T} \nabla g\). Finally, we make a sandwich: \(\mathsf{vcov}_{\mathsf{robust}}(\hat{\beta}) = b\times m\times b\). And that’s it.
# Robust SEs manual need the jacobian (multivalued function gradient) wrt the parameters# we will do this numerically (although it is analytic for OLS)gradient.of.residual <- numDeriv::jacobian(func = f.resid, x = coef.fin)# robust meat assumes independent but not identically distributed# the meat is the cross product of score = gT*u, which is the FOCmeat.robust <-t(gradient.of.residual) %*%diag(resid^2) %*% gradient.of.residual# E[gTg] is the covariance of gbread.robust <-solve(t(gradient.of.residual) %*% gradient.of.residual)# Calculate the sandwichsandwich.robust <- bread.robust %*% meat.robust %*% bread.robust# Calculate robust standard errorsse.robust <-sqrt(diag(sandwich.robust))
Manual Cluster Robust SEs
The cluster robust SEs use the same sandwich principal as the robust; however, we allow the variance-covariance matrix of the errors, \(\Omega\), to have a block diagonal structure such that each cluster-group has a dense conditional variance-covariance matrix \(\Omega_{c}\). Calculating the block diagonal matrix could be a little complicated, but it turns out that, assuming \(\Omega\) has the block diagonal structure, \(g^{\mathsf{T}}\Omega g\) is \(k\times k\) and can be calculated by \(\sum_{c\in C} \nabla g_{c}^{\mathsf{T}}\Omega_{c} \nabla g_{c}\) (which are also \(k\times k\).
# Cluster Robust estimate of V[\hat{\bbeta}]# in Robust, just use diag# for cluster, theory says use block diagonal# however, that is a big matrix (tho we do not need to invert)# rather, can sum cluster group specific score matrices# bread is the same# bread.robust <- solve(t(gradient.of.residual) %*% gradient.of.residual)# also has an adjustment factor since summing over clustersmeat.cl <-matrix(0, nrow =ncol(gradient.of.residual), ncol =ncol(gradient.of.residual))for (g in1:num.clusters) {# g <- 1 meat.cl <- meat.cl + ( (t(gradient.of.residual[grps == g, ]) %*% resid[grps == g]) %*% (t(resid[grps == g]) %*% gradient.of.residual[grps == g, ]) )}#adj.factor <- ((num.clusters/(num.clusters-1)) * ((n -1)/(n -3)))sandwich.clu <- (adj.factor * (bread.robust %*% (meat.cl %*% bread.robust)))se.clustered <-sqrt(diag(sandwich.clu))# Create a data frame with all standard errorsse_comparison$Non_Robust.2<- se.non.robustse_comparison$Robust.2<- se.robustse_comparison$Clustered.2<- se.clustered# Round all numeric columns to 4 decimal placesse_comparison <- se_comparison %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese_comparison
# arbitrary weights for now to highlight differencew <- ((1:n/n)-0.5)^2+0.001# Modelm1.nonrobust.w <-lm(y1 ~ x1 + x2, weights = w)# Robust standard errors m1.robust.w <-coeftest(m1.nonrobust.w,vcov = vcovHC)m1.clu.w <-coeftest(m1.nonrobust.w, vcov = vcovCL, cluster =~grps, type ="HC1")# Create a data frame with all standard errorsse.comparison.w <-data.frame( Coef = m1.nonrobust.w$coefficients,Non_Robust =get_se(m1.nonrobust.w),Robust =get_se(m1.robust.w),Clustered =get_se(m1.clu.w))# Round all numeric columns to 4 decimal placesse.comparison.w <- se.comparison.w %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese.comparison.w
Adding weights to our estimation set up is trivial for coefficients.
f.ssr.w <-function(coef.arg){# coef.arg <- coef.init u2 <- w*((f.resid(coef.arg))^2) ssr <-sum(u2)/2if(is.na(ssr) |is.nan(ssr)) return(Inf)return(ssr)}# initial coef valuescoef.init <-c(1,1,1)# run optimx# note: (1) use BFGS since no constraints, (2) request hessian out.reg.w <-optimx(coef.init, f.ssr.w,method =c("BFGS"),control =list(maxit=500000,trace=0),hessian=T)# final coefscoef.fin.w <-as.numeric(out.reg.w[1,1:3])coef.fin.w
[1] 1.232 -3.887 2.030
Non-Robust SEs of WLS
Non-robust weighed SEs only require adjusting the scaling factor for the weights.
# the residual function does not change but the estimates do!resid.w <-f.resid(coef.fin.w)# new DoF adjusted SD of resid(b.star)s2.hat.w <-sd(resid.w)*sqrt(((n-1)/(n-3)))# get hessian from detailshessian.w <-attributes(out.reg)$details["BFGS", "nhatend"][[1]]# calculate the non-robust SEs se.non.robust.w <-sqrt(diag(base::solve(hessian.w)))*s2.hat.w
Robust SEs of WLS
Robust weighed SEs require a weighting matrix be added to the sandwich formula.
# can use the same gradient function, but new coef! gradient.of.residual.w <- numDeriv::jacobian(func = f.resid, x = coef.fin.w)# weight matrixW <-diag(w)# bread.w <-solve(t(gradient.of.residual.w) %*% W %*% gradient.of.residual.w)meat.w <- (t(gradient.of.residual.w) %*% W %*%diag((resid.w)^2) %*%t(W) %*% gradient.of.residual.w)vcov.w <- bread.w %*% meat.w %*%t(bread.w)se.robust.w <-sqrt(diag(vcov.w))
Robust Clustered SEs of WLS
Cluster robust weighed SEs require a weighting matrix be added to the sandwich formula.
# can use the same gradient# gradient.of.residual.w <- numDeriv::jacobian(func = f.resid, x = coef.fin.w)# weight matrix# W <- diag(w)# # bread.w <- solve(t(gradient.of.residual.w) %*% W %*% gradient.of.residual.w)# meat.cl.w <-matrix(0, nrow =ncol(gradient.of.residual.w), ncol =ncol(gradient.of.residual.w))for (g in1:num.clusters) {# g <- 1 meat.cl.w <- meat.cl.w + ( (t(gradient.of.residual.w[grps == g, ]) %*%diag(w[grps == g]) %*% resid.w[grps == g]) %*% (t(resid.w[grps == g]) %*%t(diag(w[grps == g])) %*% gradient.of.residual.w[grps == g, ]) )}#adj.factor <- ((num.clusters/(num.clusters-1)) * ((n -1)/(n -3)))sandwich.clu.w <- (adj.factor * (bread.w %*% (meat.cl.w %*% bread.w)))se.clustered.w <-sqrt(diag(sandwich.clu.w))# Create a data frame with all standard errorsse.comparison.w$Non_Robust.2<- se.non.robust.wse.comparison.w$Robust.2<- se.robust.wse.comparison.w$Clustered.2<- se.clustered.w# Round all numeric columns to 4 decimal placesse.comparison.w <- se.comparison.w %>%mutate(across(where(is.numeric), ~round(., 4)))# Display the tablese.comparison.w