For a project, I plan to calculate standard errors for a non-linear least squares (NLS) estimation approach, but before I can do that I need a refresher. After some googling and checking my reference books, I was surprised that I could not find a simple yet comprehensive bit of example code for a non-ML, NLS approach. For what follows, I consulted two sets of slides from A. Colin Cameron (slides1, slides2), as well as F. Hayashi's book. I also used Claude.ai to debug some errors.
Set up
# clear workspace
rm(list = ls())
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 615510 32.9 1253208 67 NA 953356 51.0
Vcells 1109196 8.5 8388608 64 102400 1840064 14.1
# libraries
library(tidyverse) # general data manipulation
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)    # model results to data.frame for compact viewing
library(optimx)   # optim + extras
library(sandwich) # adjust standard errors
library(lmtest)   # recalculate model SEs with sandwich vcovs via coeftest()
Loading required package: zoo
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
# Set random seed for reproducibility
set.seed(1701)
Create Data
I am adding heteroskedasticity so that I can see the difference between non-robust and robust SEs.
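Written out, the DGP in the next chunk is

$$
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2), \qquad \sigma_i = s_i\left(\tfrac{x_{1i}}{5} + 1\right),
$$

with $s_i \sim \text{Uniform}(0.5, 4)$, so the error SD increases with $x_1$.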
# obs - high so that coefficients are well estimated
n <- 10000

# two x vars (will add a constant)
x1 <- rnorm(n, 1, 1)
x2 <- rnorm(n, 1, 2)

# heteroskedasticity: draw a base scale
# (avoid naming it sd, which would mask stats::sd)
s.base <- runif(n, min = 0.5, max = 4)
s.true <- s.base * (x1/5 + 1) # need to ensure that all s's are positive
error  <- rnorm(n, 0, s.true)

# true coefficients
coef.true <- c(1, -4, 2)

# generate the dependent variable
y1 <- coef.true[1] + (coef.true[2]*x1) + (coef.true[3]*x2) + error
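The comparisons further down refer to a canned OLS fit named m1, so let's create that benchmark now; a simple lm() fit does the job:

# canned OLS benchmark; the manual SEs below should match its output
m1 <- lm(y1 ~ x1 + x2)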
This is really where I needed the refresher. I found that most treatments write up the theory with general functions but give examples with application-specific functional forms, which to me obscured more than it helped. I find the best approach is to code the theory directly: you end up with the most generalizable (application-agnostic) code, and I think it helps one understand the theory better.
My goal is to replicate the standard errors with code that is as general as possible. I think the key is to specify a residual function directly and call it within the SSR function. Even though I am doing OLS here, I never use any OLS-specific functional form outside of the residual function.
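In symbols, with residual function $u_i(\theta)$, the estimator is

$$
\hat{\theta} \;=\; \arg\min_{\theta}\; \frac{1}{2}\sum_{i=1}^{n} u_i(\theta)^2, \qquad u_i(\theta) = y_i - f(x_i, \theta),
$$

and nothing downstream depends on the form of $f$; here $f$ happens to be linear, but that fact lives only inside the residual function. (The $\tfrac{1}{2}$ matters for the SEs, as the code comments note.)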
# We are going to code up our sum of squared residuals function

# first, write the residual function
f.resid <- function(coef.arg){
  b0 <- coef.arg[1]
  b1 <- coef.arg[2]
  b2 <- coef.arg[3]
  u.i <- (y1 - (b0 + b1*x1 + b2*x2))
  return(u.i)
}

# second, write the SSR function
# note: we divide by 2!
# if we do not, the coefs are still right but the SEs are wrong:
# the hessian of SSR/2 at the optimum is J'J (X'X for OLS), which is
# what the SE formula below needs; without the 1/2 it would be 2*J'J
f.ssr <- function(coef.arg){
  u2  <- (f.resid(coef.arg))^2
  ssr <- sum(u2)/2
  if(is.na(ssr) | is.nan(ssr)) return(Inf)
  return(ssr)
}

# initial coef values
coef.init <- c(1, 1, 1)

# run optimx
# note: (1) use BFGS since there are no constraints, (2) request the hessian
out.reg <- optimx(coef.init, f.ssr,
                  method  = c("BFGS"),
                  control = list(maxit = 500000, trace = 1),
                  hessian = TRUE)
fn is fn1
Looking for method = BFGS
Function has 3 arguments
Analytic gradient not made available.
Analytic Hessian not made available.
Scale check -- log parameter ratio= 0 log bounds ratio= NA
Method: BFGS
initial value 268911.921901
final value 45464.216631
converged
Post processing for method BFGS
Successful convergence!
Compute Hessian approximation at finish of BFGS
Compute gradient approximation at finish of BFGS
Save results from method BFGS
$par
[1] 1.007687 -3.977331 1.991223
$value
[1] 45464.22
$message
NULL
$convcode
[1] 0
$fevals
function
31
$gevals
gradient
6
$nitns
[1] NA
$kkt1
[1] TRUE
$kkt2
[1] TRUE
$xtimes
user.self
0.022
Assemble the answers
# final coefs
coef.fin <- as.numeric(out.reg[1, 1:3])
coef.fin
[1] 1.007687 -3.977331 1.991223
Non-robust SEs
Non-robust SEs are the square roots of the diagonal of the inverse hessian, scaled by the degrees-of-freedom-adjusted residual SD.
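In symbols, with $k = 3$ parameters and $H$ the hessian of the (halved) SSR at $\hat{\theta}$:

$$
\hat{s} = \operatorname{sd}(\hat{u})\sqrt{\frac{n-1}{n-k}}, \qquad \widehat{\operatorname{se}}(\hat{\theta}_j) = \hat{s}\,\sqrt{\left[H^{-1}\right]_{jj}}.
$$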
# calculate residuals at the final estimate
resid <- f.resid(coef.fin)

# get the DoF-adjusted SD of the residuals at b.star (k = 3 parameters)
s.hat <- sd(resid) * sqrt((n-1)/(n-3))

# get the hessian from the optimx details
hessian <- attributes(out.reg)$details["BFGS", "nhatend"][[1]]

# calculate the non-robust SEs
se.hessian <- sqrt(diag(base::solve(hessian))) * s.hat
se.hessian

# compare SEs from canned and manual
tidy(m1)
Robust SEs
For the robust SEs we need a more involved estimator. First, we need the jacobian (the vector-valued gradient) of the residual function, which is n-by-k. Second, we form the "meat" using that jacobian and the covariance matrix of the residuals. Third, we form the "bread" as the inverse of the jacobian cross-product. Finally, we make a sandwich.
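In matrix form, with $J$ the $n \times k$ jacobian of the residual function at $\hat{\theta}$:

$$
\widehat{V} = \underbrace{(J'J)^{-1}}_{\text{bread}}\;\underbrace{J'\operatorname{diag}(\hat{u}_i^2)\,J}_{\text{meat}}\;\underbrace{(J'J)^{-1}}_{\text{bread}}, \qquad \widehat{\operatorname{se}}(\hat{\theta}_j) = \sqrt{\widehat{V}_{jj}}.
$$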
And that’s it.
# Robust SEs, done manually, need the jacobian (multivalued function gradient)
# of the residual function wrt the parameters
# we will do this numerically (although it is analytic for OLS)
resid.grad <- numDeriv::jacobian(func = f.resid, x = coef.fin)

# Robust SEs use the 'sandwich' estimator:
#   BREAD * MEAT * BREAD
#   MEAT  = gT * diag(resid^2) * g
#   BREAD = [gT*g]^{-1}
# this is a 'plug-in' estimator where we use our estimates to get
# sample analogues of the things we want (similar to avg as the sample version of the mean)

# bc the data are independent, the diag version is correct
# if not independent, then we could use the fully dense (resid %*% t(resid))
# note: diag(resid^2) is n-by-n; t(resid.grad) %*% (resid.grad * resid^2)
# is a memory-light equivalent
meat <- t(resid.grad) %*% diag(resid^2) %*% resid.grad

# solve(A) = A^{-1}
# gT*g estimates the covariance of g (up to scaling by n)
bread <- solve(t(resid.grad) %*% resid.grad)

# calculate the sandwich (named to avoid masking sandwich::sandwich)
vcov.robust <- bread %*% meat %*% bread

# calculate robust standard errors
robust_se <- sqrt(diag(vcov.robust))
robust_se

# tidy(m1.robust)
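To make that last commented-out comparison runnable: since sandwich and lmtest are already loaded, the canned analogue of the manual plug-in estimator is HC0 (no small-sample correction), so something like the following should line up with robust_se:

# canned robust comparison; HC0 matches the manual (uncorrected) estimator
m1.robust <- coeftest(m1, vcov = vcovHC(m1, type = "HC0"))
tidy(m1.robust)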