Wednesday, June 30, 2010

Multinomial logit in Stata and R III

Another set of translations between Stata and R - calculation of the most important kind of margins (see previous post), i.e., the marginal effects. This requires us to use yet another R package for out-of-the-box calculations of marginal effects, although I do also show how to do this by hand. Note that in both the R constructions, the standard errors are missing, and in a future post I will show how to calculate the standard errors analytically using the delta method and a nonparametric bootstrap.


Here I construct the marginal effects for the covariate "age" on the predicted probabilities of each of the three categories of the outcome. For the automatic construction of marginal effects in R, I make use of another package, VGAM and its vglm() and margeff() functions.
R

########################
# calculate the marginal effects
########################
library(VGAM)
library(gregmisc)
library(foreign)
sysdn <- read.dta("D:/programming/r/sandbox/sysdn1.dta", convert.factors=TRUE)
sysdn.multinomial <- vglm(insure~age+male+nonwhite+as.factor(site), data=sysdn, multinomial)
sysdn.multinomial
margeff.sysdn.multinomial <- margeff(sysdn.multinomial)
rowMeans(margeff.sysdn.multinomial["age",,])  # marginal effects
# manual construction of ME
p <- fitted(sysdn.multinomial)  # p_ij (row `i' = individual, column `j' = choice) here, a 644x3 matrix
avgmargeff.age1 <- mean(p[,1]*(coef(sysdn.multinomial)["age:1"] - p[,2]*coef(sysdn.multinomial)["age:2"]- p[,1]*coef(sysdn.multinomial)["age:1"]))
avgmargeff.age2 <- mean(p[,2]*(coef(sysdn.multinomial)["age:2"] - p[,2]*coef(sysdn.multinomial)["age:2"]- p[,1]*coef(sysdn.multinomial)["age:1"]))
# note that insure="Uninsure" is the base category and the coefficient vector is zero
avgmargeff.age3 <- mean(p[,3]*(0 - p[,2]*coef(sysdn.multinomial)["age:2"]- p[,1]*coef(sysdn.multinomial)["age:1"]))
avgmargeff.age1 
avgmargeff.age2
avgmargeff.age3
Stata
webuse sysdsn1, clear
mlogit insure age male nonwhite i.site, base(3)
margins, dydx(age) predict(outcome(1)) // average marginal effects
margins, dydx(age) predict(outcome(2))
margins, dydx(age) predict(outcome(3))
I think this back-and-forth between languages and packages makes a good case for learning more than one programming languages. Somethings are more easily done in one or the other - a valuable bit of flexibility when working on a large project. And to round it off, here is a very good explanation of what margins are, quoted from [R], Stata's base reference manual:
What we call margins of responses are also known as predictive margins, adjusted predictions, and recycled predictions. When applied to balanced data, margins of responses are also called estimated marginal means and least-squares means. A margin is a statistic based on a fitted model calculated over a dataset in which some of or all the covariates are fixed at values different from what they really are. For instance, after a linear regression fit on males and females, the marginal mean (margin of mean) for males is the predicted mean of the dependent variable, where every observation is treated as if it represents a male; thus those observations that in fact do represent males are included, as well as those observations that represent females. The marginal mean for female would be similarly obtained by treating all observations as if they represented females.

EDIT: Thank you to Gautam C. for pointing out what happens to HTML style comments if they are in the CSS of an HTML file.

1 comment:

Ben said...

How would you figure the marginal effect for factor variables?