Tuesday, June 29, 2010

Multinomial logit in Stata and R II

A couple more interesting translations:


  1. If you wanted to change the base outcome in the estimation, the multinom() function in nnet is no longer sufficient and we use the mlogit package. Note that this package is more in line with the motivation of multinomial logit as a choice model and thus expects the data in a form where for each choice occasion, all choices are enumerated as separate rows in the data.
    Stata
    webuse sysdsn1, clear
    mlogit insure age male nonwhite i.site, base(2)
    

    R
    library(effects)
    library(foreign)
    library(mlogit)
    # using the nnet package
    sysdn <- read.dta("D:/programming/r/sandbox/sysdn1.dta", convert.factors=TRUE)
    sysdn.multinom <- multinom(insure~age+male+nonwhite+as.factor(site), data=sysdn)
    summary(sysdn.multinom)
    # change the base category of the response
    sysdn$site <- as.factor(sysdn$site)
    sysdn$insure <- as.factor(sysdn$insure)
    sysdn.mldata <- mlogit.data(sysdn, varying=NULL, choice="insure", shape="wide")
    sysdn.mlogit <- mlogit(insure ~ 1|age+male+nonwhite+site, data=sysdn.mldata, reflevel="Indemnity") 
    sysdn.mlogit2 <- mlogit(insure ~ 1|age+male+nonwhite+site, data=sysdn.mldata, reflevel="Prepaid") 
    summary(sysdn.mlogit)
    summary(sysdn.mlogit2)
    

  2. You might want to predict (identified statistics) using the estimated model at various values of the covariates. In Stata 11, the new -margins- command offers a very general framework for doing this. I would strongly recommend reading [M] margins (the pdf documentation) for a very good description of what can be achieved using -margins-. Let us see is we can encode losslessly to R. I use the example I found here.
    R
    # calculate predicted probabilities
    mydata <- read.csv(url("http://www.ats.ucla.edu/stat/r/dae/mlogit.csv"))
    attach(mydata)
    names(mydata)
    mlogit.model<- mlogit(brand~1|female+age, data = mldata, reflevel="1")
    summary(mlogit.model)  
    newdata <- data.frame(cbind(age = rep(24:38, 2), female = c(rep(0, 15), rep(1, 15))))
    logit1 <- rep(0, 30)
    logit2 <- -11.774655 + 0.523814*newdata$female + 0.368206*newdata$age
    logit3 <- -22.721396 + 0.465941*newdata$female + 0.685908*newdata$age
    logits <- cbind(logit1, logit2, logit3)
    p.unscaled <- exp(logits)
    p <- cbind(newdata, (p.unscaled / rowSums(p.unscaled)))
    colnames(p) <- c("age", "female", "pred.1", "pred.2", "pred.3")
    p
    
    Stata
    insheet using http://www.ats.ucla.edu/stat/r/dae/mlogit.csv, comma clear
    mlogit brand female age, base(1)
    margins, predict(outcome(1)) at(age=(24(1)38) female=(0 1))
    margins, predict(outcome(2)) at(age=(24(1)38) female=(0 1))
    margins, predict(outcome(3)) at(age=(24(1)38) female=(0 1))
    
    The R code is more cumbersome, but it has the advantage of laying bare the structure of the construction, which is very valuable while learning about these models.

No comments: