Monday, July 5, 2010

Fixed format data in Stata

I have never needed to use fixed-format ASCII data in Stata; I typically work with CSV files. However, I was looking for data to replicate this paper by Bill Greene on a general method to incorporate selectivity into limited dependent variable models.
I was about to write to him for the data, but decided to take a quick look through his website first to see whether he had posted it. It turns out he provides a subset of that dataset, Table F25.1: Expenditure and Default Data, 1319 observations, as part of the example datasets for the 6th edition of his massively bestselling Econometric Analysis.
The small problem is that the dataset is a fixed-format text file, probably formatted as an NLOGIT/LIMDEP dataset, and reading it into Stata is not straightforward.
So, I decided to figure out how to write a dictionary file and read that data in using Stata's -infile- command. The dictionary file looks like this:

dictionary {
 _first(4)  * first line of data is the fourth
 _lines(3)  * there are three lines of data per observation
 
 _line(1)   * begin with line one of each observation
 Cardhldr "Dummy variable, 1 if application for credit card accepted, 0 if not"
 Majordrg "Number of major derogatory reports"
 Age  "Age in years plus twelfths of a year"
 Income  "Yearly income (divided by 10,000)"
 Exp_Inc  "Ratio of monthly credit card expenditure to yearly income"
 
 _newline   * move to the next line of an observation
 Avgexp  "Average monthly credit card expenditure"
 Ownrent  "1 if owns their home, 0 if rent"
 Selfempl "1 if self employed, 0 if not."
 Depndt  "1 + number of dependents"
 Inc_per  "Income divided by number of dependents"
 
 _newline  * move to the next (last) line of an observation
 Cur_add  "months living at current address"
 Major  "number of major credit cards held"
 Active  "number of active credit accounts"
}
Save this file as "limdep2stata.dct". Then, this dictionary file can be used to read in the data using a do-file which looks like this:
/*
* Read in LIMDEP data in Stata
*/
infile using limdep2stata.dct, using(TableF25-1.txt) clear 
renvars _all, lower
drop if cardhldr==.  // one extra line read in 
I guess the data file lacks an end-of-file delimiter, so Stata reads in one extra line before it figures out that the file has ended. I will see if there is a simple way to avoid this, but it does no harm, and the extra observation is easily dropped.
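One possible way to keep the extra observation from being read at all, sketched here and not tested against this particular file: the dictionary form of -infile- accepts an -in- range, so the read can be capped at the 1319 observations the table is documented to contain. The two -assert- lines are added sanity checks and are not part of the do-file above.

```stata
* Sketch: stop reading after the 1319 documented observations
infile using limdep2stata.dct in 1/1319, using(TableF25-1.txt) clear

* Added checks (assumptions, not from the original do-file)
assert _N == 1319            // observation count stated for Table F25.1
assert !missing(Cardhldr)    // no empty trailing record was kept
```

If this behaves as expected, the -drop- line is no longer needed. Note that the variable is still called Cardhldr at this point, because the check runs before the names are lowercased.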
And that's it! You are good to go.
PS. I must mention that the command -renvars- is due to Nick Cox and Jeroen Weesie.
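Since -renvars- is a user-written command, it has to be installed before the do-file will run. If it is not available, a plain loop over the variables should do the same lowercasing with built-in commands only. This is a sketch of an alternative, not what the do-file above uses.

```stata
* Lowercase every variable name without -renvars-
foreach v of varlist _all {
    local new = lower("`v'")
    * skip names that are already lowercase, since renaming a
    * variable to its own name would raise an error
    if "`new'" != "`v'" {
        rename `v' `new'
    }
}
```

In Stata 12 and later, -rename *, lower- should do the same thing in one line.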
