Monday, July 5, 2010

Fixed format data in Stata

I have never needed to use fixed-format ASCII data in Stata; I typically work with CSV files. However, I was looking for data to replicate this paper by Bill Greene on a general method to incorporate selectivity into limited dependent variable models.
I was about to write to him for the data, but decided to take a quick look through his website to see whether he provides it. It turns out he provides a subset of that dataset, Table F25.1: Expenditure and Default Data, 1319 observations, as part of the example datasets of the 6th edition of his massively bestselling Econometric Analysis.
The small problem is that the dataset is a fixed-format text file, probably formatted as an Nlogit/Limdep dataset, so reading it into Stata is not straightforward.
So, I decided to figure out how to write a dictionary file and read that data in using Stata's -infile- command. The dictionary file looks like this:

dictionary {
 _first(4)  * first line of data is the fourth
 _lines(3)  * there are three lines of data per observation
 _line(1)   * begin with line one of each observation
 Cardhldr "Dummy variable, 1 if application for credit card accepted, 0 if not"
 Majordrg "Number of major derogatory reports"
 Age  "Age in years plus twelfths of a year"
 Income  "Yearly income (divided by 10,000)"
 Exp_Inc  "Ratio of monthly credit card expenditure to yearly income"
 _newline   * move to the next line of an observation
 Avgexp  "Average monthly credit card expenditure"
 Ownrent  "1 if owns their home, 0 if rent"
 Selfempl "1 if self employed, 0 if not."
 Depndt  "1 + number of dependents"
 Inc_per  "Income divided by number of dependents"
 _newline  * move to the next (last) line of an observation
 Cur_add  "months living at current address"
 Major  "number of major credit cards held"
 Active  "number of active credit accounts"
}
Save this file as "limdep2stata.dct". Then, this dictionary file can be used to read in the data using a do-file which looks like this:
* Read in LIMDEP data in Stata
infile using limdep2stata.dct, using(TableF25-1.txt) clear 
renvars _all, lower
drop if cardhldr==.  // one extra line read in 
I guess the data file lacks an end-of-file delimiter and so an extra line is read in before Stata figures out that the file has ended. I will see if there is a simple solution to avoid this. But it does no harm and the extra line is easily dropped.
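One possible way to avoid the extra line, sketched here under assumptions rather than as a tested fix: -infile- accepts an -in- qualifier, so the read could be capped at the 1319 observations that the table title reports. This assumes that count is correct for the file and that the dictionary and data file names are the ones used above. The -assert- line is just a check that stops the do-file if the count is off.

```stata
* Sketch: cap the read at the 1319 observations the table title reports,
* so the trailing extra record is never loaded (assumes the count is right)
infile using limdep2stata.dct in 1/1319, using(TableF25-1.txt) clear

* Confirm the observation count; the do-file stops here if it differs
assert _N == 1319
```

The same -assert- line also works after the -drop- approach above, as a check that exactly one stray record was removed.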
And that's it! You are good to go.
PS. I must mention that the command -renvars- is due to Nick Cox and Jeroen Weesie.
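If -renvars- is not installed, newer versions of Stata can do the same job with the built-in -rename- command. A minimal sketch, assuming Stata 12 or later, where -rename- accepts wildcards and the lower option:

```stata
* Built-in alternative to -renvars _all, lower- (Stata 12 or later)
rename *, lower
```

On older versions, typing -findit renvars- should locate the package for installation.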