ECON 341: Crash introduction to STATA
In this crash introduction to STATA you will learn the basics of this statistical package. After reading this tutorial you should be able to load your data, create relevant variables, run simple OLS regressions, and access the results stored by STATA so that you can use them in further calculations.
I believe that the best and most fun way of learning a new programming language is to do it yourself. Therefore, I've attached sample do-files (you will soon learn what these are) with comments on each line of code. Run these do-files, and then change lines of code to experiment with the new commands.
Once you feel more comfortable with STATA, at the end of the tutorial there is a list of websites/books that have nice and very complete resources for learning more about this package.
Have fun!
You can download the files used in this tutorial in each section or you can download a zip file containing all files here.
The STATA environment
Click here to see the STATA windows that we're going to be using.
If you do not have STATA, use the computers on the third floor of the Social Sciences Building. Check https://help.econ.duke.edu/wiki/doku.php for information about the computing services available at the Economics Department.
The do-file
To use STATA you may type one command at a time in the command window, or you can use a do-file. A do-file is just an ASCII file (a plain text file with no special characters) containing Stata commands, which you create with the STATA editor or any text editor. When you interactively type do filename at the command line, the contents of the file are executed just as if you had typed each line of the do-file at the command line.
Tip: Create a directory where you will download the do-files used in this tutorial. Then, in the command line of STATA, change the directory path. In my case I use:
cd c:\data\econ341
Example: The do-file hello.do contains the following line:
display "Hello, world"
You run this do-file by interactively typing
. do hello                      <- you type this
. display "Hello, world"        <- Stata types this
which displays the following in the Stata Results window
Hello, world
.
end of do-file
.
This simple experiment is worth trying for yourself.
Organizing do-files
When using STATA you will create a number of do-files (depending on the complexity of your project) and you may need a few rules to keep your work organized. Here are a few rules I found useful:
Rule 1: There is one directory, and one directory only, for a project. I know where to look for my data and my results.
Rule 2:
Rule 3:
Rule 4:
Rule 5:
Rule 6:
An individual do-file
An individual do-file typically looks like this:
----- BEGIN X.do -----
capture log close
log using X, replace
set more off

program code in here

log close
exit
----- END X.do -----
Often when I work I already have a log going. The log file stores all the output from the commands executed in your do-file (e.g., regression results). Even so, if I run one of my official do-files, say X.do, I want its log to be saved in X.log. Thus, my do-files start by closing any open log. Since a log might not be open, in which case log close would generate an error, I place a capture in front of the log close.
When I run my individual do-files, I do not want Stata to pause on --more-- conditions. I set more off. When the do-file completes, Stata will automatically reset it to whatever it was originally.
Finally, at the end of my do-file, I close the log.
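As an illustration of the template above, here is a minimal sketch of a filled-in do-file. The file name example.do is hypothetical, and the auto data set shipped with Stata stands in for your own data. One assumption worth flagging: with default settings, log using writes a formatted .smcl log, so the sketch adds the text option to get a plain-text example.log.

```stata
* example.do (hypothetical file name, used only for illustration)

* Close any log left open by earlier work; capture suppresses the
* error that log close would raise if no log is open.
capture log close

* Start this do-file's own log. The text option (an addition to the
* template) writes a plain-text example.log instead of example.smcl.
log using example, replace text

* Do not pause on --more-- while the do-file runs.
set more off

* Program code goes here. As a stand-in, load the auto data set
* that ships with Stata and summarize two of its variables.
sysuse auto, clear
summarize price mpg

* Close the log at the end of the do-file.
log close
exit
```

To run it, save the lines as example.do in your project directory and type do example in the command window.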
Data management
In this tutorial we will use the file states.out, which contains educational data on U.S. states and DC. The file is in tab-separated format, which is a type of file STATA reads.
Tip: To read an Excel spreadsheet with insheet, first save it as a tab-separated text file.
The do-file crdata.do:
- loads the states.out data file and creates a data set in STATA format, states.dta
- creates some variables that might be of interest
- presents summary statistics
The following commands are introduced in this do-file:
- insheet: reads a data set in tab-separated format into STATA.
- generate: creates new variables.
- label variable: gives a label to a variable in the data set.
- keep: tells STATA which variables to keep in the data set.
- list: lists a specified number of observations on the screen.
- summarize: gives a table of summary statistics of the variables in the data set.
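The commands listed above can be tried together in a short sketch. It is not the contents of crdata.do: because the variable names in states.out are not shown here, the sketch assumes the auto data set that ships with Stata and first writes it out as a tab-separated file so that insheet has something to read. The file names auto_tab.out and auto_small.dta and the variable price_k are hypothetical.

```stata
* Create a small tab-separated text file to practice on.
* outsheet writes tab-separated text by default.
sysuse auto, clear
outsheet make price mpg weight using auto_tab.out, replace

* Read the tab-separated file back into Stata.
insheet using auto_tab.out, tab clear

* Create a new variable: price in thousands of dollars.
generate price_k = price/1000

* Attach a descriptive label to the new variable.
label variable price_k "Price (thousands of dollars)"

* Keep only the variables needed later.
keep make price_k mpg

* Show the first five observations on the screen.
list in 1/5

* Table of summary statistics for the remaining variables.
summarize

* Save the result as a data set in Stata format.
save auto_small, replace
```

With your own data, replace auto_tab.out with the name of your tab-separated file and drop the first two commands.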
Tip: To obtain the full syntax of any command, say insheet, type:
help insheet
and STATA will display a window with all information about that command.
Linear Regression
Now we have our data set in STATA format and we have created some variables that we want to use in our regression model.
The do-file anols.do:
- runs a simple regression (OLS)
- shows how to recover coefficients stored in the memory
- does simple hypothesis testing
One important command used in this example is:
. ereturn list
The estimation command regress leaves results in e(). These returned values remain available until the next estimation command is executed. e() acts as a function that returns the value of the named estimation result from the last estimation command. You can see what is available by typing
ereturn list
We will also need to recover the OLS coefficients. These are stored in _b[name_of_var].
The following commands are also introduced in this do-file:
- use: reads a data set in STATA format.
- regress: calculates an OLS regression.
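The steps described in this section can be sketched as follows. This is not the contents of anols.do: it assumes the auto data set that ships with Stata (loaded with sysuse rather than use), and the scalar names t_weight and p_weight are hypothetical. It runs a regression, lists the stored results, recovers a coefficient with _b[], and performs simple hypothesis tests.

```stata
* Load an example data set and run an OLS regression.
sysuse auto, clear
regress price weight mpg

* Show everything the estimation command left behind in e().
ereturn list

* Recover individual stored results.
display "Coefficient on weight = " _b[weight]
display "Observations = " e(N) ", R-squared = " e(r2)

* Compute the t statistic for H0: coefficient on weight = 0 by hand,
* using the stored coefficient and its standard error.
scalar t_weight = _b[weight]/_se[weight]

* Two-sided p-value from the t distribution with e(df_r) degrees of freedom.
scalar p_weight = 2*ttail(e(df_r), abs(t_weight))
display "t = " t_weight ", p-value = " p_weight

* The same hypothesis with the built-in test command,
* followed by a joint test that both slopes are zero.
test weight = 0
test weight mpg
```

The hand-computed t statistic should match the one in the regression table, which is a useful check that you are reading the stored results correctly.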
Further readings and resources
Now that you know a little bit more about STATA you may want to learn much more. Try these sites:
or take a look at this book: