ECON 341: Crash introduction to STATA

In this crash introduction to STATA you will learn the basics of this statistical package. After reading this tutorial you should be able to load your data, create relevant variables, run simple OLS regressions, and access the results stored by STATA so you can use them in further calculations.

I believe that the best and most fun way of learning a new programming language is to do it yourself. Therefore, I've attached sample do-files (you will soon learn what these are) with comments on each line of code. Run these do-files, then change lines of code to experiment with the new commands.

Once you feel more comfortable with STATA, see the list of websites and books at the end of the tutorial; they offer thorough resources for learning more about this package.

Have fun!


You can download the files used in this tutorial in each section or you can download a zip file containing all files here.


The STATA environment

Click here to see the STATA windows that we're going to be using.

If you do not have STATA, use the computers on the third floor of the Social Sciences Building. See https://help.econ.duke.edu/wiki/doku.php for information about the computing services available at the Economics Department.


The do-file

To use STATA you can type one command at a time in the command window, or you can use a do-file. A do-file is just an ASCII file (a plain text file with no special characters) containing Stata commands, which you create with the STATA editor or any text editor. When you interactively type do filename at the command line, the contents of the file are executed just as if you had typed each of its lines at the command line.

Tip: Create a directory where you will download the do-files used in this tutorial. Then, at the STATA command line, change the working directory to it. In my case I use:

cd c:\data\econ341

Example: The do-file hello.do contains the following line:

display "Hello, world"

You run this do-file by interactively typing

. do hello			<-you type this
. display "Hello, world"	<-Stata types this

which displays the following in the Stata Results window

Hello, world
.
end of do-file
.

This simple experiment is worth trying for yourself.

Organizing do-files

When using STATA you will create a number of do-files (depending on the complexity of your project) and you may need a few rules to keep your work organized. Here are a few rules I found useful:
    Rule 1:
    There is one directory, and one directory only, for a project. I know where to look for my data and my results.

    Rule 2:
    All my do-files create logs -- the code is in them to do that. I do not have to remember to start a log before running them.

    Rule 3:
    My individual do-files typically -- not always -- start with the letters cr or an -- these have a special meaning to me. cr*.do files create datasets. crxyz.do creates the dataset xyz.dta. an*.do files perform some sort of analysis.

    Rule 4:
    Create a do-file named master.do. In this file you execute all the do-files that comprise your project in an orderly manner.

    Rule 5:
    Once a do-file works and its name is inserted into master.do, it is never again edited. Absolute compliance with this rule is what guarantees that typing do master will recreate what I have done.

    Rule 6:
    Comment your do-files. You can place comments in a do-file by adding lines like /* My Comment Here */. This will help you remember what you meant to do when you go back to your do-file.
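
As an illustration of Rules 3, 4, and 6, a master.do for this tutorial might look roughly like the sketch below. It assumes the project consists only of the two do-files used later in this tutorial, crdata.do and anols.do, and that both sit in the current directory. The * and // comment styles are further ways of writing comments in a do-file, alongside /* */.

```stata
* master.do: runs every do-file of the project, in order
* (sketch only; assumes crdata.do and anols.do are in the current directory)

do crdata      // cr = create: builds states.dta from states.out
do anols       /* an = analysis: runs the regressions on states.dta */
```

Typing do master then reruns the whole project from the raw data.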

There are reasons behind these rules that I want to explain to you as I illustrate them. This will help you develop rules of thumb to guide your own behavior.


An individual do-file

An individual do-file typically looks like this:

    BEGIN X.do
    
    capture log close
    log using X, replace
    set more off
    
    	program
    	code
    	in
    	here
    		
    log close
    exit
    
    END X.do

Often when I work I have a log going. The log file stores all the output from the commands executed in your do-file (e.g., regression results). Nevertheless, if I run one of my official do-files -- say X.do -- I want its log to be saved in X.log. Thus, my do-files start by closing any open log. Since a log might not be open and log close might generate an error, I place a capture in front of the log close.
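
A minimal sketch of what capture does: it runs the command, suppresses any error message, and stores the command's return code in _rc (0 means success). The lines below are only an illustration; what they display depends on whether a log happens to be open when they run.

```stata
capture log close      // no error message is shown, even if no log is open
display _rc            // 0 if a log was closed, nonzero if log close failed
```

Without capture, a failing log close would stop the do-file at that line.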

When I run my individual do-files, I do not want Stata to pause on --more-- conditions. I set more off. When the do-file completes, Stata will automatically reset it to whatever it was originally.

Finally, at the end of my do-file, I close the log.


Data management

In this tutorial we will use the file states.out, which contains educational data on U.S. states and DC. The file is in tab-separated format, which STATA can read.

Tip: To read an Excel spreadsheet with insheet, first save it as a tab-separated text file.

The do-file crdata.do:

  • loads the states.out data file and creates a data set in STATA format, states.dta.
  • creates some variables that might be of interest to us
  • presents summary statistics

The following commands are introduced in this do-file

  • insheet: reads a data set in tab-separated format into STATA.
  • generate: creates new variables.
  • label variable: gives a label to a variable in the data set.
  • keep: tells STATA which variables to keep in the data set.
  • list: lists a specified number of observations on the screen.
  • summarize: gives a table of summary statistics of the variables in the data set.
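
To show how these commands fit together, here is a rough sketch of a cr*.do file. It is not the actual crdata.do: the variable names state, expend, pupils, and score are hypothetical placeholders, and the variables in states.out may well be named differently.

```stata
* Sketch of a data-creation do-file.
* ASSUMPTION: state, expend, pupils, and score are made-up variable names.
capture log close
log using crdata, replace
set more off

insheet using states.out, clear            // read the tab-separated file
generate exp_pupil = expend / pupils       // create a new variable
label variable exp_pupil "Expenditure per pupil"
keep state exp_pupil score                 // drop all other variables
list in 1/5                                // show the first five observations
summarize                                  // table of summary statistics
save states, replace                       // write states.dta

log close
exit
```

To adapt the sketch, replace the placeholder names with the ones STATA reports after insheet (type describe to see them).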

Tip: To obtain the full syntax of any command, say insheet, type:
help insheet
and STATA will display a window with all information about that command.


Linear Regression

Now that we have our data set in STATA format and have created some variables, we can use them in a regression model.

The do-file anols.do:

  • runs a simple regression (OLS)
  • shows how to recover coefficients stored in the memory
  • does simple hypothesis testing

One important command used in this example is:
. ereturn list
The estimation command regress leaves results in e(). These returned values remain available until the next estimation command is executed. e() acts as a function that returns the value of the named estimation result from the last estimation command. You can see what is available by typing ereturn list.
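
The following sketch shows stored results being used after a regression. So that it runs without the tutorial files, it assumes Stata's bundled auto data set (loaded with sysuse) instead of states.dta; the variables price, mpg, and weight belong to that data set, not to this tutorial.

```stata
* ASSUMPTION: uses Stata's bundled auto data set, not states.dta
sysuse auto, clear
regress price mpg weight

ereturn list                      // everything regress left in e()
display "N   = " e(N)             // number of observations
display "R2  = " e(r2)            // R-squared
display "RSS = " e(rss)           // residual sum of squares

* R-squared rebuilt from stored results: model SS / (model SS + residual SS)
display e(mss) / (e(mss) + e(rss))
```

The last line should reproduce e(r2), which is a quick check that the stored values are the ones shown in the regression table.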

We will also need to recover the OLS coefficients. These are stored in _b[name_of_var].
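
As a sketch of how the coefficients can be used for a simple hypothesis test, the lines below compute a t statistic by hand and compare it with the built-in test command. Again this assumes Stata's bundled auto data set rather than states.dta, and the scalar name t_mpg is just an illustrative choice.

```stata
* ASSUMPTION: uses Stata's bundled auto data set, not states.dta
sysuse auto, clear
regress price mpg weight

display _b[mpg]                   // coefficient on mpg
display _b[_cons]                 // intercept
display _se[mpg]                  // standard error of the mpg coefficient

* t statistic for H0: coefficient on mpg equals 0
scalar t_mpg = _b[mpg] / _se[mpg]
display t_mpg
display 2 * ttail(e(df_r), abs(t_mpg))    // two-sided p-value

test mpg = 0                      // same hypothesis; F equals t_mpg squared
```

The p-value computed by hand should match the one reported by test and the one in the regression table.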

The following commands are also introduced in this do-file

  • use: reads a data set in STATA format.
  • regress: estimates an OLS regression.


Further readings and resources

Now that you know a little bit more about STATA you may want to learn much more. Try these sites:

or take a look at this book: