## Stata cheat sheets

with Tim Essam## Background

When teaching an intro class on Stata, we realized that there were no good reference materials on Stata. What started off as a “let’s make a quick cheat sheet for the basic functions” quickly evolved into a comprehensive set of 6 cheat sheets on the common data wrangling and analysis functions within Stata.

## Solution

After cataloguing the most common functions, we organized them into six basic functional areas: basic data processing, data manipulation, data visualization, visualization customization, basic analysis, and basic programming. Then came the tricky part: how are all these functions related? What’s the underlying logical and organizational framework? After sketching out these relationships, we created the layouts in Adobe Illustrator, heavily inspired by Rstudio’s amazing R cheat sheets.

#### Data Processing

- basic Stata syntax for all functions
- basic math and logic operations
- setting up working directories and log files
- importing data
`use`

`import excel`

- converting between data types
- exploring data files
`codebook`

`summarize`

- summarizing and collapsing data in tables
`tabulate`

`collapse`

- creating new variables
`generate`

`egen`

#### Data Transformation

- subsetting data
`drop`

`keep`

- replacing data
`rename`

`replace`

`recode`

- using variable and value labels
`label define`

`label list`

- reshaping data (melting and casting)
`reshape`

- merging and appending
`append`

`merge`

- fuzzy-matching

- string transformations
- saving and exporting data
`save`

`export excel`

#### Data Visualization

- small multiples
- one variable visualizations
`histogram`

`kdensity`

: smoothed histogram`graph bar`

: bar plot`graph dot`

: dot plot`graph hbox`

: box and whiskers

- two variable visualizations
`tw scatter`

: scatter plot`tw connected`

: line plot`tw area`

: area plot`two pcspike`

: parallel coordinates plot`tw pccapsym`

: slope/bump chart

- three variable visualizations
`plotmatrix`

: heatmap

- plotting with summarization or fitting
`binscatter`

: plot summary value`tw lfitci`

: linear fit`tw lowess`

: lowess smoothing

- plotting regression results
`coefplot`

: regression coefficients`marginsplot`

: marginal effects

- Changing marks
- symbology
- lines
- text

- Changing channels
- size
- color
- shape
- position

- Using themes
- Saving plots

#### Data Analysis

- declaring data as a special type
- time series
- survival analysis
- longitudinal/panel
- survey

- summarizing data, correlations, point estimates, etc.
`summarize`

`pwcorr`

- statistical tests
- t-tests, ANOVAs, proportions, distributions, etc.

- estimating models
`regress`

`logit`

- delaring interactions within model

- evaluating models
- postestimation calculations (use model for something)
`predict`

#### Programming

- fundamental data types
- scalars
- matrices
- macros

- accessing stored results
`return`

: r-class objects`e-return`

: e-class objects

- loops
`foreach`

`forvalues`

- additional programming resources: using github in Stata