## Background

When teaching an introductory Stata class, we realized that there were no good reference materials for it. What started as “let’s make a quick cheat sheet for the basic functions” quickly evolved into a comprehensive set of six cheat sheets covering Stata’s common data wrangling and analysis functions.

## Solution

After cataloguing the most common functions, we organized them into six basic functional areas: basic data processing, data manipulation, data visualization, visualization customization, basic analysis, and basic programming. Then came the tricky part: how are all these functions related? What’s the underlying logical and organizational framework? After sketching out these relationships, we created the layouts in Adobe Illustrator, heavily inspired by RStudio’s amazing R cheat sheets.

#### Data Processing

• basic Stata syntax for all functions
• basic math and logic operations
• setting up working directories and log files
• importing data
  • use
  • import excel
• converting between data types
• exploring data files
  • codebook
  • summarize
• summarizing and collapsing data in tables
  • tabulate
  • collapse
• creating new variables
  • generate
  • egen
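
As a rough illustration of how a few of the commands on this sheet fit together, here is a minimal sketch. It assumes the `auto` example dataset that ships with Stata; the new variable names (`price_k`, `mean_mpg`) are made up for the example and do not come from the cheat sheets.

```stata
* Load the example dataset bundled with Stata (an illustrative choice)
sysuse auto, clear

* Explore the data
codebook make price mpg
summarize price mpg, detail

* Summarize in a two-way table
tabulate foreign rep78

* Create new variables
* price_k: price in thousands of dollars
generate price_k = price / 1000
* mean_mpg: group mean of mpg, attached to every row of its group
egen mean_mpg = mean(mpg), by(foreign)

* Collapse to one row per group (this replaces the data in memory)
collapse (mean) price mpg, by(foreign)
list
```

Note that `generate` works row by row, while `egen` computes statistics across rows or groups, which is why the two appear side by side on the sheet.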

#### Data Transformation

• subsetting data
  • drop
  • keep
• replacing data
  • rename
  • replace
  • recode
• using variable and value labels
  • label define
  • label list
• reshaping data (melting and casting)
  • reshape
• merging and appending
  • append
  • merge
• fuzzy-matching
• string transformations
• saving and exporting data
  • save
  • export excel
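
A short sketch of several of these transformation commands in sequence may help. It again assumes the bundled `auto` dataset; the names `miles_per_gallon`, `repair_group`, `repair_lbl`, and `cleaned` are hypothetical, and the recoding scheme is arbitrary. Because it uses a temporary file and a local macro, it is meant to be run as a do-file in one go rather than line by line.

```stata
sysuse auto, clear

* Subset: keep a few variables, drop rows with a missing repair record
keep make price mpg rep78 foreign
drop if missing(rep78)

* Rename a variable and replace values in place
rename mpg miles_per_gallon
replace price = round(price, 100)

* Recode five repair categories into three, stored in a new variable
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(repair_group)

* Define a value label, attach it, and inspect it
label define repair_lbl 1 "poor" 2 "average" 3 "good"
label values repair_group repair_lbl
label list repair_lbl

* Save to a temporary file (use "save filename, replace" for a real file)
tempfile cleaned
save `cleaned'
```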

#### Data Visualization

• small multiples
• one variable visualizations
  • histogram
  • kdensity: smoothed histogram
  • graph bar: bar plot
  • graph dot: dot plot
  • graph hbox: box and whiskers
• two variable visualizations
  • tw scatter: scatter plot
  • tw connected: line plot
  • tw area: area plot
  • tw pcspike: parallel coordinates plot
  • tw pccapsym: slope/bump chart
• three variable visualizations
  • plotmatrix: heatmap
• plotting with summarization or fitting
  • binscatter: plot summary value
  • tw lfitci: linear fit
  • tw lowess: lowess smoothing
• plotting regression results
  • coefplot: regression coefficients
  • marginsplot: marginal effects

#### Visualization Customization

• Changing marks
  • symbology
  • lines
  • text
• Changing channels
  • size
  • color
  • shape
  • position
• Using themes
• Saving plots
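
To give a feel for the plotting and customization commands listed above, here is a hedged sketch using only built-in graph commands (user-written ones such as `binscatter` and `coefplot` need to be installed separately and are left out). The dataset, the styling choices, and the output file name `mpg_weight.png` are assumptions made for illustration.

```stata
sysuse auto, clear

* One-variable plots
histogram mpg, frequency
kdensity mpg
graph hbox mpg, over(foreign)

* Two-variable plot with a linear fit and confidence band,
* drawn as small multiples by origin
twoway (lfitci mpg weight) (scatter mpg weight), by(foreign)

* Customizing marks and channels: marker symbol, size, color, and a theme
twoway scatter mpg weight, msymbol(Oh) msize(medium) mcolor(navy) ///
    title("Mileage vs. weight") scheme(s1mono)

* Save the current graph (PNG export may be unavailable in console Stata)
graph export "mpg_weight.png", replace
```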

#### Data Analysis

• declaring data as a special type
  • time series
  • survival analysis
  • longitudinal/panel
  • survey
• summarizing data, correlations, point estimates, etc.
  • summarize
  • pwcorr
• statistical tests
  • t-tests, ANOVAs, proportions, distributions, etc.
• estimating models
  • regress
  • logit
• declaring interactions within a model
• evaluating models
• postestimation calculations (use model for something)
  • predict
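
The following sketch strings together a handful of the analysis commands above. It assumes the bundled `auto` dataset and an arbitrary model specification chosen only for illustration; `price_hat` is a made-up variable name. Declaring data as time series, panel, survival, or survey data is not shown because `auto` is plain cross-sectional data.

```stata
sysuse auto, clear

* Summaries and pairwise correlations with significance levels
summarize price mpg weight
pwcorr price mpg weight, sig

* Two-sample t-test of mileage by origin
ttest mpg, by(foreign)

* Linear model with an interaction, using factor-variable notation:
* c. marks a continuous variable, i. a categorical one,
* and ## adds both main effects and their interaction
regress price c.weight##i.foreign mpg

* Postestimation: fitted values from the model just estimated
predict price_hat, xb

* Logistic regression for a binary outcome
logit foreign mpg weight
```

Postestimation commands such as `predict` always refer to the most recently estimated model, so the order of the commands matters.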

#### Programming

• fundamental data types
  • scalars
  • matrices
  • macros
• accessing stored results
  • return: r-class objects
  • ereturn: e-class objects
• loops
  • foreach
  • forvalues
• additional programming resources: using GitHub in Stata
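
As a small illustration of the programming concepts above, the sketch below touches each of them once. The names `threshold`, `vars`, and `A` are hypothetical, and the code is meant to be run as a do-file, since local macros do not persist between separately executed lines.

```stata
sysuse auto, clear

* Fundamental data types: a scalar, a local macro, and a matrix
scalar threshold = 20
local vars price mpg weight
matrix A = (1, 2 \ 3, 4)
matrix list A

* foreach: loop over the variable names stored in the macro;
* summarize leaves its results in r(), an r-class object
foreach v of local vars {
    quietly summarize `v'
    display "`v': mean = " %9.2f r(mean)
}
return list

* forvalues: loop over a numeric range
forvalues i = 1/3 {
    display "iteration `i'"
}

* Estimation commands leave their results in e(), an e-class object
quietly regress price mpg
ereturn list
display "R-squared = " e(r2)
```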