Background

While teaching an introductory class on Stata, we realized that there were no good reference materials for it. What started off as “let’s make a quick cheat sheet for the basic functions” quickly evolved into a comprehensive set of six cheat sheets on the common data wrangling and analysis functions within Stata.


Solution

After cataloguing the most common functions, we organized them into six basic functional areas: basic data processing, data manipulation, data visualization, visualization customization, basic analysis, and basic programming. Then came the tricky part: how are all these functions related? What’s the underlying logical and organizational framework? After sketching out these relationships, we created the layouts in Adobe Illustrator, heavily inspired by RStudio’s amazing R cheat sheets.


Data Processing

  • basic Stata syntax for all functions
  • basic math and logic operations
  • setting up working directories and log files
  • importing data
    • use
    • import excel
  • converting between data types
  • exploring data files
    • codebook
    • summarize
  • summarizing and collapsing data in tables
    • tabulate
    • collapse
  • creating new variables
    • generate
    • egen
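
As a rough illustration of how these commands fit together, here is a minimal sketch. It assumes the auto example dataset that ships with Stata; the variable names price_k and mean_mpg are made up for this example and do not come from the cheat sheets.

```stata
* Load an example dataset that ships with Stata (assumption: auto)
sysuse auto, clear

* Explore the data
codebook make price mpg
summarize price mpg

* Tabulate a categorical variable
tabulate foreign

* Create new variables (names are illustrative)
generate price_k = price / 1000
egen mean_mpg = mean(mpg), by(foreign)

* Collapse to one row per group, keeping the group means
collapse (mean) price mpg, by(foreign)
list
```

Note that collapse replaces the data in memory, so it is usually run on a copy or after saving.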


Data Transformation

  • subsetting data
    • drop
    • keep
  • replacing data
    • rename
    • replace
    • recode
  • using variable and value labels
    • label define
    • label list
  • reshaping data (melting and casting)
    • reshape
  • merging and appending
    • append
    • merge
    • fuzzy-matching
  • string transformations
  • saving and exporting data
    • save
    • export excel
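
The commands above might be chained roughly as follows. This is a sketch under stated assumptions: it uses the auto dataset that ships with Stata plus two tiny hand-typed datasets with invented values, and every file name, label name, and new variable name (auto_clean.xlsx, origin_lbl, origin, grp) is a placeholder. It is meant to be run from a do-file.

```stata
* Subset and clean the auto dataset that ships with Stata
sysuse auto, clear
keep make price mpg foreign
drop if missing(price)
rename mpg miles_per_gallon
replace price = round(price, 100)
recode foreign (0 = 1) (1 = 2), generate(origin)

* Attach value labels to the recoded variable
label define origin_lbl 1 "Domestic" 2 "Foreign"
label values origin origin_lbl
label list origin_lbl

* Save and export (file names are placeholders)
tempfile cleaned
save `cleaned'
export excel using "auto_clean.xlsx", firstrow(variables) replace

* Reshape a tiny hand-typed wide dataset (invented values) to long form
clear
input id score2019 score2020
1 10 12
2 20 25
end
reshape long score, i(id) j(year)
tempfile scores
save `scores'

* Merge group codes onto the long data: each id matches many rows
clear
input id str1 grp
1 "a"
2 "b"
end
merge 1:m id using `scores'
tabulate _merge
```

Checking _merge after every merge is a cheap way to catch unmatched rows early.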


Data Visualization

  • small multiples
  • one variable visualizations
    • histogram
    • kdensity: smoothed histogram
    • graph bar: bar plot
    • graph dot: dot plot
    • graph hbox: box and whiskers
  • two variable visualizations
    • tw scatter: scatter plot
    • tw connected: line plot
    • tw area: area plot
    • tw pcspike: parallel coordinates plot
    • tw pccapsym: slope/bump chart
  • three variable visualizations
    • plotmatrix: heatmap
  • plotting with summarization or fitting
    • binscatter: plot summary value
    • tw lfitci: linear fit
    • tw lowess: lowess smoothing
  • plotting regression results
    • coefplot: regression coefficients
    • marginsplot: marginal effects
  • changing marks
    • symbology
    • lines
    • text
  • changing channels
    • size
    • color
    • shape
    • position
  • using themes
  • saving plots
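
A few of these plot types can be sketched with built-in commands only, again assuming the auto dataset that ships with Stata. The title text and the output file name are placeholders, and the scheme shown is just one of the built-in options. User-written commands such as binscatter and coefplot are left out because they must be installed separately.

```stata
sysuse auto, clear

* One-variable plots
histogram mpg, frequency
kdensity mpg
graph bar (mean) price, over(foreign)
graph dot (mean) mpg, over(foreign)
graph hbox mpg, over(foreign)

* Two-variable plot drawn as small multiples
twoway scatter price mpg, by(foreign)

* Linear fit with confidence band under customized markers, using a theme
twoway (lfitci price mpg) (scatter price mpg, msymbol(circle) mcolor(navy) msize(small)), title("Price vs. mileage") scheme(s1color)

* Lowess smoothing
twoway lowess price mpg

* Plot regression results as marginal predictions
regress price mpg i.foreign
margins foreign
marginsplot

* Save the most recent graph (file name is a placeholder)
graph export "margins_foreign.png", replace
```

Which export formats are available can depend on the platform and on whether Stata is running with a graphical interface.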


Data Analysis

  • declaring data as a special type
    • time series
    • survival analysis
    • longitudinal/panel
    • survey
  • summarizing data, correlations, point estimates, etc.
    • summarize
    • pwcorr
  • statistical tests
    • t-tests, ANOVAs, proportions, distributions, etc.
  • estimating models
    • regress
    • logit
    • declaring interactions within model
  • evaluating models
  • postestimation calculations (using the fitted model to compute new quantities)
    • predict
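
A typical analysis workflow built from these commands might look like the sketch below. It assumes the gnp96, cancer, and auto datasets that ships with Stata installations; the model specifications and the new variable names price_hat and p_foreign are chosen only for illustration.

```stata
* Declare a time series (gnp96 ships with Stata)
sysuse gnp96, clear
tsset date

* Declare survival-time data (cancer ships with Stata)
sysuse cancer, clear
stset studytime, failure(died)

* Summaries, pairwise correlations, and a two-sample t-test
sysuse auto, clear
summarize price mpg
pwcorr price mpg weight, sig
ttest mpg, by(foreign)

* Linear model with an interaction declared in factor-variable notation
regress price c.mpg##i.foreign weight

* Evaluate the model: test for heteroskedasticity
estat hettest

* Postestimation: fitted values from the linear model
predict price_hat, xb

* Logit model and predicted probabilities
logit foreign mpg weight
predict p_foreign, pr
```

The ## operator includes both main effects and their interaction; the c. and i. prefixes mark continuous and categorical variables.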


Programming

  • fundamental data types
    • scalars
    • matrices
    • macros
  • accessing stored results
    • return: r-class objects
    • ereturn: e-class objects
  • loops
    • foreach
    • forvalues
  • additional programming resources: using GitHub in Stata
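
To make the programming building blocks concrete, here is a short sketch meant to be run from a do-file. It assumes the auto dataset that ships with Stata; the names threshold, vars, A, and b are invented for this example.

```stata
sysuse auto, clear

* Scalars, macros, and matrices
scalar threshold = 20
local vars price mpg weight
matrix A = (1, 2 \ 3, 4)
matrix list A
count if mpg > scalar(threshold)

* r-class results are left behind by general commands
summarize mpg
return list
display r(mean)

* e-class results are left behind by estimation commands
regress price mpg
ereturn list
display e(r2)
matrix b = e(b)

* Loop over the variables named in a local macro
foreach v of local vars {
    quietly summarize `v'
    display "`v': mean = " r(mean)
}

* Loop over a range of numbers
forvalues i = 1/3 {
    display `i'^2
}
```

Stored results are overwritten by the next command of the same class, so copy anything needed later into a scalar, macro, or matrix right away.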