Data Standards

File Names

All final .csv datasets are named using the following convention:

[Theme Abbreviation][2-digit number]_[Spatial Scale].csv

For example, the Policy theme dataset on Prison Incarceration Rates (PS01) at the county level is PS01_C.csv. The same dataset at the state level is PS01_S.csv; at the tract level it would be PS01_T.csv, and at the zip code level it would be PS01_Z.csv.

Theme Abbreviations:

  • Policy: PS
  • Health: Health*, Access*
  • Demographic: DS
  • Economic: EC
  • Physical Environment: BE
  • COVID-19: COVID

* Variables labeled “Health” include: Drug-Related Death rate, Hepatitis C, Physicians. Variables labeled “Access” include: Access to MOUDs, Health Centers, Hospitals, Mental Health Providers, Pharmacies, Substance Use Treatment Facilities, Opioid Treatment Programs.

Spatial Scales:

  • Tract: T
  • Zip/ZCTA: Z
  • County: C
  • State: S
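
The naming convention can be checked mechanically. The following is a minimal sketch in Python, assuming the theme abbreviations and spatial-scale codes listed above are the complete set; the helper names build_filename and parse_filename are hypothetical and not part of any OEPS tooling.

```python
import re
from typing import Tuple

# Theme abbreviations and spatial-scale codes as listed in this section.
THEMES = {"PS", "Health", "Access", "DS", "EC", "BE", "COVID"}
SCALES = {"T": "Tract", "Z": "Zip/ZCTA", "C": "County", "S": "State"}

# Pattern for [Theme Abbreviation][2-digit number]_[Spatial Scale].csv
FILENAME_RE = re.compile(r"^([A-Za-z]+)(\d{2})_([TZCS])\.csv$")


def build_filename(theme: str, number: int, scale: str) -> str:
    """Assemble a dataset file name such as PS01_C.csv (hypothetical helper)."""
    if theme not in THEMES:
        raise ValueError(f"unknown theme abbreviation: {theme!r}")
    if scale not in SCALES:
        raise ValueError(f"unknown spatial scale: {scale!r}")
    if not 0 <= number <= 99:
        raise ValueError("number must fit in two digits")
    return f"{theme}{number:02d}_{scale}.csv"


def parse_filename(name: str) -> Tuple[str, int, str]:
    """Split a file name into (theme, number, scale); raise if it is malformed."""
    match = FILENAME_RE.match(name)
    if match is None or match.group(1) not in THEMES:
        raise ValueError(f"not a valid dataset file name: {name!r}")
    theme, number, scale = match.groups()
    return theme, int(number), scale
```

For instance, build_filename("PS", 1, "C") gives "PS01_C.csv", and parse_filename("PS01_T.csv") gives ("PS", 1, "T").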

Geographic Identifiers

All datasets have geographic identifiers included as a variable. We use the following labeling convention for each spatial scale.

Variable      | Variable ID | Description
State         | STATEFP     | 2-digit State FIPS code
County        | COUNTYFP    | 5-digit County FIPS code (state + county)
ZIP Code/ZCTA | ZCTA        | 5-digit assigned ZCTA
Census Tract  | GEOID       | 11-digit unique tract ID (state + county + tract)

Data Formatting

Watch for leading zeros. Some geographic identifiers for states, counties, zip codes, and tracts start with “0” or “00”. A .csv file stores these identifiers as plain text, but spreadsheet programs and many data-import functions read them as numbers and drop the leading zeros when the file is opened. This means that a state FIPS code of “02” becomes “2”, a county code of “02004” becomes “2004”, a zip code of “07436” becomes “7436”, and so on. If you are merging .csv files with any other data by their geographic identifier, you will need to restore the leading zeros (or, conversely, drop the leading zeros in the other file) so that the identifiers match. This is particularly important when merging the .csv files with shapefiles, such as the geographic boundary files.
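
One way to guard against this is to treat identifiers as text and pad them back to their expected width before merging. The sketch below uses only the Python standard library; the widths come from the Geographic Identifiers table, while the helper name restore_leading_zeros and the sample row are illustrative assumptions. In pandas, passing dtype=str to read_csv keeps the identifiers as text in a similar way.

```python
import csv
import io

# Expected identifier widths, taken from the Geographic Identifiers table.
ID_WIDTHS = {"STATEFP": 2, "COUNTYFP": 5, "ZCTA": 5, "GEOID": 11}


def restore_leading_zeros(value, id_column):
    """Return the identifier as a zero-padded string of the expected width.

    Assumes the value is present (not missing) and is a string or a whole number.
    """
    if isinstance(value, float):
        value = int(value)  # e.g. 2004.0 -> 2004
    return str(value).strip().zfill(ID_WIDTHS[id_column])


# csv.DictReader keeps every field as text, so no zeros are lost on read.
# The row below is made-up sample data for illustration only.
sample = io.StringIO("COUNTYFP,value\n02004,1.25\n")
rows = list(csv.DictReader(sample))
```

Here rows[0]["COUNTYFP"] is still the string "02004", and restore_leading_zeros(2004, "COUNTYFP") repairs an identifier that was already read as a number.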

Most variable names are no more than 10 characters, for ease of data wrangling with shapefiles and GIS software. Some variable names are therefore shortened or abbreviated from the source data.

Numeric data are rounded to two decimal places (the nearest hundredth).

Missing data are represented as “NA” or empty, depending on the language or platform you are working with. These should not be confused with the numeric “0”.
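
When reading raw values yourself, it helps to keep missing values distinct from zero. The following is a minimal Python sketch, assuming each cell arrives as a string; parse_numeric is a hypothetical helper name.

```python
from typing import Optional


def parse_numeric(cell: str) -> Optional[float]:
    """Convert a raw CSV cell to a float, or None when the value is missing."""
    text = cell.strip()
    if text == "" or text == "NA":
        return None  # missing, which is not the same as 0
    return float(text)
```

With this helper, "NA" and an empty cell both map to None, while "0" maps to the number 0.0.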

Guidelines for Contributing

If you are interested in contributing to the OEPS, please keep in mind the following:

  • Variable names should be no more than 10 characters
  • Numeric observations should be rounded to two decimal places (the nearest hundredth)
  • Remove any index columns
  • Remove quotation marks, commas, or other character punctuation
  • Code missing or unavailable data as NA or empty
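
Some of these guidelines can be checked before submitting a dataset. The following Python sketch covers the name-length, rounding, punctuation, and missing-value points. It assumes that two decimal places is the intended precision and that only quotation marks and commas need stripping. The helper names are hypothetical.

```python
MISSING = "NA"

# Translation table that deletes double quotes, single quotes, and commas.
_PUNCTUATION = str.maketrans("", "", "\"',")


def too_long_names(names, limit=10):
    """Return the variable names that exceed the length guideline."""
    return [name for name in names if len(name) > limit]


def clean_cell(value, decimals=2):
    """Round numbers, strip punctuation from text, and code missing values as NA."""
    if value is None or value == "":
        return MISSING
    if isinstance(value, (int, float)):
        return round(value, decimals)
    return str(value).translate(_PUNCTUATION).strip()
```

Index columns are not handled here; they would still need to be dropped when the file is written.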