Current Population Survey Food Security Supplement - Ingest, Wrangle, Visualize
Author
Dr. Roch Nianogo, Bowen Zhang, Dr. Hua Zhou
# setup code: install packages, API keys
library(tidyverse)
library(tidycensus)
# Replace the placeholder with your own Census API key; do not share real keys
census_api_key("PUT YOUR KEY HERE", install = TRUE, overwrite = TRUE)
Sys.setenv("CENSUS_KEY" = "PUT YOUR KEY HERE")
library(censusapi)
library(gtsummary)
library(maps)
library(tigris)
1 Roadmap
A typical data science project:
In this tutorial, we ingest, tidy, transform, and visualize the 2021 Current Population Survey Food Security Supplement (CPS-FSS) data. We calculate the prevalence of food insecurity in the US and visualize how it varies across geographic regions and other socioeconomic factors.
Many datasets, such as the Decennial Census since 2000 and the American Community Survey (ACS), are available through the US Census Bureau’s APIs and in turn accessible with tidycensus and related tools. However, the CPS-FSS data is not available through the tidycensus package, which focuses only on core datasets. Other R packages support the wider range of datasets available from the Census Bureau and other government agencies. The censusapi package, for example, allows programmatic access to all US Census Bureau APIs.
# Set the Census API key in the environment
Sys.setenv("CENSUS_KEY" = "PUT YOUR KEY HERE")
2.2 censusapi package
censusapi is a lightweight package to get data from the U.S. Census Bureau APIs. It uses the same Census API key as tidycensus, but reads it from the R environment variable CENSUS_KEY. If this environment variable is set in a user’s .Renviron file, functions in censusapi will pick up the key without it being supplied directly.
listCensusApis(): Get useful dataset metadata on all available APIs as a data frame.
listCensusMetadata(): Get information about a specific API as a data frame.
getCensus(): Retrieve Census data from a given API.
makeVarlist(): Use variable metadata to find variables containing a given string.
censusapi’s core function is getCensus(), which translates R code to Census API queries. The name argument references the API name; the censusapi documentation or the function listCensusApis() helps you understand how to format this.
2.2.1 List available APIs by listCensusApis()
To see a current table of every available endpoint, run listCensusApis():
censusapi::listCensusApis() |>
  # convert data.frame to tibble
  as_tibble() |>
  # only keep the columns needed
  select(title, name, vintage, type, temporal, url) |>
  print()
# A tibble: 1,611 × 6
title name vintage type temporal url
<chr> <chr> <int> <chr> <chr> <chr>
1 Current Population Survey: Basic Monthly cps/… 2024 Micr… 2024-04… http…
2 Current Population Survey: Basic Monthly cps/… 2024 Micr… 2024-02… http…
3 Current Population Survey: Basic Monthly cps/… 2024 Micr… 2024-01… http…
4 Current Population Survey: Basic Monthly cps/… 2024 Micr… 2024-03… http…
5 Current Population Survey: Basic Monthly cps/… 2024 Micr… 2024-05… http…
6 Current Population Survey Annual Social a… cps/… 2023 Micr… 2023-03… http…
7 Current Population Survey: Basic Monthly cps/… 2023 Micr… 2023-04… http…
8 Current Population Survey: Basic Monthly cps/… 2023 Micr… 2023-08… http…
9 Current Population Survey: Basic Monthly cps/… 2023 Micr… 2023-12… http…
10 Current Population Survey: Basic Monthly cps/… 2023 Micr… 2023-02… http…
# ℹ 1,601 more rows
listCensusApis() returns a data frame that includes title, description, name, vintage, url, dataset type, and other useful fields. Search it for the dataset you are interested in.
Here we are interested in the Food Security Supplement. We can search for the keyword “Food Security” in the title column of the table above to see which years of data are available.
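The keyword search described above can be sketched as follows. This is a minimal illustration under stated assumptions: `find_apis()` is a hypothetical helper (not part of censusapi), and `apis` is a small hand-built stand-in for the table returned by `listCensusApis()`, so that the example runs without network access.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy stand-in for censusapi::listCensusApis(); rows copied from the
# outputs shown in this tutorial. The Basic Monthly name is truncated in
# the printed output, so it is left as NA here rather than guessed.
apis <- tribble(
  ~title,                                                ~name,             ~vintage,
  "Current Population Survey: Basic Monthly",            NA_character_,     2024L,
  "Current Population Survey: Food Security Supplement", "cps/foodsec/dec", 2019L,
  "Current Population Survey: Food Security Supplement", "cps/foodsec/dec", 2022L,
  "Current Population Survey: Food Security Supplement", "cps/foodsec/dec", 2020L,
  "Current Population Survey: Food Security Supplement", "cps/foodsec/dec", 2021L
)

# Hypothetical helper: keep rows whose title contains the keyword
# (case-insensitive, literal match), newest vintage first.
find_apis <- function(apis, keyword) {
  apis |>
    filter(str_detect(title, fixed(keyword, ignore_case = TRUE))) |>
    arrange(desc(vintage))
}

fss_apis <- find_apis(apis, "Food Security")
print(fss_apis)
```

With a live connection, the same helper could presumably be applied to the real table, for example `find_apis(censusapi::listCensusApis(), "Food Security")`.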
title name vintage
1 Current Population Survey: Food Security Supplement cps/foodsec/dec 2022
2 Current Population Survey: Food Security Supplement cps/foodsec/dec 2021
3 Current Population Survey: Food Security Supplement cps/foodsec/dec 2020
4 Current Population Survey: Food Security Supplement cps/foodsec/dec 2019
type
1 Microdata
2 Microdata
3 Microdata
4 Microdata
description
1 Provides data that will measure hunger and food security. It will provide data on food expenditure, access to food, and food quality and safety.
2 Provides data that will measure hunger and food security. It will provide data on food expenditure, access to food, and food quality and safety.
3 Provides data that will measure hunger and food security. It will provide data on food expenditure, access to food, and food quality and safety.
4 Provides data that will measure hunger and food security. It will provide data on food expenditure, access to food, and food quality and safety.
2.2.2 Metadata for a specific API by listCensusMetadata()
Get the metadata for the CPS Food Security Supplement December 2021 Public-Use Microdata File:
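A request along the following lines should return the variable metadata for this file. It is a sketch rather than the tutorial's original chunk: `get_fss_metadata()` is a hypothetical wrapper, and the actual call needs network access to the Census API, so it is shown commented out.

```r
# Hypothetical wrapper around censusapi::listCensusMetadata() for the
# December Food Security Supplement; defining it does not touch the network.
get_fss_metadata <- function(vintage = 2021) {
  censusapi::listCensusMetadata(
    name = "cps/foodsec/dec",
    vintage = vintage,
    type = "variables"
  ) |>
    tibble::as_tibble()
}

# Usage (requires an internet connection):
# fss_21_meta <- get_fss_metadata(2021)
# print(fss_21_meta)
```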
# A tibble: 512 × 9
name label concept predicateType group limit predicateOnly suggested_weight
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 for Cens… Census… fips-for N/A 0 TRUE <NA>
2 in Cens… Census… fips-in N/A 0 TRUE <NA>
3 ucgid Unif… Census… ucgid N/A 0 TRUE <NA>
4 PEEDU… Demo… <NA> int N/A 0 <NA> PWSSWGT
5 PUBUS1 Labo… <NA> int N/A 0 <NA> PWCMPWGT
6 PRCOW1 Indu… <NA> int N/A 0 <NA> PWCMPWGT
7 HETEL… Hous… <NA> int N/A 0 <NA> HWHHWGT
8 PRCOW2 Indu… <NA> int N/A 0 <NA> PWCMPWGT
9 PEERN… Earn… <NA> int N/A 0 <NA> PWORWGT
10 HES9 Scre… <NA> int N/A 0 <NA> HHSUPWGT
# ℹ 502 more rows
# ℹ 1 more variable: is_weight <chr>
2.2.3 Request data by getCensus()
The example below makes a request to the API for the CPS Food Security Supplement December 2021 Public-Use Microdata File. The name argument is set to "cps/foodsec/dec" and the vintage argument is set to 2021. Since we are interested in household food security status, the vars argument should include the following variables:
HRHHID: Household ID
HRHHID2: Household ID (Part 2)
PERRP: Relationship to Reference Person
GESTFIPS: State FIPS Code
GTCO: County Code
HRFS12M1: Summary Food Security Status, 12-Month
1 = Food Secure High or Marginal Food Security
2 = Low Food Security
3 = Very Low Food Security
-1 = Not in Universe (In this variable, not interviewed)
-9 = No Response
HHSUPWGT: Household Supplemental Weight
fss_21_status <- censusapi::getCensus(
  name = "cps/foodsec/dec",
  vintage = 2021,
  # vars is required
  vars = c("HRHHID", "HRHHID2", "PERRP", "GESTFIPS", "GTCO", "HRFS12M1", "HHSUPWGT")
) |>
  as_tibble() |>
  print()
Notice that some columns are not in the format we want. For example, HRFS12M1 (Food Security Status) should be a categorical variable, and we want to convert it to a factor with meaningful labels. Also, HHSUPWGT (Household Supplemental Weight) was ingested as a character variable, and we want to convert it to a numeric variable.
In addition to checking the documentation and encoding labels manually, listCensusMetadata() offers a way to get the value labels of specific variables. This can be useful for understanding the meaning of variables and their values.
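The two conversions described above can be sketched on toy data as follows. The labels are typed in by hand from the HRFS12M1 code list earlier in this section; the rows of `raw` and the name `fs_labels` are illustrative assumptions, not actual survey records or the tutorial's original code.

```r
library(dplyr)
library(tibble)

# Toy rows shaped like the getCensus() result; values are made up.
raw <- tibble(
  HRFS12M1 = c("1", "2", "3", "-9", "1"),
  HHSUPWGT = c("20706.1", "1754.2", "6486.3", "0", "651.4")
)

# Code-to-label mapping, copied from the variable description above.
fs_labels <- c(
  "1"  = "Food Secure High or Marginal Food Security",
  "2"  = "Low Food Security",
  "3"  = "Very Low Food Security",
  "-1" = "Not in Universe",
  "-9" = "No Response"
)

clean <- raw |>
  mutate(
    # as.character() makes this work whether the codes arrive as text or numbers
    HRFS12M1 = factor(as.character(HRFS12M1),
                      levels = names(fs_labels),
                      labels = unname(fs_labels)),
    # weights arrive as character strings and must be numeric for arithmetic
    HHSUPWGT = as.numeric(HHSUPWGT)
  )

print(clean)
```

Hard-coding the labels keeps the example offline; fetching them with listCensusMetadata() instead would avoid transcription mistakes.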
# A tibble: 71,571 × 7
HRHHID HRHHID2 PERRP GESTFIPS GTCO HRFS12M1 HHSUPWGT
<chr> <chr> <dbl> <chr> <chr> <fct> <dbl>
1 000005185410966 14011 41 4 27 Food Secure High or Ma… 20706.
2 000008178510165 13012 41 5 0 Food Secure High or Ma… 1754.
3 000013041104291 13011 40 13 139 Food Secure High or Ma… 6486.
4 000013041104291 13011 42 13 139 Food Secure High or Ma… 6486.
5 000013041104291 13011 48 13 139 Food Secure High or Ma… 6486.
6 000013941103291 12011 40 13 139 Food Secure High or Ma… 6284.
7 000013941103291 12011 42 13 139 Food Secure High or Ma… 6284.
8 000015897210171 13011 40 2 0 Food Secure High or Ma… 651.
9 000015897210171 13011 42 2 0 Food Secure High or Ma… 651.
10 000016756309781 12011 41 36 119 Food Secure High or Ma… 5615.
# ℹ 71,561 more rows
3 Constructing household characteristics from person records
To compute some household characteristics (such as household size, presence of children, or presence of elderly members), it is necessary to identify the records of all persons in the same household. Households within the December CPS-FSS are uniquely and completely identified by two household identifiers in combination, HRHHID and HRHHID2. Characteristics of the household reference person can be assigned from the person record with PERRP 40 or 41, which will always be the record with the lowest-numbered PERRP in the household.
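As a rough illustration of grouping person records into households, the sketch below counts records per (HRHHID, HRHHID2) pair and finds the lowest PERRP in each household. The identifiers are taken from the first rows of the output shown earlier, but the resulting sizes only reflect this toy subset, and the column names `hh_size` and `ref_perrp` are hypothetical.

```r
library(dplyr)
library(tibble)

# A few person records (identifiers copied from the printed output above).
persons <- tribble(
  ~HRHHID,           ~HRHHID2, ~PERRP,
  "000013041104291", "13011",  40,
  "000013041104291", "13011",  42,
  "000013041104291", "13011",  48,
  "000015897210171", "13011",  40,
  "000015897210171", "13011",  42,
  "000005185410966", "14011",  41
)

households <- persons |>
  # a household is identified by the two IDs in combination
  group_by(HRHHID, HRHHID2) |>
  summarize(
    hh_size   = n(),         # number of person records in the household
    ref_perrp = min(PERRP),  # reference person has the lowest PERRP
    .groups = "drop"
  )

print(households)
```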
HRFS12M1 (Summary Food Security Status, 12-Month) is one of the household characteristics. This is the variable used for most food security statistics in USDA’s annual food security report series. To compute the prevalence of food insecurity at the household level, we keep one record per household (the reference person’s) rather than counting every person record. tbl_summary() in the gtsummary package can then generate a clean summary table.
fss_21_status <- fss_21_status |>
  # # Filter observations with PERRP 40 or 41 (reference person in the household)
  # filter(PERRP %in% c(40, 41)) |>
  # # Arrange by PERRP
  # arrange(PERRP) |>
  # Keep the record with the lowest-numbered PERRP in the household
  group_by(HRHHID, HRHHID2) |>
  slice_min(PERRP, n = 1) |>
  ungroup() |>
  print()
# A tibble: 30,343 × 7
HRHHID HRHHID2 PERRP GESTFIPS GTCO HRFS12M1 HHSUPWGT
<chr> <chr> <dbl> <chr> <chr> <fct> <dbl>
1 000005185410966 14011 41 4 27 Food Secure High or Ma… 20706.
2 000008178510165 13012 41 5 0 Food Secure High or Ma… 1754.
3 000013041104291 13011 40 13 139 Food Secure High or Ma… 6486.
4 000013941103291 12011 40 13 139 Food Secure High or Ma… 6284.
5 000015897210171 13011 40 2 0 Food Secure High or Ma… 651.
6 000016756309781 12011 41 36 119 Food Secure High or Ma… 5615.
7 000017986207521 13011 40 27 0 Low Food Security 4323.
8 000018385405241 14011 40 48 0 Food Secure High or Ma… 5896.
9 000056133206001 13011 40 26 81 Food Secure High or Ma… 5031.
10 000067043406081 12011 40 47 0 Food Secure High or Ma… 5780.
# ℹ 30,333 more rows
# Display number and proportion of households in different food security status
fss_21_status |>
  select(HRFS12M1) |>
  gtsummary::tbl_summary()
Characteristic                                    N = 30,343¹
HRFS12M1
    Very Low Food Security                        1,093 (3.6%)
    Food Secure High or Marginal Food Security    27,357 (90%)
    No Response                                   49 (0.2%)
    Low Food Security                             1,844 (6.1%)
¹ n (%)
Exercise: display the bar plot of variable HRFS12M1.
# geom_bar
Notice that 27,357 + 1,844 + 1,093 + 49 = 30,343 households were interviewed for the Food Security Supplement in 2021, which matches the number in the technical documentation.
The CPS is a complex probability sample, and interviewed households, as well as persons in those households, are assigned weights so that the full interviewed sample represents the total national non-institutionalized civilian population. Initial weights are assigned based on probability of selection into the sample, and weights are then adjusted iteratively to match population controls for selected demographic characteristics at State and national levels. There are two sets of household and person weights in this data file: (1) labor force survey weights, and (2) Food Security Supplement weights.
We can use the makeVarlist() function in the censusapi package to get the list of variables in the dataset. In addition to the description and type of each variable, the suggested_weight column shows which weight should be used for the analysis.
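Looking up the suggested weight for a variable amounts to a simple filter on the metadata table. The sketch below uses a hand-built excerpt of the metadata printed earlier (only rows whose names are not truncated there); `lookup_weight()` is a hypothetical helper, not a censusapi function.

```r
library(dplyr)
library(tibble)

# Excerpt of the variable metadata shown earlier in this tutorial.
meta <- tribble(
  ~name,    ~suggested_weight,
  "PUBUS1", "PWCMPWGT",
  "PRCOW1", "PWCMPWGT",
  "PRCOW2", "PWCMPWGT",
  "HES9",   "HHSUPWGT"
)

# Hypothetical helper: return the suggested weight for one variable.
lookup_weight <- function(meta, variable) {
  meta |>
    filter(name == variable) |>
    pull(suggested_weight)
}

lookup_weight(meta, "HES9")
```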
The weighted percentage of households with low or very low food security is
\[
\text{Prevalence} = \frac{\sum_i w_i L_i}{\sum_i w_i H_i} \times 100,
\]
where \(L_i\) is an indicator variable that equals 1 if household \(i\) has low or very low food security, and 0 otherwise; \(H_i\) is an indicator variable that equals 1 if household \(i\) was interviewed for the Food Security Supplement and responded to this question, and 0 otherwise; and \(w_i\) is the weight of household \(i\).
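Using the definitions of \(L_i\), \(H_i\), and \(w_i\) above, the weighted prevalence is a ratio of two weighted sums. The sketch below builds the two weighted indicator columns on toy data. The column names HRFS12M1_low and HRFS12M1_res match those used in the next chunk, but how the original tutorial constructed them is an assumption here, and the weights are made up.

```r
library(dplyr)
library(tibble)

# Toy households with made-up weights.
hh <- tibble(
  HRFS12M1 = factor(c(
    "Food Secure High or Marginal Food Security",
    "Low Food Security",
    "Very Low Food Security",
    "No Response"
  )),
  HHSUPWGT = c(6000, 3000, 1000, 500)
)

insecure  <- c("Low Food Security", "Very Low Food Security")
responded <- c("Food Secure High or Marginal Food Security", insecure)

hh <- hh |>
  mutate(
    HRFS12M1_low = HHSUPWGT * (HRFS12M1 %in% insecure),   # w_i * L_i
    HRFS12M1_res = HHSUPWGT * (HRFS12M1 %in% responded)   # w_i * H_i
  )

# Weighted prevalence in percent: (3000 + 1000) / (6000 + 3000 + 1000) * 100
rate <- sum(hh$HRFS12M1_low) / sum(hh$HRFS12M1_res) * 100
rate
```

Note that the "No Response" household contributes to neither sum, so it does not dilute the rate.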
# percentage of households with low or very low food security in 2021
sum(fss_21_status$HRFS12M1_low) / sum(fss_21_status$HRFS12M1_res) * 100
[1] 10.23089
The percentage of households with low food security or very low food security in 2021 is 10.23%. We can verify this result by comparing this number to the USDA Economic Research Report Household Food Security in the United States in 2021 Figure 1.
Since we have already computed the national rate of low food security in 2021, we can further investigate the rate in each state.
We first need to match the FIPS codes to state names using the fips_codes dataset in the tigris package.
# Get the state names
state_names <- tigris::fips_codes |>
  select(state, state_code, state_name) |>
  distinct() |>
  as_tibble() |>
  print()
# A tibble: 57 × 3
state state_code state_name
<chr> <chr> <chr>
1 AL 01 Alabama
2 AK 02 Alaska
3 AZ 04 Arizona
4 AR 05 Arkansas
5 CA 06 California
6 CO 08 Colorado
7 CT 09 Connecticut
8 DE 10 Delaware
9 DC 11 District of Columbia
10 FL 12 Florida
# ℹ 47 more rows
Now we can compute the rate of low food security in each state.
fss_21_status_state <- fss_21_status |>
  # Group by state FIPS code and calculate the rate by state
  group_by(GESTFIPS) |>
  summarize(
    HRFS12M1_low = sum(HRFS12M1_low),
    HRFS12M1_res = sum(HRFS12M1_res)
  ) |>
  # Notice that state_code is a two digit number
  # But in fss_21_status, it is single digit if the state code is less than 10
  mutate(GESTFIPS = ifelse(as.numeric(GESTFIPS) < 10,
                           str_c("0", GESTFIPS),
                           as.character(GESTFIPS))) |>
  # Left join with state names
  left_join(state_names, by = c("GESTFIPS" = "state_code")) |>
  mutate(
    low_food_security_rate = HRFS12M1_low / HRFS12M1_res * 100,
    state_name = str_to_lower(state_name)
  ) |>
  print()
# A tibble: 51 × 6
GESTFIPS HRFS12M1_low HRFS12M1_res state state_name low_food_security_rate
<chr> <dbl> <dbl> <chr> <chr> <dbl>
1 01 243182. 2095336. AL alabama 11.6
2 10 52513. 399009. DE delaware 13.2
3 11 23157. 342487. DC district of … 6.76
4 12 918419. 9217306. FL florida 9.96
5 13 331508. 4233220. GA georgia 7.83
6 15 39073. 469880. HI hawaii 8.32
7 16 67953. 717091. ID idaho 9.48
8 17 488714. 5129502. IL illinois 9.53
9 18 249582. 2730181. IN indiana 9.14
10 19 92188. 1344716. IA iowa 6.86
# ℹ 41 more rows
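The zero-padding step in the code above can also be written with stringr's str_pad(), which some readers may find easier to scan than the ifelse() construction. A small sketch on a few example codes:

```r
library(stringr)

# State FIPS codes as they appear in the survey extract (no leading zero)
fips_raw <- c("1", "6", "10", "36")

# Pad on the left with "0" to a fixed width of two characters
fips_padded <- str_pad(fips_raw, width = 2, side = "left", pad = "0")
fips_padded
```

Codes that already have two digits are left unchanged, so the result can be joined directly against the two-digit state_code column.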
Then we can visualize the rate of low food security in each state.
ggplot2::map_data("state") |>
  merge(fss_21_status_state, by.x = "region", by.y = "state_name", all.x = TRUE) |>
  # merge() may reorder rows; restore the drawing order of the polygon vertices
  arrange(group, order) |>
  ggplot(aes(x = long, y = lat, group = group, fill = low_food_security_rate)) +
  geom_polygon(color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue",
                      name = "Households in Low Food Security (%)") +
  labs(title = "Percentage of Households in Low Food Security by State") +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        legend.position = "bottom") +
  coord_fixed(ratio = 1.5)
4 Ingest from other data sources by R
In R, we can easily ingest data in other formats to enrich the analysis.
CSV and other text delimited data files (.csv, .csv.gz): readr package in tidyverse.
Excel files (.xls, .xlsx): readxl package in tidyverse. tidyxl package for reading non-tabular data from Excel.
Bigger-than-memory data files: arrow and duckdb packages.
Big data (that cannot fit into a single computer): sparklyr package.
Databases (SQLite, MySQL, PostgreSQL, BigQuery, etc): DBI package plus a database-specific backend package.
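As a small illustration of the first item in the list above, the sketch below writes a toy CSV to a temporary file and reads it back with readr. The data and the file are made up for the example; the point is that declaring column types keeps identifiers such as FIPS codes as text, so leading zeros survive the round trip.

```r
library(readr)
library(tibble)

# Toy table with zero-padded codes, written to a temporary CSV file
toy <- tibble(state = c("AL", "AK"), state_code = c("01", "02"))
tmp <- tempfile(fileext = ".csv")
write_csv(toy, tmp)

# Read every column as character so "01" is not parsed as the number 1
back <- read_csv(tmp, col_types = cols(.default = col_character()))
print(back)
```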
Calculate the percentage of LA County households in low food security in 2021.
# Find the State Code for California and the County Code for Los Angeles County
state_names <- tigris::fips_codes |>
  as_tibble() |>
  filter(
    state_name == "California",
    county == "Los Angeles County"
  ) |>
  print()
fss_21_status |>
  # Filter for Los Angeles County
  filter(GESTFIPS == "6" & GTCO == "37") |>
  summarize(fd_insec_rate = sum(HRFS12M1_low) / sum(HRFS12M1_res)) |>
  print()
How did the COVID-19 pandemic affect the food security status of households in the U.S.? Does the impact differ by state, race, household income, or other socioeconomic determinants?