This vignette is a brief introduction to the package, covering its installation and some basic queries.

Introduction

lehdr is an R package that lets users download Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES) datasets and returns them as data frames. The LODES dataset forms the backbone of the US Census Bureau’s OnTheMap web app, which allows users to track changing spatial employment patterns at a fine geographic scale. While OnTheMap is useful, it does not easily allow comparisons over time or across geographies. This package exists to make querying the tables that underlie OnTheMap easier for urban researchers and practitioners, such as transportation and economic development planners and disaster preparedness professionals.

Installation

lehdr has not yet been submitted to CRAN, so it must be installed from GitHub using devtools. We’ll also be using dplyr and stringr.

#install.packages(setdiff(c("ggplot2", "stringr", "dplyr", "devtools"), rownames(installed.packages())),
#                 repos="http://cran.rstudio.com")
library(dplyr)
library(stringr)
library(devtools)

devtools::install_github("jamgreen/lehdr")

## stringi (1.4.4 -> 1.4.5) [CRAN]
## 
##   There is a binary version available but the source version is later:
##         binary source needs_compilation
## stringi  1.4.4  1.4.5              TRUE
## 
## v  checking for file 'C:\Users\elmue\AppData\Local\Temp\RtmpSaJWbs\remotes490828207bac\jamgreen-lehdr-a350639/DESCRIPTION' (366ms)
## -  preparing 'lehdr':
## v  checking DESCRIPTION meta-information
## -  checking for LF line-endings in source and make files and shell scripts
## -  checking for empty or unneeded directories
## -  building 'lehdr_0.2.3.tar.gz'

library(lehdr)

Usage

This first example pulls origin-destination data (lodes_type = "od") for Oregon (state = "or") in 2014 (year = 2014), for primary jobs (job_type = "JT01") across all ages, earnings, and industries (segment = "S000"), limited to workers who both live and work in the state (state_part = "main") and aggregated at the Census Tract level rather than the default Census Block (agg_geo = "tract").

or_od <- grab_lodes(state = "or", year = 2014, lodes_type = "od", job_type = "JT01", 
           segment = "S000", state_part = "main", agg_geo = "tract")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/or_od_main_JT01_2014.csv.gz
## Reading now...

head(or_od)

## # A tibble: 6 x 14
##    year state w_tract h_tract  S000  SA01  SA02  SA03  SE01  SE02  SE03  SI01
##   <dbl> <chr> <chr>   <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  2014 OR    410019~ 410019~    89     6    37    46    35    30    24    21
## 2  2014 OR    410019~ 410019~    35     2    25     8     6    13    16     1
## 3  2014 OR    410019~ 410019~    23     6    12     5     5    12     6     7
## 4  2014 OR    410019~ 410019~    20     0    17     3     4     4    12     4
## 5  2014 OR    410019~ 410019~    24     8    10     6     3    12     9     6
## 6  2014 OR    410019~ 410019~    10     3     5     2     4     5     1     2
## # ... with 2 more variables: SI02 <dbl>, SI03 <dbl>
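Once an origin-destination table is loaded, a few base R operations are often enough for a first look. The sketch below works on a toy stand-in for a tract-level table rather than on real LODES output: the data frame `od`, its tract IDs, and the helper column `same_tract` are all hypothetical, and base R is used only so that the example runs without extra packages.

```r
# Toy stand-in for a tract-level OD table. The tract IDs are invented for
# illustration; only the column names mirror the grab_lodes() output above.
od <- data.frame(
  w_tract = c("41001950100", "41001950100", "41001950200", "41001950200"),
  h_tract = c("41001950100", "41001950200", "41001950100", "41001950200"),
  S000    = c(89, 35, 23, 20),
  stringsAsFactors = FALSE
)

# Flag flows whose workers live and work in the same tract
od$same_tract <- od$w_tract == od$h_tract

# Total jobs located in each work tract (summing over all home tracts)
jobs_by_work_tract <- aggregate(S000 ~ w_tract, data = od, FUN = sum)

# Largest flows first
od_sorted <- od[order(-od$S000), ]
```

The same steps should carry over to a real `or_od` table, since they rely only on the `w_tract`, `h_tract`, and `S000` columns.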

The package can retrieve multiple states and years at the same time when passed a vector or list. This second example pulls data for both Oregon and Rhode Island (state = c("or", "ri")) for 2013 and 2014 (year = c(2013, 2014) or year = 2013:2014).

or_ri_od <- grab_lodes(state = c("or", "ri"), year = c(2013, 2014), lodes_type = "od", job_type = "JT01", 
           segment = "S000", state_part = "main", agg_geo = "tract")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/or_od_main_JT01_2013.csv.gz
## Reading now...

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/ri_od_main_JT01_2013.csv.gz
## Reading now...

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/or_od_main_JT01_2014.csv.gz
## Reading now...

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/ri_od_main_JT01_2014.csv.gz
## Reading now...

head(or_ri_od)

## # A tibble: 6 x 14
##   year  state w_tract h_tract  S000  SA01  SA02  SA03  SE01  SE02  SE03  SI01
##   <chr> <chr> <chr>   <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2013  OR    410019~ 410019~    80     4    39    37    25    30    25    25
## 2 2013  OR    410019~ 410019~    27     4    15     8     8    11     8     3
## 3 2013  OR    410019~ 410019~    11     1     5     5     6     3     2     1
## 4 2013  OR    410019~ 410019~    21     3    14     4     5     8     8     4
## 5 2013  OR    410019~ 410019~    27    12     8     7     4    15     8     8
## 6 2013  OR    410019~ 410019~     4     1     2     1     2     0     2     0
## # ... with 2 more variables: SI02 <dbl>, SI03 <dbl>
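Note that `year` is printed as a character column (`<chr>`) in the combined output above, whereas it was numeric in the single-year pull. A minimal sketch of converting it and totalling jobs by state and year follows. The data frame `od_multi` and its values are a hypothetical stand-in, not real LODES figures, and base R is used so the example runs without extra packages.

```r
# Hypothetical stand-in for a multi-state, multi-year OD table
od_multi <- data.frame(
  year  = c("2013", "2013", "2013", "2014", "2014", "2014"),
  state = c("OR", "OR", "RI", "OR", "RI", "RI"),
  S000  = c(80, 27, 11, 89, 35, 23),
  stringsAsFactors = FALSE
)

# Convert year from character to integer before any numeric comparison
od_multi$year <- as.integer(od_multi$year)

# Total jobs per state and year, in long form
totals <- aggregate(S000 ~ state + year, data = od_multi, FUN = sum)

# The same totals as a state-by-year table
totals_wide <- xtabs(S000 ~ state + year, data = od_multi)
```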

Not all years are available for each state. To see all options for lodes_type, job_type, and segment and the availability for each state/year, please see the most recent LEHD Technical Document at https://lehd.ces.census.gov/data/lodes/LODES7/.

Other common uses include retrieving Residence or Workplace Area Characteristics (lodes_type = "rac" or lodes_type = "wac", respectively), low-income jobs (segment = "SE01"), or goods-producing jobs (segment = "SI01"). Data can also be retrieved at the Census Block level (agg_geo = "block"), although specifying it is unnecessary because it is the default; see below for other aggregation levels.
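The segment codes are terse, and the same codes appear as column names in the origin-destination output shown earlier. The lookup below is a small convenience sketch, not part of lehdr: `segment_labels` and `describe_segment()` are hypothetical names, and the labels paraphrase the categories in the LODES technical documentation, which remains the authoritative source.

```r
# Hypothetical lookup of LODES segment codes; labels paraphrase the
# LODES technical documentation and should be checked against it.
segment_labels <- c(
  S000 = "All jobs",
  SA01 = "Workers age 29 or younger",
  SA02 = "Workers age 30 to 54",
  SA03 = "Workers age 55 or older",
  SE01 = "Earnings of $1250/month or less",
  SE02 = "Earnings of $1251/month to $3333/month",
  SE03 = "Earnings greater than $3333/month",
  SI01 = "Goods-producing industry sectors",
  SI02 = "Trade, transportation, and utilities industry sectors",
  SI03 = "All other services industry sectors"
)

# Return the label(s) for one or more codes, failing loudly on unknown codes
describe_segment <- function(code) {
  label <- segment_labels[code]
  if (anyNA(label)) {
    stop("Unknown segment code: ", paste(code[is.na(label)], collapse = ", "))
  }
  unname(label)
}

describe_segment(c("SE01", "SI01"))
```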

Additional Examples

Using County level signifiers

The following example loads work area characteristics (wac), then uses the work area geoid w_geocode to create a variable, w_county_fips, that holds just the county portion (the first five characters). Similar transformations can be made on residence area characteristics (rac) by using the h_geocode variable. Both variables are available in origin-destination (od) datasets, so with od one would need to create both an h_county_fips and a w_county_fips.

md_wac <- grab_lodes(state = "md", year = 2015, lodes_type = "wac", job_type = "JT01", segment = "S000")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_wac_S000_JT01_2015.csv.gz
## Reading now...

head(md_wac)

## # A tibble: 6 x 55
##   w_geocode  C000  CA01  CA02  CA03  CE01  CE02  CE03 CNS01 CNS02 CNS03 CNS04
##   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 24001000~     7     3     3     1     4     3     0     0     0     0     0
## 2 24001000~     1     0     1     0     0     1     0     0     0     0     0
## 3 24001000~    10     2     3     5     7     3     0     0     0     0     0
## 4 24001000~     2     0     2     0     0     1     1     0     0     0     0
## 5 24001000~     8     4     4     0     7     1     0     0     0     0     0
## 6 24001000~     2     0     2     0     0     2     0     0     0     0     0
## # ... with 43 more variables: CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>,
## #   CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>,
## #   CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>,
## #   CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>,
## #   CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>,
## #   CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, CFA01 <dbl>,
## #   CFA02 <dbl>, CFA03 <dbl>, CFA04 <dbl>, CFA05 <dbl>, CFS01 <dbl>,
## #   CFS02 <dbl>, CFS03 <dbl>, CFS04 <dbl>, CFS05 <dbl>, createdate <chr>,
## #   year <dbl>, state <chr>

md_wac_county <- md_wac %>% mutate(w_county_fips = str_sub(w_geocode, 1, 5))

head(md_wac_county)

## # A tibble: 6 x 56
##   w_geocode  C000  CA01  CA02  CA03  CE01  CE02  CE03 CNS01 CNS02 CNS03 CNS04
##   <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 24001000~     7     3     3     1     4     3     0     0     0     0     0
## 2 24001000~     1     0     1     0     0     1     0     0     0     0     0
## 3 24001000~    10     2     3     5     7     3     0     0     0     0     0
## 4 24001000~     2     0     2     0     0     1     1     0     0     0     0
## 5 24001000~     8     4     4     0     7     1     0     0     0     0     0
## 6 24001000~     2     0     2     0     0     2     0     0     0     0     0
## # ... with 44 more variables: CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>,
## #   CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>,
## #   CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>,
## #   CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>,
## #   CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>,
## #   CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, CFA01 <dbl>,
## #   CFA02 <dbl>, CFA03 <dbl>, CFA04 <dbl>, CFA05 <dbl>, CFS01 <dbl>,
## #   CFS02 <dbl>, CFS03 <dbl>, CFS04 <dbl>, CFS05 <dbl>, createdate <chr>,
## #   year <dbl>, state <chr>, w_county_fips <chr>
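For origin-destination data, the same idea applies to both geocode columns. The sketch below uses a toy block-level table, `od_blocks`, with invented geocodes and counts (hypothetical, not real LODES values). Base R's `substr()` stands in for `str_sub()` so the example runs without extra packages.

```r
# Hypothetical block-level OD rows; geocodes are invented 15-character IDs
od_blocks <- data.frame(
  w_geocode = c("240010001001000", "240030002002000"),
  h_geocode = c("240010001001001", "240010001001002"),
  S000      = c(3, 5),
  stringsAsFactors = FALSE
)

# Geocodes must stay character: a numeric column would drop leading zeros
# for states whose FIPS code starts with 0.
stopifnot(is.character(od_blocks$w_geocode), is.character(od_blocks$h_geocode))
stopifnot(all(nchar(od_blocks$w_geocode) == 15), all(nchar(od_blocks$h_geocode) == 15))

# The first five characters are state FIPS (2) plus county FIPS (3)
od_blocks$w_county_fips <- substr(od_blocks$w_geocode, 1, 5)
od_blocks$h_county_fips <- substr(od_blocks$h_geocode, 1, 5)
```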

Two ways to aggregate at the County level

To aggregate at the county level, continuing the above example, we must also drop the original block geoid w_geocode, group by our new variable w_county_fips and our existing variables state, year, and createdate, then sum the remaining numeric variables. This method only works for wac and rac LODES types.

md_wac_county <- md_wac %>% mutate(w_county_fips = str_sub(w_geocode, 1, 5)) %>% 
  select(-"w_geocode") %>%
  group_by(w_county_fips, state, year, createdate) %>% 
  summarise_if(is.numeric, sum)

head(md_wac_county)

## # A tibble: 6 x 55
## # Groups:   w_county_fips, state, year [6]
##   w_county_fips state  year createdate   C000  CA01   CA02  CA03  CE01   CE02
##   <chr>         <chr> <dbl> <chr>       <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>
## 1 24001         MD     2015 20190826    25887  5653  13801  6433  5938  11081
## 2 24003         MD     2015 20190826   237881 57186 125917 54778 44280  72498
## 3 24005         MD     2015 20190826   345823 82049 180737 83037 68082 112227
## 4 24009         MD     2015 20190826    20063  5119  10381  4563  4539   6775
## 5 24011         MD     2015 20190826     7939  1567   4179  2193  1346   3293
## 6 24013         MD     2015 20190826    52618 13124  26078 13416 11985  18843
## # ... with 45 more variables: CE03 <dbl>, CNS01 <dbl>, CNS02 <dbl>,
## #   CNS03 <dbl>, CNS04 <dbl>, CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>,
## #   CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>,
## #   CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>,
## #   CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>,
## #   CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>,
## #   CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, CFA01 <dbl>,
## #   CFA02 <dbl>, CFA03 <dbl>, CFA04 <dbl>, CFA05 <dbl>, CFS01 <dbl>,
## #   CFS02 <dbl>, CFS03 <dbl>, CFS04 <dbl>, CFS05 <dbl>
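A useful check on any manual aggregation is that the totals are preserved. The sketch below repeats the drop, group, and sum steps on a toy table and compares the statewide total before and after. The data frame `wac_toy`, its geocodes, and its counts are hypothetical, and base R's `aggregate()` is swapped in for the dplyr pipeline so the example runs without extra packages.

```r
# Hypothetical block-level WAC rows with two count columns
wac_toy <- data.frame(
  w_geocode = c("240010001001000", "240010001001001", "240030002002000"),
  C000      = c(7, 1, 10),
  CA01      = c(3, 0, 2),
  year      = 2015,
  state     = "MD",
  stringsAsFactors = FALSE
)

# Derive the county identifier, then drop the block geoid
wac_toy$w_county_fips <- substr(wac_toy$w_geocode, 1, 5)
wac_toy$w_geocode <- NULL

# Sum the count columns within each county, state, and year
wac_county <- aggregate(
  cbind(C000, CA01) ~ w_county_fips + state + year,
  data = wac_toy,
  FUN  = sum
)

# Aggregation should not change the overall totals
stopifnot(sum(wac_county$C000) == sum(wac_toy$C000))
```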

Alternatively, this functionality is built into the package, which is the advisable route for origin-destination grabs (see below). Here, we retrieve rac data and include an argument to aggregate at the County level (agg_geo = "county"):

md_rac_county <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01", 
           segment = "S000", agg_geo = "county")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...

head(md_rac_county)

## # A tibble: 6 x 44
##    year state h_county   C000  CA01   CA02  CA03  CE01   CE02   CE03 CNS01 CNS02
##   <dbl> <chr> <chr>     <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>
## 1  2015 MD    24001     24683  5488  13029  6166  5598  10294   8791    51   140
## 2  2015 MD    24003    239454 51186 131373 56895 38476  63599 137379   214    84
## 3  2015 MD    24005    371927 79756 198239 93932 62678 112647 196602   426    99
## 4  2015 MD    24009     32868  7565  17830  7473  5643   9029  18196    48    11
## 5  2015 MD    24011     14935  3285   7827  3823  2800   5829   6306   238    13
## 6  2015 MD    24013     80257 17028  42594 20635 13427  21671  45159   347    31
## # ... with 32 more variables: CNS03 <dbl>, CNS04 <dbl>, CNS05 <dbl>,
## #   CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>,
## #   CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>,
## #   CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>,
## #   CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>, CR05 <dbl>, CR07 <dbl>,
## #   CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>, CD03 <dbl>, CD04 <dbl>,
## #   CS01 <dbl>, CS02 <dbl>

Aggregating Origin-Destination

As mentioned above, aggregation of origin-destination data is built in. Setting agg_geo takes care of aggregation on both the h_geocode and w_geocode variables:

md_od_county <- grab_lodes(state = "md", year = 2015, lodes_type = "od", job_type = "JT01", 
           segment = "S000", agg_geo = "county", state_part = "main")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_od_main_JT01_2015.csv.gz
## Reading now...

head(md_od_county)

## # A tibble: 6 x 14
##    year state w_county h_county  S000  SA01  SA02  SA03  SE01  SE02  SE03  SI01
##   <dbl> <chr> <chr>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  2015 MD    24001    24001    16347  3533  8570  4244  3911  7249  5187  1908
## 2  2015 MD    24001    24003      171    48    78    45    53    51    67    10
## 3  2015 MD    24001    24005      272    68   140    64    73    92   107     7
## 4  2015 MD    24001    24009       29    13     8     8    15     7     7     1
## 5  2015 MD    24001    24011        9     1     7     1     2     3     4     0
## 6  2015 MD    24001    24013       74    22    41    11    24    22    28     2
## # ... with 2 more variables: SI02 <dbl>, SI03 <dbl>
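A county-level origin-destination table makes it straightforward to ask how many jobs in a county are held by its own residents. The sketch below retypes the six rows printed above into a data frame, `od_head`, and computes that share for work county 24001. Because it covers only the six displayed home counties, the result illustrates the calculation and is not the actual share for the county. The names `od_head`, `in_county`, and `share_within` are hypothetical.

```r
# The six rows printed above, retyped by hand (S000 only)
od_head <- data.frame(
  w_county = rep("24001", 6),
  h_county = c("24001", "24003", "24005", "24009", "24011", "24013"),
  S000     = c(16347, 171, 272, 29, 9, 74),
  stringsAsFactors = FALSE
)

# Rows where workers live in the county they work in
in_county <- od_head$w_county == od_head$h_county

# Share of the jobs in these rows held by residents of the work county
share_within <- sum(od_head$S000[in_county]) / sum(od_head$S000)
```

On the full `md_od_county` table, the same calculation would be grouped by `w_county` so that each work county gets its own share.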

Aggregating at Block Group, Tract, or State level

Similarly, the agg_geo argument can group results at the Block Group ("bg"), Tract ("tract"), County ("county"), and State ("state") levels. County was demonstrated above. This aggregation works for all three LODES types, including origin-destination.

md_rac_bg <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01", 
           segment = "S000", agg_geo = "bg")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...

head(md_rac_bg)

## # A tibble: 6 x 44
##    year state h_bg    C000  CA01  CA02  CA03  CE01  CE02  CE03 CNS01 CNS02 CNS03
##   <dbl> <chr> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  2015 MD    24001~   251    56   126    69    51   103    97     2     2     0
## 2  2015 MD    24001~   452    90   242   120    91   183   178     3     6     2
## 3  2015 MD    24001~   341    71   191    79    74   150   117     1     4     1
## 4  2015 MD    24001~   294    51   157    86    52   124   118     1     2     1
## 5  2015 MD    24001~   352    66   196    90    78   134   140     1     0     0
## 6  2015 MD    24001~   550   119   286   145   105   244   201     1     2     0
## # ... with 31 more variables: CNS04 <dbl>, CNS05 <dbl>, CNS06 <dbl>,
## #   CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>,
## #   CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>,
## #   CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>,
## #   CR03 <dbl>, CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>,
## #   CD01 <dbl>, CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>

md_rac_tract <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01", 
           segment = "S000", agg_geo = "tract")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...

head(md_rac_tract)

## # A tibble: 6 x 44
##    year state h_tract  C000  CA01  CA02  CA03  CE01  CE02  CE03 CNS01 CNS02
##   <dbl> <chr> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  2015 MD    240010~  1044   217   559   268   216   436   392     6    12
## 2  2015 MD    240010~  1196   236   639   321   235   502   459     3     4
## 3  2015 MD    240010~   945   206   493   246   225   395   325     3     1
## 4  2015 MD    240010~  1134   228   598   308   282   466   386     0     5
## 5  2015 MD    240010~   746   159   371   216   184   337   225     1     2
## 6  2015 MD    240010~  1091   265   573   253   261   459   371     1     2
## # ... with 32 more variables: CNS03 <dbl>, CNS04 <dbl>, CNS05 <dbl>,
## #   CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>,
## #   CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>,
## #   CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>,
## #   CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>, CR05 <dbl>, CR07 <dbl>,
## #   CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>, CD03 <dbl>, CD04 <dbl>,
## #   CS01 <dbl>, CS02 <dbl>

md_rac_state <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01", 
           segment = "S000", agg_geo = "state")

## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...

head(md_rac_state)

## # A tibble: 1 x 44
##    year state h_state   C000   CA01   CA02   CA03   CE01   CE02   CE03 CNS01
##   <dbl> <chr> <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
## 1  2015 MD    24      2.53e6 536277 1.39e6 610456 418694 748393 1.37e6  4819
## # ... with 33 more variables: CNS02 <dbl>, CNS03 <dbl>, CNS04 <dbl>,
## #   CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>,
## #   CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>,
## #   CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>,
## #   CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>, CR05 <dbl>,
## #   CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>, CD03 <dbl>,
## #   CD04 <dbl>, CS01 <dbl>, CS02 <dbl>
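These aggregation levels line up with the nested structure of Census block GEOIDs: a 15-character block code begins with the state (2 characters), followed by the county (3), the tract (6), and the block (4), with the first digit of the block identifying the block group. The sketch below shows how a block geocode maps to each level by truncation. It illustrates the identifier structure only and is not a description of how lehdr implements agg_geo internally; `geo_prefix`, `truncate_geoid()`, and the example geocode are hypothetical.

```r
# Number of leading characters of a block GEOID that identify each level
geo_prefix <- c(state = 2, county = 5, tract = 11, bg = 12, block = 15)

# Truncate a block geocode (character) to the requested geographic level
truncate_geoid <- function(geocode, level) {
  stopifnot(is.character(geocode), length(level) == 1, level %in% names(geo_prefix))
  substr(geocode, 1, geo_prefix[[level]])
}

# An invented 15-character block geocode, shown at every level
block_id <- "240010001001000"
sapply(names(geo_prefix), function(lv) truncate_geoid(block_id, lv))
```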

Getting Started with lehdr

Common uses of lehdr

Jamaal Green

Dillon Mahmoudi

Liming Wang

22 January 2020