getting_started.Rmd
This vignette is a brief introduction to the package, covering its installation and some basic queries.
lehdr is an R package that allows users to download Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES) datasets, returned as dataframes. The LODES dataset forms the backbone of the US Census Bureau’s OnTheMap web app, which allows users to track changing spatial employment patterns at a fine geographic scale. While OnTheMap is useful, it is a limited tool that does not easily allow comparisons over time or across geographies. This package exists to make querying the tables that underlie OnTheMap easier for urban researchers and practitioners, such as transportation and economic development planners and disaster preparedness professionals.
lehdr has not yet been submitted to CRAN, so it must be installed from GitHub using devtools. We’ll also be using dplyr and stringr for data manipulation.
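# Install any missing dependencies first (uncomment to run):
# install.packages(setdiff(c("ggplot2", "stringr", "dplyr", "devtools"), rownames(installed.packages())),
#                  repos = "https://cran.rstudio.com")
library(dplyr)
library(stringr)
library(devtools)
devtools::install_github("jamgreen/lehdr")  # install lehdr from GitHub
library(lehdr)                              # provides grab_lodes()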
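This first example pulls Oregon (`state = "or"`) origin-destination data (`lodes_type = "od"`) for 2014 (`year = 2014`). It covers primary jobs (`job_type = "JT01"`, each worker’s highest-paying job; `"JT00"` would select all jobs) across all ages, earnings levels, and industries (`segment = "S000"`), aggregated at the Census Tract level rather than the default Census Block level (`agg_geo = "tract"`). For od files, `state_part = "main"` selects the part of the state file covering jobs in which both the workplace and the residence are in the state; the `"aux"` part covers jobs with the workplace in the state and the residence outside it.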
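These arguments correspond to the LODES file naming convention, visible in the cached file names this vignette prints (for example `or_od_main_JT01_2014.csv.gz` and `md_wac_S000_JT01_2015.csv.gz`): od files are split by state part, while rac and wac files are split by segment. As a rough illustration only, the hypothetical helper `lodes_file_name()` below (it is not part of lehdr) assembles such names from grab_lodes-style arguments:

```r
# Hypothetical helper, NOT part of lehdr: builds a LODES file name from
# grab_lodes()-style arguments, following the pattern of the cached files
# shown in this vignette.
lodes_file_name <- function(state, year, lodes_type, job_type,
                            segment = "S000", state_part = "main") {
  state <- tolower(state)
  if (lodes_type == "od") {
    # od files are split by state part ("main" or "aux"), not by segment
    paste0(state, "_od_", state_part, "_", job_type, "_", year, ".csv.gz")
  } else if (lodes_type %in% c("rac", "wac")) {
    # rac/wac files are split by segment instead
    paste0(state, "_", lodes_type, "_", segment, "_", job_type, "_", year, ".csv.gz")
  } else {
    stop("lodes_type must be one of 'od', 'rac', or 'wac'")
  }
}

lodes_file_name("or", 2014, "od", "JT01", state_part = "main")
# "or_od_main_JT01_2014.csv.gz"
lodes_file_name("md", 2015, "wac", "JT01", segment = "S000")
# "md_wac_S000_JT01_2015.csv.gz"
```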
or_od <- grab_lodes(state = "or", year = 2014, lodes_type = "od", job_type = "JT01",
                    segment = "S000", state_part = "main", agg_geo = "tract")
head(or_od)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/or_od_main_JT01_2014.csv.gz
## Reading now...
## # A tibble: 6 x 14
## year state w_tract h_tract S000 SA01 SA02 SA03 SE01 SE02 SE03 SI01
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 OR 410019~ 410019~ 89 6 37 46 35 30 24 21
## 2 2014 OR 410019~ 410019~ 35 2 25 8 6 13 16 1
## 3 2014 OR 410019~ 410019~ 23 6 12 5 5 12 6 7
## 4 2014 OR 410019~ 410019~ 20 0 17 3 4 4 12 4
## 5 2014 OR 410019~ 410019~ 24 8 10 6 3 12 9 6
## 6 2014 OR 410019~ 410019~ 10 3 5 2 4 5 1 2
## # ... with 2 more variables: SI02 <dbl>, SI03 <dbl>
The package can retrieve multiple states and years at the same time when given a vector or list. This second example pulls Oregon and Rhode Island (`state = c("or", "ri")`) for 2013 and 2014 (`year = c(2013, 2014)` or, equivalently, `year = 2013:2014`).
or_ri_od <- grab_lodes(state = c("or", "ri"), year = c(2013, 2014), lodes_type = "od", job_type = "JT01",
                       segment = "S000", state_part = "main", agg_geo = "tract")
head(or_ri_od)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/or_od_main_JT01_2013.csv.gz
## Reading now...
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/ri_od_main_JT01_2013.csv.gz
## Reading now...
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/or_od_main_JT01_2014.csv.gz
## Reading now...
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/ri_od_main_JT01_2014.csv.gz
## Reading now...
## # A tibble: 6 x 14
## year state w_tract h_tract S000 SA01 SA02 SA03 SE01 SE02 SE03 SI01
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2013 OR 410019~ 410019~ 80 4 39 37 25 30 25 25
## 2 2013 OR 410019~ 410019~ 27 4 15 8 8 11 8 3
## 3 2013 OR 410019~ 410019~ 11 1 5 5 6 3 2 1
## 4 2013 OR 410019~ 410019~ 21 3 14 4 5 8 8 4
## 5 2013 OR 410019~ 410019~ 27 12 8 7 4 15 8 8
## 6 2013 OR 410019~ 410019~ 4 1 2 1 2 0 2 0
## # ... with 2 more variables: SI02 <dbl>, SI03 <dbl>
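Judging from the four cache messages printed above, a request for two states and two years resolves to one file per state-year combination. The base R sketch below illustrates that expansion; it is an assumption about the behavior, not lehdr’s internal code. `expand.grid()` varies its first column fastest, which reproduces the order of those messages:

```r
# Illustrative only: enumerate the state-year combinations implied by
# state = c("or", "ri") and year = 2013:2014.
combos <- expand.grid(state = c("or", "ri"), year = 2013:2014,
                      stringsAsFactors = FALSE)

# One od "main" file per combination, using the naming pattern seen in the
# cache messages above.
files <- sprintf("%s_od_main_JT01_%d.csv.gz", combos$state, combos$year)
print(files)
```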
Not all years are available for each state. To see all options for `lodes_type`, `job_type`, and `segment`, and the availability for each state and year, please see the most recent LEHD Technical Document at https://lehd.ces.census.gov/data/lodes/LODES7/.
Other common uses include retrieving Residence Area Characteristics or Work Area Characteristics (`lodes_type = "rac"` or `lodes_type = "wac"`, respectively), low-wage jobs (`segment = "SE01"`, earnings of $1250/month or less), or goods-producing jobs (`segment = "SI01"`). Data can also be retrieved at the Census Block level (`agg_geo = "block"`), although this need not be set explicitly because it is the default; other aggregation levels are covered below.
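The segment columns come in three groups, age (`SA01` to `SA03`), monthly earnings (`SE01` to `SE03`), and industry (`SI01` to `SI03`), and each group partitions the total `S000`. The labels below follow the LODES technical documentation linked above; the check uses the first row of the Oregon 2014 tract-level output printed earlier, so it is a sanity check on displayed values rather than on a full dataset:

```r
# Segment codes used in LODES files (labels per the LODES technical documentation)
segments <- c(
  S000 = "Total number of jobs",
  SA01 = "Jobs of workers age 29 or younger",
  SA02 = "Jobs of workers age 30 to 54",
  SA03 = "Jobs of workers age 55 or older",
  SE01 = "Jobs with earnings $1250/month or less",
  SE02 = "Jobs with earnings $1251/month to $3333/month",
  SE03 = "Jobs with earnings greater than $3333/month",
  SI01 = "Jobs in Goods Producing industry sectors",
  SI02 = "Jobs in Trade, Transportation, and Utilities industry sectors",
  SI03 = "Jobs in All Other Services industry sectors"
)

# First row of the Oregon 2014 tract-level od output shown above
row1 <- c(S000 = 89, SA01 = 6, SA02 = 37, SA03 = 46,
          SE01 = 35, SE02 = 30, SE03 = 24)

# Each segment group should add back up to the total S000
age_total      <- sum(row1[c("SA01", "SA02", "SA03")])
earnings_total <- sum(row1[c("SE01", "SE02", "SE03")])
c(age = age_total, earnings = earnings_total, total = unname(row1["S000"]))
```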
The following example loads Work Area Characteristics (wac), then takes the first five characters of the block-level work-area GEOID `w_geocode` (the two-digit state FIPS code followed by the three-digit county FIPS code) to create a county identifier, `w_county_fips`. Similar transformations can be made on Residence Area Characteristics (rac) by using the `h_geocode` variable. Origin-destination (od) datasets contain both variables, so with od data one would need to create both `h_county_fips` and `w_county_fips`.
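The five-character county code works because a 15-character Census block GEOID nests every larger geography as a prefix: 2 characters of state FIPS, 3 of county FIPS, 6 of tract, and 4 of block, whose first digit identifies the block group. A base R sketch, using a made-up Maryland block GEOID purely for illustration (`substr()` here plays the same role as `str_sub(w_geocode, 1, 5)` in the dplyr code below):

```r
# Made-up 15-character block GEOID, for illustration only
block_geoid <- "240010001001000"

# Each geography is a leading substring of the block GEOID
geoid_parts <- c(
  state       = substr(block_geoid, 1, 2),   # state FIPS
  county      = substr(block_geoid, 1, 5),   # state + county FIPS
  tract       = substr(block_geoid, 1, 11),  # + 6-digit tract code
  block_group = substr(block_geoid, 1, 12),  # + first digit of the block
  block       = block_geoid                  # full 15-character GEOID
)
geoid_parts
```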
md_wac <- grab_lodes(state = "md", year = 2015, lodes_type = "wac", job_type = "JT01", segment = "S000")
head(md_wac)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_wac_S000_JT01_2015.csv.gz
## Reading now...
## # A tibble: 6 x 55
## w_geocode C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01 CNS02 CNS03 CNS04
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 24001000~ 7 3 3 1 4 3 0 0 0 0 0
## 2 24001000~ 1 0 1 0 0 1 0 0 0 0 0
## 3 24001000~ 10 2 3 5 7 3 0 0 0 0 0
## 4 24001000~ 2 0 2 0 0 1 1 0 0 0 0
## 5 24001000~ 8 4 4 0 7 1 0 0 0 0 0
## 6 24001000~ 2 0 2 0 0 2 0 0 0 0 0
## # ... with 43 more variables: CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>,
## # CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>,
## # CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>,
## # CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>,
## # CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>,
## # CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, CFA01 <dbl>,
## # CFA02 <dbl>, CFA03 <dbl>, CFA04 <dbl>, CFA05 <dbl>, CFS01 <dbl>,
## # CFS02 <dbl>, CFS03 <dbl>, CFS04 <dbl>, CFS05 <dbl>, createdate <chr>,
## # year <dbl>, state <chr>
md_wac <- md_wac %>% mutate(w_county_fips = str_sub(w_geocode, 1, 5))
head(md_wac)
## # A tibble: 6 x 56
## w_geocode C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01 CNS02 CNS03 CNS04
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 24001000~ 7 3 3 1 4 3 0 0 0 0 0
## 2 24001000~ 1 0 1 0 0 1 0 0 0 0 0
## 3 24001000~ 10 2 3 5 7 3 0 0 0 0 0
## 4 24001000~ 2 0 2 0 0 1 1 0 0 0 0
## 5 24001000~ 8 4 4 0 7 1 0 0 0 0 0
## 6 24001000~ 2 0 2 0 0 2 0 0 0 0 0
## # ... with 44 more variables: CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>,
## # CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>,
## # CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>,
## # CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>,
## # CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>,
## # CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, CFA01 <dbl>,
## # CFA02 <dbl>, CFA03 <dbl>, CFA04 <dbl>, CFA05 <dbl>, CFS01 <dbl>,
## # CFS02 <dbl>, CFS03 <dbl>, CFS04 <dbl>, CFS05 <dbl>, createdate <chr>,
## # year <dbl>, state <chr>, w_county_fips <chr>
To aggregate at the county level, continuing the above example, we must also drop the original block GEOID `w_geocode`, group by the new variable `w_county_fips` and the existing variables `state`, `year`, and `createdate`, and then sum the remaining numeric variables. As written, this method only works for the wac and rac LODES types, because od files carry two geocodes (`h_geocode` and `w_geocode`).
md_wac_county <- md_wac %>%
  mutate(w_county_fips = str_sub(w_geocode, 1, 5)) %>%  # state + county FIPS
  select(-w_geocode) %>%                                # drop the block GEOID
  group_by(w_county_fips, state, year, createdate) %>%
  summarise_if(is.numeric, sum)                         # sum every job-count column
head(md_wac_county)
## # A tibble: 6 x 55
## # Groups: w_county_fips, state, year [6]
## w_county_fips state year createdate C000 CA01 CA02 CA03 CE01 CE02
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 24001 MD 2015 20190826 25887 5653 13801 6433 5938 11081
## 2 24003 MD 2015 20190826 237881 57186 125917 54778 44280 72498
## 3 24005 MD 2015 20190826 345823 82049 180737 83037 68082 112227
## 4 24009 MD 2015 20190826 20063 5119 10381 4563 4539 6775
## 5 24011 MD 2015 20190826 7939 1567 4179 2193 1346 3293
## 6 24013 MD 2015 20190826 52618 13124 26078 13416 11985 18843
## # ... with 45 more variables: CE03 <dbl>, CNS01 <dbl>, CNS02 <dbl>,
## # CNS03 <dbl>, CNS04 <dbl>, CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>,
## # CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>,
## # CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>,
## # CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>,
## # CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>,
## # CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>, CFA01 <dbl>,
## # CFA02 <dbl>, CFA03 <dbl>, CFA04 <dbl>, CFA05 <dbl>, CFS01 <dbl>,
## # CFS02 <dbl>, CFS03 <dbl>, CFS04 <dbl>, CFS05 <dbl>
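For readers not using dplyr, the same derive, drop, group, and sum steps can be sketched with base R’s `aggregate()`. The tiny wac-like table is invented for illustration and carries only two job columns (`C000` and `CA01`) instead of the full set:

```r
# Invented wac-style data: three blocks in two Maryland counties
wac <- data.frame(
  w_geocode  = c("240010001001000", "240010001001001", "240030701001000"),
  C000       = c(7, 1, 10),
  CA01       = c(3, 0, 2),
  year       = 2015,
  state      = "MD",
  createdate = "20190826",
  stringsAsFactors = FALSE
)

# Derive the county code, then drop the block-level GEOID
wac$w_county_fips <- substr(wac$w_geocode, 1, 5)
wac$w_geocode <- NULL

# Sum the job counts within each county, keeping the identifier columns
wac_county <- aggregate(cbind(C000, CA01) ~ w_county_fips + state + year + createdate,
                        data = wac, FUN = sum)
wac_county
```

The two blocks in county 24001 collapse into one row with `C000 = 8`, while county 24003 keeps its single block’s count of 10.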
Alternatively, this functionality is built into the package, which is the advisable route for origin-destination data (see below). Here, we aggregate rac data to the County level by setting `agg_geo = "county"`:
md_rac_county <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01",
                            segment = "S000", agg_geo = "county")
head(md_rac_county)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...
## # A tibble: 6 x 44
## year state h_county C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01 CNS02
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015 MD 24001 24683 5488 13029 6166 5598 10294 8791 51 140
## 2 2015 MD 24003 239454 51186 131373 56895 38476 63599 137379 214 84
## 3 2015 MD 24005 371927 79756 198239 93932 62678 112647 196602 426 99
## 4 2015 MD 24009 32868 7565 17830 7473 5643 9029 18196 48 11
## 5 2015 MD 24011 14935 3285 7827 3823 2800 5829 6306 238 13
## 6 2015 MD 24013 80257 17028 42594 20635 13427 21671 45159 347 31
## # ... with 32 more variables: CNS03 <dbl>, CNS04 <dbl>, CNS05 <dbl>,
## # CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>,
## # CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>,
## # CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>,
## # CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>, CR05 <dbl>, CR07 <dbl>,
## # CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>, CD03 <dbl>, CD04 <dbl>,
## # CS01 <dbl>, CS02 <dbl>
As mentioned above, aggregation of origin-destination data is built in. It takes care of aggregating both the `h_geocode` and `w_geocode` variables:
md_od_county <- grab_lodes(state = "md", year = 2015, lodes_type = "od", job_type = "JT01",
                           segment = "S000", agg_geo = "county", state_part = "main")
head(md_od_county)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_od_main_JT01_2015.csv.gz
## Reading now...
## # A tibble: 6 x 14
## year state w_county h_county S000 SA01 SA02 SA03 SE01 SE02 SE03 SI01
## <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015 MD 24001 24001 16347 3533 8570 4244 3911 7249 5187 1908
## 2 2015 MD 24001 24003 171 48 78 45 53 51 67 10
## 3 2015 MD 24001 24005 272 68 140 64 73 92 107 7
## 4 2015 MD 24001 24009 29 13 8 8 15 7 7 1
## 5 2015 MD 24001 24011 9 1 7 1 2 3 4 0
## 6 2015 MD 24001 24013 74 22 41 11 24 22 28 2
## # ... with 2 more variables: SI02 <dbl>, SI03 <dbl>
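Conceptually, od aggregation has to truncate both ends of every flow and then group by the resulting pair of counties. The base R sketch below illustrates that idea on invented data; it is not lehdr’s own implementation:

```r
# Invented od-style data: each row is a flow from a home block to a work block
od <- data.frame(
  w_geocode = c("240010001001000", "240010001001000", "240030701001000"),
  h_geocode = c("240010001001001", "240030701001000", "240030701001002"),
  S000      = c(5, 2, 4),
  stringsAsFactors = FALSE
)

# Truncate both the workplace and the residence GEOIDs to county level
od$w_county <- substr(od$w_geocode, 1, 5)
od$h_county <- substr(od$h_geocode, 1, 5)

# Sum jobs for every work-county / home-county pair
od_county <- aggregate(S000 ~ w_county + h_county, data = od, FUN = sum)
od_county
```

Flows within a county (here 24001 to 24001 and 24003 to 24003) stay on the diagonal, and the cross-county flow keeps its own row, which is the same shape as the `w_county`/`h_county` pairs in the output above.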
Similarly, built-in aggregation is available at the Block Group (`agg_geo = "bg"`), Tract (`"tract"`), County (`"county"`), and State (`"state"`) levels; County was demonstrated above. Each level is selected with the `agg_geo` argument, and this aggregation works for all three LODES types, including origin-destination.
md_rac_bg <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01",
                        segment = "S000", agg_geo = "bg")
head(md_rac_bg)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...
## # A tibble: 6 x 44
## year state h_bg C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01 CNS02 CNS03
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015 MD 24001~ 251 56 126 69 51 103 97 2 2 0
## 2 2015 MD 24001~ 452 90 242 120 91 183 178 3 6 2
## 3 2015 MD 24001~ 341 71 191 79 74 150 117 1 4 1
## 4 2015 MD 24001~ 294 51 157 86 52 124 118 1 2 1
## 5 2015 MD 24001~ 352 66 196 90 78 134 140 1 0 0
## 6 2015 MD 24001~ 550 119 286 145 105 244 201 1 2 0
## # ... with 31 more variables: CNS04 <dbl>, CNS05 <dbl>, CNS06 <dbl>,
## # CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>, CNS11 <dbl>,
## # CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>, CNS16 <dbl>,
## # CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>,
## # CR03 <dbl>, CR04 <dbl>, CR05 <dbl>, CR07 <dbl>, CT01 <dbl>, CT02 <dbl>,
## # CD01 <dbl>, CD02 <dbl>, CD03 <dbl>, CD04 <dbl>, CS01 <dbl>, CS02 <dbl>
md_rac_tract <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01",
                           segment = "S000", agg_geo = "tract")
head(md_rac_tract)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...
## # A tibble: 6 x 44
## year state h_tract C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01 CNS02
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015 MD 240010~ 1044 217 559 268 216 436 392 6 12
## 2 2015 MD 240010~ 1196 236 639 321 235 502 459 3 4
## 3 2015 MD 240010~ 945 206 493 246 225 395 325 3 1
## 4 2015 MD 240010~ 1134 228 598 308 282 466 386 0 5
## 5 2015 MD 240010~ 746 159 371 216 184 337 225 1 2
## 6 2015 MD 240010~ 1091 265 573 253 261 459 371 1 2
## # ... with 32 more variables: CNS03 <dbl>, CNS04 <dbl>, CNS05 <dbl>,
## # CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>, CNS10 <dbl>,
## # CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>, CNS15 <dbl>,
## # CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>, CNS20 <dbl>,
## # CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>, CR05 <dbl>, CR07 <dbl>,
## # CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>, CD03 <dbl>, CD04 <dbl>,
## # CS01 <dbl>, CS02 <dbl>
md_rac_state <- grab_lodes(state = "md", year = 2015, lodes_type = "rac", job_type = "JT01",
                           segment = "S000", agg_geo = "state")
head(md_rac_state)
## Cached version of file found in C:/Users/elmue/Documents/package_dev/lehdr/vignettes/lodes_raw/md_rac_S000_JT01_2015.csv.gz
## Reading now...
## # A tibble: 1 x 44
## year state h_state C000 CA01 CA02 CA03 CE01 CE02 CE03 CNS01
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2015 MD 24 2.53e6 536277 1.39e6 610456 418694 748393 1.37e6 4819
## # ... with 33 more variables: CNS02 <dbl>, CNS03 <dbl>, CNS04 <dbl>,
## # CNS05 <dbl>, CNS06 <dbl>, CNS07 <dbl>, CNS08 <dbl>, CNS09 <dbl>,
## # CNS10 <dbl>, CNS11 <dbl>, CNS12 <dbl>, CNS13 <dbl>, CNS14 <dbl>,
## # CNS15 <dbl>, CNS16 <dbl>, CNS17 <dbl>, CNS18 <dbl>, CNS19 <dbl>,
## # CNS20 <dbl>, CR01 <dbl>, CR02 <dbl>, CR03 <dbl>, CR04 <dbl>, CR05 <dbl>,
## # CR07 <dbl>, CT01 <dbl>, CT02 <dbl>, CD01 <dbl>, CD02 <dbl>, CD03 <dbl>,
## # CD04 <dbl>, CS01 <dbl>, CS02 <dbl>
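Every `agg_geo` level shown above corresponds to a fixed-length prefix of the 15-character block GEOID, so a single roll-up function can be sketched for all of them. The prefix-length table and the `rollup_geo()` helper below are hypothetical, written for illustration rather than taken from lehdr, and the data are invented. Because each block belongs to exactly one unit at every level, the grand total should not change from level to level:

```r
# Hypothetical lookup, NOT part of lehdr: number of leading GEOID characters
# that identify each aggregation level
geo_prefix_len <- c(block = 15, bg = 12, tract = 11, county = 5, state = 2)

# Hypothetical helper: sum one value column within each truncated GEOID
rollup_geo <- function(df, geocode_col, value_col, agg_geo) {
  stopifnot(agg_geo %in% names(geo_prefix_len))
  key <- substr(df[[geocode_col]], 1, geo_prefix_len[[agg_geo]])
  totals <- tapply(df[[value_col]], key, sum)
  data.frame(geoid = names(totals), total = as.vector(totals),
             stringsAsFactors = FALSE)
}

# Invented rac-style data for three blocks
rac <- data.frame(
  h_geocode = c("240010001001000", "240010001002000", "240030701001000"),
  C000      = c(30, 20, 50),
  stringsAsFactors = FALSE
)

for (level in c("bg", "tract", "county", "state")) {
  out <- rollup_geo(rac, "h_geocode", "C000", level)
  cat(level, ": ", nrow(out), " unit(s), total = ", sum(out$total), "\n", sep = "")
}
```

Running the loop reports three block groups, two tracts, two counties, and one state, each with a total of 100 jobs.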