3

I have some demographic data I would like to use to make a choropleth map of US counties. My workflow doesn't run into any errors and I'm able to create the final map, however, the data that its mapping is incorrect. My workflow makes use of two data sources -a shape file and a data.frame. The shapefile is a counties shapefile that can be found at this link https://www.dropbox.com/s/4ujxidyx42793j7/cb_2015_us_county_500k.zip?dl=1 The data.frame file can be found at this link: https://www.dropbox.com/s/qys6s6ikrs1g2xb/data.dem.csv?dl=1

Here is my code:

#Load dependencies
library(sp)
library(spatialEco)
library(rgdal)
library(dplyr)
library(maptools)
library(taRifx.geo)
library(ggplot2)
library(USAboundaries)
library(splitstackshape)
library(maps)
library(cowplot)

#Read in shape and csv files
county.track<-readOGR("/path", "filename")
county.track@data$id = rownames(county.track@data)
data<-read.csv("/path/filename.csv")

#Convert data.frame (data) to points polygon file
data$y<-data$lat
data$x<-data$long
coordinates(data) <- ~ x + y
proj4string(data) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
proj4string(county.track) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")

#Overlay points onto polygons
county.track.data<-point.in.poly(data, county.track)

#Summarize point data by county
count<-select(as.data.frame(county.track.data), id, count)
count<-count %>%
  group_by(id) %>%
  summarize(count=sum(count))

#Merge with shape file data
county.track@data<-merge(county.track@data, count, by="id", all.x=T)

#Replace NA values with zeroes 
county.track@data$count[is.na(county.track@data$count)]<-0
county.track.points = fortify(county.track, region="id")
map.plot<-merge(county.track.points, county.track@data, by="id")

#Get rid of Hawaii and Alaska
map.plot<-map.plot %>%
  filter(lat<50 & lat>25) %>%
  filter(long>-130)

#Create choropleth map using ggplot2
 ggplot(map.plot) +
  geom_polygon(aes(long, lat, group=group, fill=log(count))) +
  coord_map()

The output looks like the following: enter image description here

But this is just wrong, which is apparent for a number of reasons. One, most obviously much of the data is not mapped. The grey areas on the map signify NA. But I removed the NAs in one of the steps above, also when examining the data used to map (map.plot), there are no NAs in the fill variable (count). Second, the distribution of values for what is mapped is off. Los Angeles county should have the highest count value at 793 (log value of 6.675823), yet on the map numerous lighter colored counties indicate the value of other spatial units are higher and some of the top ranking counties such as San Diego, are not filled in at all (bottom left of map).

When I examine the data I used to map (map.plot), everything seems A OK. Los Angeles county is still the highest valued county for the "count" variable, yet the map suggests otherwise (see this image here). enter image description here I'm hoping some one can do some forensics here and identify the problem, I've done my best to go through all my steps but I can't seem to identify the issue. Thanks in advance.

UPDATE: I tried using a different shapefile from the same source. The shapefile in the link above is the same as the the one labeled "cb_2015_us_county_500k.zip" at the following (https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html). When I choose a different shapefile (such as cb_2015_us_county_5m.zip) I get a different map but same problems: See the following map an example:

enter image description here

I'm not sure what is going on! In this new map, LA county is no longer even colored in but Orange County is! Any help is much appreciated.

1 Answer 1

3

Not rly sure what's going on with your merging, but this worked for me:

library(albersusa) # devtools::install_github("hrbrmstr/albersusa)
library(readr)
library(dplyr)
library(rgeos)
library(maptools)
library(ggplot2)
library(ggalt)
library(ggthemes)
library(viridis)

df <- read_csv("data.dem.csv")

counties_composite() %>% 
  subset(state %in% unique(df$state)) -> usa

pts <- df[,2:1]
coordinates(pts) <- ~long+lat
proj4string(pts) <- CRS(proj4string(usa))

bind_cols(df, select(over(pts, usa), -state)) %>% 
  count(fips, wt=count) -> df

You have 942 total counties:

glimpse(df)
## Observations: 942
## Variables: 2
## $ fips <chr> "01001", "01003", "01013", "01015", "01043", "01055", "01061", ...
## $ n    <int> 1, 2, 1, 3, 1, 3, 1, 1, 19, 6, 12, 7, 7, 1, 4, 4, 1, 5, 67, 19,...

There are over 3K counties in the U.S.

However, there aren't many NAs:

filter(df, is.na(fips))
## # A tibble: 1 x 2
##    fips     n
#3   <chr> <int>
## 1  <NA>    10

usa_map <- fortify(usa, region="fips")

gg <- ggplot()
gg <- gg + geom_map(data=usa_map, map=usa_map,
                    aes(long, lat, map_id=id),
                    color="#b2b2b2", size=0.05, fill="white")
gg <- gg + geom_map(data=df, map=usa_map,
                    aes(fill=n, map_id=fips),
                    color="#b2b2b2", size=0.05)
gg <- gg + scale_fill_viridis(name="Count", trans="log10")
gg <- gg + coord_proj(us_aeqd_proj)
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.85, 0.2))
gg

enter image description here

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for the response, I'm having trouble replicating your code counties_composite() %>% subset(state %in% unique(df$state)) -> usa I get the following error: Error in match(x, table, nomatch = 0L) : object 'state' not found
when I run this instead counties_composite() %>% subset(df$state %in% unique(df$state)) -> usa then this line gives me an error: coordinates(pts) <- ~long+lat Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘coordinates<-’ for signature ‘"tbl_df"’
Got it going with pts<-as.data.frame(its)
One more thing, I went back to the original csv file and added in data for Alaska and Hawaii. When I run through the code it all plots exactly like before despite the addition of the new data. The FIPS code for Alaska is 02 and none of the FIPS in the df$fips begin with 02. Here's a link to the new data with Hawaii and Alaska: dropbox.com/s/0arazi2n0adivzc/data.dem2.csv?dl=0 Thanks so much!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.