Unable to construct choropleth map in r

Question

I have some demographic data I would like to use to make a choropleth map of US counties. My workflow doesn't run into any errors and I'm able to create the final map, however, the data that its mapping is incorrect. My workflow makes use of two data sources -a shape file and a data.frame. The shapefile is a counties shapefile that can be found at this link https://www.dropbox.com/s/4ujxidyx42793j7/cb_2015_us_county_500k.zip?dl=1 The data.frame file can be found at this link: https://www.dropbox.com/s/qys6s6ikrs1g2xb/data.dem.csv?dl=1

Here is my code:

#Load dependencies
library(sp)
library(spatialEco)
library(rgdal)
library(dplyr)
library(maptools)
library(taRifx.geo)
library(ggplot2)
library(USAboundaries)
library(splitstackshape)
library(maps)
library(cowplot)

#Read in shape and csv files
county.track<-readOGR("/path", "filename")
county.track@data$id = rownames(county.track@data)
data<-read.csv("/path/filename.csv")

#Convert data.frame (data) to points polygon file
data$y<-data$lat
data$x<-data$long
coordinates(data) <- ~ x + y
proj4string(data) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")
proj4string(county.track) <- CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0")

#Overlay points onto polygons
county.track.data<-point.in.poly(data, county.track)

#Summarize point data by county
count<-select(as.data.frame(county.track.data), id, count)
count<-count %>%
  group_by(id) %>%
  summarize(count=sum(count))

#Merge with shape file data
county.track@data<-merge(county.track@data, count, by="id", all.x=T)

#Replace NA values with zeroes 
county.track@data$count[is.na(county.track@data$count)]<-0
county.track.points = fortify(county.track, region="id")
map.plot<-merge(county.track.points, county.track@data, by="id")

#Get rid of Hawaii and Alaska
map.plot<-map.plot %>%
  filter(lat<50 & lat>25) %>%
  filter(long>-130)

#Create choropleth map using ggplot2
 ggplot(map.plot) +
  geom_polygon(aes(long, lat, group=group, fill=log(count))) +
  coord_map()

The output looks like the following:

But this is just wrong, which is apparent for a number of reasons. One, most obviously much of the data is not mapped. The grey areas on the map signify NA. But I removed the NAs in one of the steps above, also when examining the data used to map (map.plot), there are no NAs in the fill variable (count). Second, the distribution of values for what is mapped is off. Los Angeles county should have the highest count value at 793 (log value of 6.675823), yet on the map numerous lighter colored counties indicate the value of other spatial units are higher and some of the top ranking counties such as San Diego, are not filled in at all (bottom left of map).

When I examine the data I used to map (map.plot), everything seems A OK. Los Angeles county is still the highest valued county for the "count" variable, yet the map suggests otherwise (see this image here). I'm hoping some one can do some forensics here and identify the problem, I've done my best to go through all my steps but I can't seem to identify the issue. Thanks in advance.

UPDATE: I tried using a different shapefile from the same source. The shapefile in the link above is the same as the the one labeled "cb_2015_us_county_500k.zip" at the following (https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html). When I choose a different shapefile (such as cb_2015_us_county_5m.zip) I get a different map but same problems: See the following map an example:

I'm not sure what is going on! In this new map, LA county is no longer even colored in but Orange County is! Any help is much appreciated.

hrbrmstr · Accepted Answer · 2016-07-17 03:19:23Z

3

Not rly sure what's going on with your merging, but this worked for me:

library(albersusa) # devtools::install_github("hrbrmstr/albersusa)
library(readr)
library(dplyr)
library(rgeos)
library(maptools)
library(ggplot2)
library(ggalt)
library(ggthemes)
library(viridis)

df <- read_csv("data.dem.csv")

counties_composite() %>% 
  subset(state %in% unique(df$state)) -> usa

pts <- df[,2:1]
coordinates(pts) <- ~long+lat
proj4string(pts) <- CRS(proj4string(usa))

bind_cols(df, select(over(pts, usa), -state)) %>% 
  count(fips, wt=count) -> df

You have 942 total counties:

glimpse(df)
## Observations: 942
## Variables: 2
## $ fips <chr> "01001", "01003", "01013", "01015", "01043", "01055", "01061", ...
## $ n    <int> 1, 2, 1, 3, 1, 3, 1, 1, 19, 6, 12, 7, 7, 1, 4, 4, 1, 5, 67, 19,...

There are over 3K counties in the U.S.

However, there aren't many NAs:

filter(df, is.na(fips))
## # A tibble: 1 x 2
##    fips     n
#3   <chr> <int>
## 1  <NA>    10

usa_map <- fortify(usa, region="fips")

gg <- ggplot()
gg <- gg + geom_map(data=usa_map, map=usa_map,
                    aes(long, lat, map_id=id),
                    color="#b2b2b2", size=0.05, fill="white")
gg <- gg + geom_map(data=df, map=usa_map,
                    aes(fill=n, map_id=fips),
                    color="#b2b2b2", size=0.05)
gg <- gg + scale_fill_viridis(name="Count", trans="log10")
gg <- gg + coord_proj(us_aeqd_proj)
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.85, 0.2))
gg

answered Jul 17, 2016 at 3:19

hrbrmstr

79.1k11 gold badges146 silver badges209 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Cyrus Mohammadian Over a year ago

Thanks for the response, I'm having trouble replicating your code counties_composite() %>% subset(state %in% unique(df$state)) -> usa I get the following error: Error in match(x, table, nomatch = 0L) : object 'state' not found

Cyrus Mohammadian Over a year ago

when I run this instead counties_composite() %>% subset(df$state %in% unique(df$state)) -> usa then this line gives me an error: coordinates(pts) <- ~long+lat Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘coordinates<-’ for signature ‘"tbl_df"’

Cyrus Mohammadian Over a year ago

Got it going with pts<-as.data.frame(its)

Cyrus Mohammadian Over a year ago

One more thing, I went back to the original csv file and added in data for Alaska and Hawaii. When I run through the code it all plots exactly like before despite the addition of the new data. The FIPS code for Alaska is 02 and none of the FIPS in the df$fips begin with 02. Here's a link to the new data with Hawaii and Alaska: dropbox.com/s/0arazi2n0adivzc/data.dem2.csv?dl=0 Thanks so much!

Collectives™ on Stack Overflow

Unable to construct choropleth map in r

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related