2

I have a data frame consisting of multiple data points with specific geocoordinates (latitude and longitude). I'm looking to create a choropleth-style world map where geographical regions are shaded according to how many data points fall within the boundaries of the region.

Is there a simple way to accomplish what I'm trying to do in R, preferably using the "maps" package's world map and the "ggplot2" map-plotting functions?

Here is a minimally reproducible result of what I have:

library(ggplot2)
library(maps)

data <- data.frame(lat = 40.730610, lon = -73.935242)

ggplot() + 
  geom_polygon(data = map_data("world"), aes(x = long, y = lat, group = group, fill = group)) +
  coord_fixed(1.3)

I've noticed that the fill parameter on plot item functions can be used to create a choropleth effect. Here, the fill parameter on the aes() function of the geom_polygon() function is used to create a choropleth where each group is color coded differently.

1 Answer 1

5

There are many ways to achieve this task. The general idea is to convert both the point data and polygon data to spatial objects. After that, count how many points fall within that polygon. I know we can do this using the sp package, which is widespread and well-known in the R community, but I decided to use the sf package because sf would be the next generation standard of spatial objects in R (https://cran.r-project.org/web/packages/sf/index.html). Knowing the usage and functionality of sf will probably be beneficial.

First, the OP provided an example point, but I decided to add more points so that we can see how to count the points and aggregate the data. To do so, I used the ggmap pakcage to geocode some cities that I selected as an example.

# Load package
library(tidyverse)
library(ggmap)
library(maps)
library(maptools)
library(sf)

# Create point data as a data frame
point_data <- data.frame(lat = 40.730610, lon = -73.935242)

# Geocode a series of cities
city <- c("Detroit", "Seattle", "Toranto", "Denver", "Mexico City", "Paris", "New Orleans",
          "Tokyo", "Osaka", "Beijing", "Canberra", "New York", "Istanbul", "New Delhi",
          "London", "Taipei", "Seoul", "Manila", "Bangkok", "Lagos", "Chicago", "Shanghai")
point_data2 <- geocode(city)

# Combine OP's example and the geocoding result
point_data3 <- bind_rows(point_data, point_data2)

Next, I converted the point_data3 data frame to the sf object. I will also get the polygon data of the world using the maps package and convert it to an sf object.

# Convert to simple feature object
point_sf <- st_as_sf(point_data3, coords = c("lon", "lat"), crs = 4326)

# Get world map data
worldmap <- maps::map("world", fill = TRUE, plot = FALSE)

# Convert world to sp class
IDs <- sapply(strsplit(worldmap$names, ":"), "[", 1L)
world_sp <- map2SpatialPolygons(worldmap, IDs = IDs, 
                                proj4string = CRS("+proj=longlat +datum=WGS84"))

# Convert world_sp to simple feature object
world_sf <- st_as_sf(world_sp)

# Add country ID
world_sf <- world_sf %>%
  mutate(region = map_chr(1:length(world_sp@polygons), function(i){
    world_sp@polygons[[i]]@ID
  }))

Now both point_sf and world_sf are sf objects. We can use the st_within function to examine which points are within which polygons.

# Use st_within
result <- st_within(point_sf, world_sf, sparse = FALSE)

# Calculate the total count of each polygon
# Store the result as a new column "Count" in world_sf
world_sf <- world_sf %>%
  mutate(Count = apply(result, 2, sum))

The total count information is in the Count column of world_sf. We can get the world data frame as the OP did using the map_data function. We can then merge world_data and world_df.

# Convert world_sf to a data frame world_df 
world_df <- world_sf
st_geometry(world_df) <- NULL

# Get world data frame
world_data <- map_data("world")

# Merge world_data and world_df
world_data2 <- world_data %>%
  left_join(world_df, by = c("region"))

Now we are ready to plot the data. The following code is the same as the OP's ggplot code except that the input data is now world_data2 and fill = Count.

ggplot() + 
  geom_polygon(data = world_data2, aes(x = long, y = lat, group = group, fill = Count)) +
  coord_fixed(1.3)
Sign up to request clarification or add additional context in comments.

6 Comments

Thank you! What is the relevance of the 'crs' parameter in the st_as_sf() function? I couldn't find documentation on it.
The crs argument is passed to the st_sffunction. Please see the documentation here: rdocumentation.org/packages/sf/versions/0.5-3/topics/sf
Also see this vigneette: cran.r-project.org/web/packages/sf/vignettes/sf1.html to learn more about the sf package and how the crs (coordinate reference system) is assigned to sf object.
Thanks again, looking at your method, I understand the procedure much better than before. My actual implementation of this is a little more complicated than the example I gave. Let's say I have a data frame consisting of many observations, each with geocoordinates (latitude-longitude) and a boolean (true-false) value. How might I adapt this code to generate a choropleth map of the world where each region/polygon is shaded by the percentage of points within it where the associated boolean value is equal to true (as opposed to being shaded by the amount of points within it)?
It seems like you can calculate the percentage of TRUE in each polygon, then plot that percentage. Something like sum(bollean)/N where N is the number of points in that polygon. If you can begin a new post with some reproducible example datasets, I may be able to help.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.