Is it possible to scrape the polygon data from this interactive map in R?

🔗 https://fogocruzado.org.br/mapadosgruposarmados

The map shows territories controlled by armed groups across different years. I'd like to automate extracting this data for all years and save each year's layer as a shapefile.

They mention an API, but this specific data doesn’t seem to be available through it.

I tried the following, but it gives me coordinates that make no sense.

# Load the required packages
library(rvest)
library(jsonlite)

# Load the HTML file (saved locally from the page)
html_file <- read_html(html_path)

# Extract the SVG path data from the Leaflet overlay
path_data <- html_file %>%
  html_nodes("path.leaflet-interactive") %>%
  html_attr("d")

# Function to clean and extract coordinates from path data
clean_coordinates <- function(path) {
  # Strip the M (move), L (line) and z (close) path commands
  path <- gsub("[MLz]", "", path)
  # Keep only digits, commas, periods and minus signs
  path <- gsub("[^0-9,.-]", " ", path)
  coords <- as.numeric(unlist(strsplit(path, "\\s+")))  # Split on whitespace and convert to numeric
  coords <- coords[!is.na(coords)]                      # Drop empty tokens

  # Pair the values up; note these are (x, y) screen pixels, not (lat, lon)
  coords_matrix <- matrix(coords, ncol = 2, byrow = TRUE)

  # Swap the columns into [lon, lat]-style order for GeoJSON
  coords_matrix <- coords_matrix[, c(2, 1)]

  return(coords_matrix)
}

# Apply the function to clean the coordinates
coordinates_list <- lapply(path_data, clean_coordinates)

# Create GeoJSON features
geojson <- list(
  type = "FeatureCollection",
  features = lapply(coordinates_list, function(coord) {
    list(
      type = "Feature",
      geometry = list(
        type = "Polygon",  # or "LineString" based on the data
        coordinates = list(coord)  # Directly use the coordinates list
      )
    )
  })
)

# Convert the result to JSON
geojson_json <- toJSON(geojson, pretty = TRUE)
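For context on why the parsed values look like nonsense: the `d` attribute of a Leaflet overlay path holds layer-pixel coordinates, not geographic ones, so no column reordering will turn them into lat/lon. A quick sketch with a made-up path string:

```r
# Made-up example path in the format Leaflet emits
sample_path <- "M120 240L360 240L360 480z"

vals <- as.numeric(strsplit(gsub("[MLz]", " ", sample_path), "\\s+")[[1]])
vals <- vals[!is.na(vals)]

range(vals)  # 120 480 - pixel offsets within the map pane, whereas valid
             # longitudes/latitudes for Rio would be around -43 and -22
```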


  • Did you have a look into the network? F12 -> Network -> filter for "fetch" and look for any JSON files. If that does not work, I believe the Leaflet map object can be fetched and the layer data read. Commented Apr 9 at 14:37
  • @TimG I was just doing that, and I see nothing. It's using Leaflet, which does not (afaict) use JSON returns. Commented Apr 9 at 14:38
  • Maria, it's using Leaflet, which may or may not help you: if you generalize your research to how to extract data from Leaflet-generated pages, you might find something useful. I'd say reach out to the authors of the Leaflet JS code, but I'm not sure about their responsiveness due to the Russian invasion of Ukraine. It's possible that RStudio (maintainers of the leaflet R package) might have some insight, but afaict they do not frequent SO so much; you might consider the Posit Community. Good luck! Commented Apr 9 at 14:41
  • Bugger, yes they have an API, but it seems to be only for shootings that occurred. Not sure if it offers spatial data too. Commented Apr 9 at 14:43
  • I did the same thing, but also couldn't translate it back into something reasonable: path_data = html_file %>% html_nodes("path.leaflet-interactive") %>% html_attr("d") Commented Apr 9 at 19:26

1 Answer


Ahhh I finally got it :)

  1. Use selenider to open the URL.
  2. Use execute_js_expr to run JavaScript in the page and...
  3. search for the Leaflet container: document.querySelectorAll('.leaflet-container')
  4. find the obscurely hidden getMap function to retrieve the map object: map = mapContainers[0].htmlwidget_data_init_result.getMap()
  5. With this we only need to filter on layer.options.group == "2023" - there are multiple layers, and not all are relevant here. Somehow I could only retrieve the 2023 group; you might need to instruct selenider to click on the other layer controls to retrieve the data for the other years.
  6. Push the coordinates, label and color of each layer into a polygons list:

let polygon = {
                coordinates: layer.getLatLngs(),
                fillColor: layer.options.fillColor,
                label: layer.options.label || 'Unknown'
            };
polygons.push(polygon);

  7. We can return JavaScript objects to R here, but we have to be careful to JSON-stringify them first: return JSON.stringify(polygons)
  8. Finally, use jsonlite::fromJSON to parse the polygons into a data frame.
  9. Optionally, plot the results in our own leaflet map.

Code

library(selenider)
library(jsonlite)

session <- selenider_session("selenium", browser = "chrome")
open_url("https://fogocruzado.org.br/mapadosgruposarmados")


map_analysis <- execute_js_expr("
  
  // retrieve map object
  let mapContainers = document.querySelectorAll('.leaflet-container');  
  let map = mapContainers[0].htmlwidget_data_init_result.getMap();
  
  console.log(map._layers); // look at the selenider-console
  
  let layers = map._layers; // just for debugging and inspecting layers
  
  // collect polygons
  let polygons = [];
  for (let layerId in layers) {
    let layer = layers[layerId];
    
    // Filter for the 2023 group - not sure where the other years' layers are
    if (layer.options && layer.options.group == '2023') {
        
        // for polygon layers (L.Polygon or L.Rectangle)
        if (layer.getLatLngs) {
            let polygon = {
                coordinates: layer.getLatLngs(),
                fillColor: layer.options.fillColor,
                label: layer.options.label || 'Unknown',
                popup: layer.options.popup // get popup data too!
            };
            polygons.push(polygon);
        }
    }
  }

  //console.log(polygons);
  return JSON.stringify(polygons);
")

polygons_data <- jsonlite::fromJSON(map_analysis)


# now build our own leaflet map using the data

library(leaflet)

m <- leaflet() %>%
  addTiles()

for (i in seq_len(nrow(polygons_data))) {
  
  coord_entry <- polygons_data$coordinates[[i]]
  
  # Some coordinate entries come back NULL, so guard against them
  if (!is.null(coord_entry) && length(coord_entry) > 0 && is.data.frame(coord_entry[[1]])) {
    
    coords_df <- coord_entry[[1]]
    
    # Check if the data frame has both lat and lng and is not empty
    if (nrow(coords_df) > 2 && all(c("lat", "lng") %in% colnames(coords_df))) {
      
      m <- m %>% addPolygons(
        lng = coords_df$lng,
        lat = coords_df$lat,
        fillColor = polygons_data$fillColor[i],
        color = "#000000",
        weight = 1,
        opacity = 1,
        fillOpacity = 0.5,
        popup = polygons_data$popup[i]
      )
    }
  }
}

m
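As a side note on why the loop above indexes coord_entry[[1]]: getLatLngs() returns a nested array, and jsonlite's simplification turns each polygon's outer ring into a one-element list holding a data frame with lat and lng columns. A small illustration with a hypothetical payload in the same shape:

```r
library(jsonlite)

# Hypothetical stringified payload in the shape the JS snippet returns
sample_json <- '[{"coordinates": [[{"lat": -22.90, "lng": -43.20},
                                   {"lat": -22.91, "lng": -43.18},
                                   {"lat": -22.92, "lng": -43.21}]],
                  "fillColor": "#ff0000", "label": "Group A"}]'

sample_data <- fromJSON(sample_json)

# coordinates is a list column; [[1]] is the row's value, [[1]][[1]] the ring
coords_df <- sample_data$coordinates[[1]][[1]]
colnames(coords_df)  # "lat" "lng"
```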

Results

(Screenshot: the rebuilt leaflet map showing the 2023 territories.)

Notes

  • This took a while to figure out, and it can of course be improved. E.g. the for-loop that adds the leaflet polygons is quite slow.
  • In selenider, if you select a different layer, e.g. 2017, you can pull that layer by changing the group filter to == '2017' and rerunning execute_js_expr. selenider can also perform clicks on elements, so the layer info for all years can be crawled and combined into one big data.frame.
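Since the question asked for shapefiles: once polygons_data exists, the rings can be converted to an sf object and written out with st_write. A sketch under the nested structure shown above (the stand-in data here is made up; run it against the real polygons_data instead):

```r
library(sf)

# Made-up stand-in with the same nested structure as polygons_data
polygons_data <- data.frame(label = c("Group A", "Group B"))
polygons_data$coordinates <- list(
  list(data.frame(lat = c(-22.90, -22.91, -22.92), lng = c(-43.20, -43.18, -43.21))),
  list(data.frame(lat = c(-22.80, -22.81, -22.82), lng = c(-43.30, -43.28, -43.31)))
)

# One sf polygon per row; close each ring by repeating the first vertex
polys <- lapply(seq_len(nrow(polygons_data)), function(i) {
  coords_df <- polygons_data$coordinates[[i]][[1]]
  ring <- as.matrix(coords_df[, c("lng", "lat")])
  st_polygon(list(rbind(ring, ring[1, ])))
})

territories <- st_sf(
  label    = polygons_data$label,
  geometry = st_sfc(polys, crs = 4326)  # Leaflet latlngs are WGS84
)

# One shapefile per year, e.g.:
st_write(territories, "territorios_2023.shp")
```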
