I'm currently learning basic web scraping, and I'm curious whether BeautifulSoup's .find() works with attributes other than class or id.
As practice, I'm trying to get data from the nyc.gov website and export a list of the NYPD precincts and their addresses to a .csv file. In the HTML on the nyc.gov site, the precinct numbers and addresses are marked with data-label="Precinct" and data-label="Address" respectively, but the guide I'm using only shows how to use class_="etc" with .find(), and I don't know whether data attributes work differently.
This is my code so far:
import pandas as pd
import requests
import re
from bs4 import BeautifulSoup
url = "http://wgetsnaps.github.io/nyc.gov--nypd-videos/html/nypd/html/home/precincts.shtml.html"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
precincts = soup.find_all("td")
numbers = []
addresses = []
for precinct in precincts:
    number = precinct.find("td", data-label_="Precinct").text.strip()
    address = precinct.find("td", data-label_="Address").text.strip()
    numbers.append(number)
    address.append(address)
I'll compile it into a DataFrame and add the export to .csv later. For "data-label_", Jupyter Notebook reports: SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
I did not mean "==", but I understand its confusion. Any tips?
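For context on the error itself: a keyword argument name has to be a valid Python identifier, and the hyphen in data-label_ is read as a minus sign, so the call never compiles. The sketch below illustrates this with a hypothetical stand-in function named find (it is not BeautifulSoup's own implementation, only a function with a similar shape) and shows that passing the attribute name as a dictionary key sidesteps the problem.

```python
# Hypothetical stand-in for a function that accepts attribute filters either
# as a dict or as keyword arguments. This is NOT bs4's find(), just a function
# with a similar signature for illustration.
def find(name, attrs=None, **kwargs):
    filters = dict(attrs or {})
    filters.update(kwargs)
    return name, filters

# "data-label_" is not a valid identifier: Python parses it as
# `data - label_ = ...`, so this call is rejected before it ever runs.
bad_call = 'find("td", data-label_="Precinct")'
try:
    compile(bad_call, "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False

# Dictionary keys are plain strings, so hyphens are fine there.
via_attrs = find("td", attrs={"data-label": "Precinct"})

# Unpacking a dict also works for functions that accept **kwargs.
via_unpacking = find("td", **{"data-label": "Precinct"})
```

The class_ spelling with a trailing underscore exists only because class is a reserved word in Python; adding an underscore does not help with a name that contains a hyphen.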

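One possible approach, as a minimal sketch: BeautifulSoup's find() and find_all() accept an attrs dictionary, which works for data-* attributes. The sample markup and its placeholder values below are made up for illustration, and the assumption that each precinct sits in its own table row with one "Precinct" cell and one "Address" cell has not been checked against the real page. The sketch loops over rows rather than individual td cells, because a td does not contain another td to search for, and it appends to addresses, since address.append(address) in the code above would fail on a string.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the real page. The row/cell layout
# and the placeholder values are assumptions, not data from nyc.gov.
html = """
<table>
  <tr><th>Precinct</th><th>Address</th></tr>
  <tr>
    <td data-label="Precinct">Precinct A</td>
    <td data-label="Address">1 Example Street</td>
  </tr>
  <tr>
    <td data-label="Precinct">Precinct B</td>
    <td data-label="Address">2 Sample Avenue</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

numbers = []
addresses = []

# Iterate over rows so that each row's two cells can be looked up together.
for row in soup.find_all("tr"):
    number_cell = row.find("td", attrs={"data-label": "Precinct"})
    address_cell = row.find("td", attrs={"data-label": "Address"})

    # Skip rows (such as the header row) that lack either cell,
    # instead of calling a method on None.
    if number_cell is None or address_cell is None:
        continue

    numbers.append(number_cell.get_text(strip=True))
    addresses.append(address_cell.get_text(strip=True))
```

From here, numbers and addresses could be passed to pd.DataFrame({"Precinct": numbers, "Address": addresses}) and written out with to_csv, assuming the real page matches the structure sketched above.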