
I am trying to extract data from https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population for my project. I want to put the data for the top 20 cities into a pandas DataFrame with these columns: RANK | CITY | LATITUDE | LONGITUDE

This is so that I can extract the coordinates later in my code and use them to calculate the parameters I need. Here is what I have so far, but it seems to be failing:

import requests
from bs4 import BeautifulSoup

rank=[]
city=[]
state=[]
latitude=[]
longitude=[]
population_present=[]
population_past=[]
changepercent=[]


info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')

# Walk the rows of the first <table> element on the page
for row in bs.find('table').find_all('tr'):
    p = row.find_all('td')
    # Header rows contain only <th> cells, so skip rows without <td> cells
    if len(p) > 0:
        rank.append(p[0].text)
        city.append(p[1].text)
        latitude.append(p[2].text.rstrip('\n'))

2 Answers


You can do this with pandas. Try the code below:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from io import StringIO

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')
# The second "wikitable" held the population list when this was written;
# the index and the column names below may need adjusting if the page changes
table=bs.find_all('table',class_='wikitable')[1]
# Newer pandas versions expect a file-like object rather than a literal HTML string
df=pd.read_html(StringIO(str(table)))[0]
#Get the first 20 records
df1=df.iloc[:20]

Rank=df1['2018rank'].values.tolist()
City=df1['City'].values.tolist()
#Get the location in list
locationlist=df1['Location'].values.tolist()
Latitude=[]
Longitude=[]
for val in locationlist:
    # Keep the text after the last "/" and take its first and last tokens
    val1=val.split("/")[-1]
    Latitude.append(val1.split()[0])
    Longitude.append(val1.split()[-1])

df2=pd.DataFrame({"Rank":Rank,"City":City,"Latitude":Latitude,"Longitude":Longitude})
print(df2)

Output:

                City    Latitude   Longitude  Rank
0        New York[d]  40.6635°N   73.9387°W     1
1        Los Angeles  34.0194°N  118.4108°W     2
2            Chicago  41.8376°N   87.6818°W     3
3         Houston[3]  29.7866°N   95.3909°W     4
4            Phoenix  33.5722°N  112.0901°W     5
5    Philadelphia[e]  40.0094°N   75.1333°W     6
6        San Antonio  29.4724°N   98.5251°W     7
7          San Diego  32.8153°N  117.1350°W     8
8             Dallas  32.7933°N   96.7665°W     9
9           San Jose  37.2967°N  121.8189°W    10
10            Austin  30.3039°N   97.7544°W    11
11   Jacksonville[f]  30.3369°N   81.6616°W    12
12        Fort Worth  32.7815°N   97.3467°W    13
13          Columbus  39.9852°N   82.9848°W    14
14  San Francisco[g]  37.7272°N  123.0322°W    15
15         Charlotte  35.2078°N   80.8310°W    16
16   Indianapolis[h]  39.7767°N   86.1459°W    17
17           Seattle  47.6205°N  122.3509°W    18
18         Denver[i]  39.7619°N  104.8811°W    19
19     Washington[j]  38.9041°N   77.0172°W    20
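The Latitude and Longitude values above are still strings such as `40.6635°N`, and several city names still carry Wikipedia footnote markers such as `[d]`. Before calculating anything from the coordinates, it may help to convert them to signed decimal degrees. Below is a minimal sketch of that cleanup; `clean_city` and `to_decimal` are hypothetical helper names, and it assumes every coordinate ends in a hemisphere letter (N, S, E or W), as in the output shown.

```python
import re

# Matches bracketed footnote markers such as "[d]" or "[3]"
FOOTNOTE = re.compile(r"\[[^\]]*\]")


def clean_city(name):
    """Remove footnote markers, e.g. 'New York[d]' -> 'New York'."""
    return FOOTNOTE.sub("", name).strip()


def to_decimal(coord):
    """Convert a string like '40.6635°N' or '73.9387°W' to a signed float.

    Southern and western hemispheres become negative numbers.
    """
    # Drop surrounding whitespace and any zero-width no-break spaces
    coord = coord.replace("\ufeff", "").strip()
    hemisphere = coord[-1:].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"expected a trailing N/S/E/W: {coord!r}")
    # Strip the hemisphere letter and the degree sign, then parse the number
    value = float(coord[:-1].rstrip("°"))
    return -value if hemisphere in ("S", "W") else value


if __name__ == "__main__":
    # Two rows taken from the output above
    sample = [("New York[d]", "40.6635°N", "73.9387°W"),
              ("Los Angeles", "34.0194°N", "118.4108°W")]
    for city, lat, lon in sample:
        print(clean_city(city), to_decimal(lat), to_decimal(lon))
```

With the answer's `df2`, the same helpers could be applied column by column, for example `df2['Latitude'] = df2['Latitude'].map(to_decimal)` and `df2['City'] = df2['City'].map(clean_city)`.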

You're accessing the wrong element on the page: bs.find('table') returns only the first table on the page, which isn't the one holding the city data. To access the table with the data you want, use this:

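import requests
from bs4 import BeautifulSoup

info = requests.get('https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population').text
bs = BeautifulSoup(info, 'html.parser')

# Index 4 pointed at the population table when this was written; the page layout may change
for tr in bs.find_all('table')[4].find_all('tr'):
    cells = tr.find_all('td')
    # Now take the data you want from these cells, and put it in a DataFrame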
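Hard-coding the index 4 only works while the page layout stays the same. One alternative is to pick the table whose header cells mention the columns you need. The sketch below does this with the standard library's `html.parser`, so the selection logic can be tested on its own; `HeaderCollector`, `find_table_index` and the header words `"City"` and `"Location"` are illustrative assumptions, not anything Wikipedia guarantees.

```python
from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Record the text of the <th> cells of every <table>, in document order."""

    def __init__(self):
        super().__init__()
        self.headers = []   # headers[i] holds the <th> texts of table i
        self._open = []     # indices of the tables currently open (handles nesting)
        self._th = None     # text chunks of the <th> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._open.append(len(self.headers))
            self.headers.append([])
        elif tag == "th" and self._open:
            self._th = []

    def handle_endtag(self, tag):
        if tag == "th" and self._th is not None and self._open:
            self.headers[self._open[-1]].append("".join(self._th).strip())
            self._th = None
        elif tag == "table" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Collect text (including text inside <a>, <sup>, ...) while in a <th>
        if self._th is not None:
            self._th.append(data)


def find_table_index(html, *words):
    """Return the index of the first table whose headers contain every word, else None."""
    collector = HeaderCollector()
    collector.feed(html)
    collector.close()
    for i, heads in enumerate(collector.headers):
        text = " ".join(heads)
        if all(word in text for word in words):
            return i
    return None
```

Because both this parser and bs.find_all('table') count table start tags in document order, the returned index should line up with BeautifulSoup's list: `i = find_table_index(info, 'City', 'Location')`, then `bs.find_all('table')[i]` once you have checked that `i` is not `None`. Returning `None` rather than `-1` avoids silently selecting the last table when nothing matches.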
