Trouble web scraping on Python

Question

I'm a high school student practicing Python. For a final project, I wanted to use web-scraping (which we haven't covered in class). The following is my code that is supposed to ask a user for their date of birth then print out a list of celebrities that share their birthday (excluding their year of birth).

import requests
from bs4 import BeautifulSoup

print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")
Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
Month = dict(zip(range(12), Months))
BD_Month = int(BD_Month)
messy_url = ['http://www.famousbirthdays.com/', Month[BD_Month - 1], BD_Day, '.html']
url = ''.join(messy_url)
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
spans = soup.find_all('span', attrs={'class':'title'})
for span in spans:
    print (span.string)

The code is supposed to search the web page defined as 'url'. However, it always prints out a list of people born on November 6:

Lauren Orlando
Emma Stone
Alastair Aiken
Sal Vulcano
Bailey Ballinger

The code also prints only 5 of the 48 names on the page: entries 1 through 6, oddly skipping the fifth.

My two main issues are the wrong date and the incomplete list of names. Any input would be appreciated.

Thanks.

The first step is to ensure that your URL is correct. The second step is to view the content and check that soup is set up as you expect. Finally, I would get a count (len) of your spans. Is there a chance that span.title is not valid for all of their cases?
– Fallenreaper, Commented Nov 6, 2016 at 17:54
Thank you for the comment. I have verified the URL is correct (Day: 15 and Month: 7 produced famousbirthdays.com/July15.html). I will check out the soup now, then I'll see if span.title is invalid. I'll edit this with what I find. Thanks.
– Gad, Commented Nov 6, 2016 at 18:59
I'll take a look and see if I can get you the answer you are looking for, then.
– Fallenreaper, Commented Nov 6, 2016 at 19:01
The issue seems to be that the content being read comes from the homepage, not the URL I specified. I'm not sure why this would occur.
– Gad, Commented Nov 6, 2016 at 19:11
It seems my own code has been giving me issues with BeautifulSoup; I'll have to look into that more. I would check that the content you get back is complete.
– Fallenreaper, Commented Nov 6, 2016 at 19:36
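
One way to confirm the observation in the comments (that the homepage is being read instead of the requested page) is to compare the URL you asked for with the URL the response actually came from. The sketch below is only an illustration: the helper name landed_elsewhere is made up here, and the stand-in object simulates a redirect so the example runs without network access. With a real requests response, the same two attributes (url and history) are available.

```python
from types import SimpleNamespace


def landed_elsewhere(response, requested_url):
    """Return True if the response did not come from the requested URL.

    `response` is expected to look like a requests.Response: `.url` holds the
    final URL and `.history` holds any intermediate redirect responses.
    """
    final_url = response.url.rstrip("/")
    return bool(response.history) or final_url != requested_url.rstrip("/")


# Stand-in for a real response (hypothetical values, no network needed):
# it mimics a request that was redirected to the homepage.
fake_response = SimpleNamespace(
    url="https://www.famousbirthdays.com/",
    history=["<redirect response>"],
)

print(landed_elsewhere(fake_response, "http://www.famousbirthdays.com/July15.html"))
# prints: True
```

In the question's script, calling landed_elsewhere(r, url) right after r = requests.get(url) and printing len(spans) after find_all would cover the three checks suggested above. Note that a plain http-to-https redirect would also be reported as "elsewhere", so it is worth printing r.url to see where the request ended up.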

user14002256 · Accepted Answer · 2020-09-08 21:14:37Z

I would say that your error is coming from the URL, or from the span tags, because the website holds each person inside an <a> element nested inside <div> elements.

So, here's how I did it:

import requests
from bs4 import BeautifulSoup

#ask for birthday
print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")
Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')

#make URL (the month name is lowercased, e.g. july15.html)
url = "https://www.famousbirthdays.com/" + Months[int(BD_Month) - 1].lower() + BD_Day + ".html"

#make HTTP request
response = requests.get(url=url)

#parse HTML
page = BeautifulSoup(response.content, 'html.parser')

#find list of all people based on website's HTML
all_people = page.find("div",{"class":"people-list"}).find_all("a",{"class":"person-item"})

#show all people
for person in all_people:
    print(person.find("div",{"class":"info"}).find("div",{"class":"name"}).get_text().strip())

I hope I could help!
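
As a possible refinement of the URL step above, the URL construction can be moved into a small function that also rejects out-of-range input before any request is made. This is a minimal sketch, not part of the original answer: the function name build_birthday_url and the range checks are assumptions, and the URL pattern (lowercase month name followed by the day) is simply the one used in the answer's code.

```python
MONTHS = (
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
)


def build_birthday_url(day, month):
    """Build the page URL for a given day and month (1-12).

    Accepts ints or numeric strings such as those returned by input().
    Raises ValueError for values outside the expected ranges.
    """
    day = int(day)
    month = int(month)
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if not 1 <= day <= 31:
        raise ValueError("day must be between 1 and 31")
    return f"https://www.famousbirthdays.com/{MONTHS[month - 1]}{day}.html"


print(build_birthday_url("15", "7"))
# prints: https://www.famousbirthdays.com/july15.html
```

The range check on the day is deliberately loose (it does not know that February has fewer days), so a date such as February 31 would still produce a URL and the request itself would have to fail or return an empty list.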
