I'm a high school student practicing Python. For a final project, I wanted to use web-scraping (which we haven't covered in class). The following is my code that is supposed to ask a user for their date of birth then print out a list of celebrities that share their birthday (excluding their year of birth).

import requests
from bs4 import BeautifulSoup

print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")
Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')
Month = dict(zip(range(12), Months))
BD_Month = int(BD_Month)
messy_url = ['http://www.famousbirthdays.com/', Month[BD_Month - 1], BD_Day, '.html']
url = ''.join(messy_url)
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
spans = soup.find_all('span', attrs={'class':'title'})
for span in spans:
    print (span.string)

The code is supposed to search the web page defined as 'url'; however, it always prints out a list of people born on November 6:

  • Lauren Orlando
  • Emma Stone
  • Alastair Aiken
  • Sal Vulcano
  • Bailey Ballinger

The code also prints only 5 of the 48 names on the page: entries 1 through 6, oddly skipping the fifth.

My two main issues are the wrong date and the incomplete list of names; any input would be appreciated.

Thanks.

  • The first step is to ensure that your URL is correct. The second step is to view the content and make sure that soup is set as you expect. Finally, I would get a count (len) of your spans. There is a chance that span.title is not valid for all of their cases. Commented Nov 6, 2016 at 17:54
  • Thank you for the comment. I have verified the URL is correct (Day: 15 and Month: 7 produced famousbirthdays.com/July15.html). I will check out the soup now, then I'll see if span.title is invalid. I'll edit this with what I find. Thanks. Commented Nov 6, 2016 at 18:59
  • I'll take a look and see if I can get you the answer you are looking for, then. Commented Nov 6, 2016 at 19:01
  • The issue seems to be that the content read is only found on the homepage, not the url which I specified. I'm not sure why this would occur. Commented Nov 6, 2016 at 19:11
  • It seems my own code has been giving me issues with BeautifulSoup, so I will have to look into that more. I would check that the content you receive is complete. Commented Nov 6, 2016 at 19:36
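
The checks suggested in these comments can be gathered into one small helper. This is a minimal sketch, not code from the thread: summarize_response is a hypothetical name, and it relies only on the url, status_code, and history attributes that a requests.Response object exposes. If the final URL differs from the requested one, the server redirected the request, which would be consistent with reading the homepage instead of the birthday page.

```python
from types import SimpleNamespace


def summarize_response(requested_url, response):
    """Return a short report on where a request actually ended up.

    `response` is expected to look like a requests.Response: it needs
    `url`, `status_code`, and `history` attributes.
    """
    return {
        "requested": requested_url,
        "final": response.url,
        "status": response.status_code,
        # history holds the intermediate redirect responses, oldest first
        "redirects": len(response.history),
        "redirected": response.url.rstrip("/") != requested_url.rstrip("/"),
    }


# Stand-in for a real response so the sketch runs without network access.
# The values are made up for illustration.
fake = SimpleNamespace(
    url="http://www.famousbirthdays.com/",
    status_code=200,
    history=[object()],
)
report = summarize_response("http://www.famousbirthdays.com/July15.html", fake)
print(report)
```

With the question's code, one could call summarize_response(url, r) right after requests.get(url) and print the result together with len(spans) before looping over the spans.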

1 Answer

I would say that your error comes either from the URL or from the span tags, because the website holds each person inside an <a> element nested in <div> elements.

So, here's how I did it:

import requests
from bs4 import BeautifulSoup

#ask for birthday
print("Please enter your birthday:")
BD_Day = input("Day: ")
BD_Month = input("Month (1-12): ")
Months = ('January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December')

#make URL (month name is lowercased, e.g. .../july15.html)
url = "https://www.famousbirthdays.com/" + Months[int(BD_Month) - 1].lower() + BD_Day + ".html"

#make HTTP request
response = requests.get(url=url)

#parse HTML
page = BeautifulSoup(response.content, 'html.parser')

#find list of all people based on website's HTML
all_people = page.find("div",{"class":"people-list"}).find_all("a",{"class":"person-item"})

#show all people
for person in all_people:
    print(person.find("div",{"class":"info"}).find("div",{"class":"name"}).get_text().strip())
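
The URL-building step above can also be wrapped in a function that validates the input before any request is made. This is only a sketch under assumptions: build_birthday_url is a hypothetical helper, the lowercase month-plus-day pattern is taken from the code above rather than verified against the site, and the day is normalized with int(), so "07" becomes 7.

```python
import calendar

BASE_URL = "https://www.famousbirthdays.com/"  # taken from the answer's code

MONTHS = ('january', 'february', 'march', 'april', 'may', 'june', 'july',
          'august', 'september', 'october', 'november', 'december')


def build_birthday_url(month, day):
    """Build the page URL for a month (1-12) and a day of that month."""
    month, day = int(month), int(day)
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Year 2000 is a leap year, so February 29 is accepted as a birthday.
    days_in_month = calendar.monthrange(2000, month)[1]
    if not 1 <= day <= days_in_month:
        raise ValueError("day must be between 1 and %d" % days_in_month)
    return BASE_URL + MONTHS[month - 1] + str(day) + ".html"


print(build_birthday_url("7", "15"))
```

Raising ValueError early keeps an entry such as month 13 or February 30 from turning into an IndexError or a request for a page that does not exist.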

I hope I could help!
