
I'm new to Python and trying to pull the Billboard Hot 100 list. I know there is already a library for this, but I'm practicing (and it's done differently). My issue is that my list of songs doesn't line up with my list of artists, because the artist is sometimes an "a" element and sometimes a "span" element. How do I select both kinds of element, given that both have [@class="chart-row__artist"]?

Currently I have:

artists = [x.strip() for x in tree.xpath('//a[@class="chart-row__artist"]/text()')]

but some songs have the artist in a span instead, which needs:

artists = [x.strip() for x in tree.xpath('//span[@class="chart-row__artist"]/text()')]

The two element types alternate on the page. Any suggestions?


2 Answers

1

I think I got the XPath syntax right. The songs now match up with the artists even though the artist alternates between element types. I did this:

artists = [x.strip() for x in tree.xpath('//*[@class="chart-row__artist"]/text()')]

The prefix //* selects every element in the document regardless of tag name, and the predicate then keeps only those whose class attribute is "chart-row__artist", so this covers both 'a' elements and 'span' elements.
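To see the difference between a tag-specific selector and the wildcard, here is a minimal sketch on a made-up snippet of markup (the real Billboard page is more complex, and the artist names are placeholders). It uses the standard library's xml.etree.ElementTree as a stand-in so it runs without extra installs; the same wildcard expression is what tree.xpath receives in the lxml version above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: artists alternate between <a> and <span>.
SAMPLE = """
<div>
  <div class="chart-row"><a class="chart-row__artist" href="#"> Artist A </a></div>
  <div class="chart-row"><span class="chart-row__artist"> Artist B </span></div>
  <div class="chart-row"><a class="chart-row__artist" href="#"> Artist C </a></div>
</div>
"""

root = ET.fromstring(SAMPLE)

# Tag-specific: only <a> elements are matched, so "Artist B" is skipped.
a_only = [el.text.strip() for el in root.findall(".//a[@class='chart-row__artist']")]

# Wildcard: any element with that class attribute, in document order.
any_tag = [el.text.strip() for el in root.findall(".//*[@class='chart-row__artist']")]

print(a_only)
print(any_tag)
```

Because the wildcard returns matches in document order, the artist list stays aligned with a song list pulled the same way.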


0

Is using xpath necessary? I got a list of all artists with bs4 pretty easily.

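import requests
from bs4 import BeautifulSoup

# Download the chart page
response = requests.get('https://www.billboard.com/charts/hot-100')
# Parse it with the lxml parser (requires the lxml package to be installed)
soup = BeautifulSoup(response.content, 'lxml')
# The class selector matches any tag with that class, so both 'a' and 'span' are covered
artists = [row.text.strip() for row in soup.select('.chart-row__artist')]
print(artists)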
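One difference between the two answers that may matter: the XPath test @class="chart-row__artist" compares the whole attribute string, while a CSS class selector like .chart-row__artist matches any element that has that class among several. The sketch below illustrates this with the standard library only; the markup and the extra class name "trending" are made up for the example, and whether the real page ever adds extra classes is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second artist element carries an extra class.
SAMPLE = """
<div>
  <a class="chart-row__artist">Artist A</a>
  <span class="chart-row__artist trending">Artist B</span>
</div>
"""

root = ET.fromstring(SAMPLE)
TARGET = "chart-row__artist"

# Exact attribute comparison, as in the XPath predicate @class="...".
exact = [el.text.strip() for el in root.findall(f".//*[@class='{TARGET}']")]

# Token comparison, which is how a CSS class selector behaves.
by_token = [
    el.text.strip()
    for el in root.iter()
    if TARGET in el.get("class", "").split()
]

print(exact)
print(by_token)
```

If the lists ever come back shorter than expected with the XPath version, this is one thing worth checking.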

1 Comment

interesting! thanks for your input. That does seem simpler.
