Python web scraping html with xpath syntax issue

Question

I'm new to Python and trying to pull the Billboard Hot 100 list. I know there is already a library for this, but I'm practicing (and it's done differently). My issue is that my list of songs doesn't match up with the artists, because the element holding the artist alternates between an "a" element and a "span" element. How do I select both types of elements, which both carry [@class="chart-row__artist"]?

Currently I have:

artists = [x.strip() for x in tree.xpath('//a[@class="chart-row__artist"]/text()')]

but this misses the artists that sit in a span, which I can only get with:

artists = [x.strip() for x in tree.xpath('//span[@class="chart-row__artist"]/text()')]

It alternates on the page. Any suggestions?

billboard.com/charts/hot-100

user192085

2018-06-03 21:25:59 +00:00

user192085 · Accepted Answer · 2018-06-03 21:29:33Z

1

I think I got the syntax for XPath right. It seems like the songs are matching appropriately with artists despite the alternating element nodes for artists. I did this:

artists = [x.strip() for x in tree.xpath('//*[@class="chart-row__artist"]/text()')]

The prefix //* selects every element in the document regardless of its tag name, and the predicate then keeps only those whose class attribute equals "chart-row__artist", so this covers both 'a' elements and 'span' elements.
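To see the wildcard at work without hitting the live page, here is a minimal sketch. It swaps lxml for the standard library's xml.etree.ElementTree, which supports this subset of XPath but not the trailing /text() step, so the text is read from each element's .text instead. The markup is a made-up, well-formed snippet, and the "chart-row" and "chart-row__song" class names in it are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup imitating the alternating a/span artist nodes.
SAMPLE = """
<div>
  <div class="chart-row">
    <h2 class="chart-row__song">Song One</h2>
    <a class="chart-row__artist" href="#"> Artist A </a>
  </div>
  <div class="chart-row">
    <h2 class="chart-row__song">Song Two</h2>
    <span class="chart-row__artist"> Artist B </span>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)

# Tag-specific selectors each see only part of the list.
only_a = [el.text.strip() for el in root.findall(".//a[@class='chart-row__artist']")]
only_span = [el.text.strip() for el in root.findall(".//span[@class='chart-row__artist']")]

# The wildcard * matches any tag name, so both kinds come back in document order.
artists = [el.text.strip() for el in root.findall(".//*[@class='chart-row__artist']")]

print(only_a)
print(only_span)
print(artists)
```

Note that @class="chart-row__artist" is an exact string comparison, so an element carrying additional classes would not match it.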

Luke · Accepted Answer · 2018-06-03 21:30:48Z

0

Is using XPath necessary? I got a list of all the artists pretty easily with BeautifulSoup (bs4).

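import requests
from bs4 import BeautifulSoup

# Download the chart page and stop early on an HTTP error.
response = requests.get('https://www.billboard.com/charts/hot-100')
response.raise_for_status()

# Parse with the lxml parser (the lxml package must be installed).
soup = BeautifulSoup(response.content, 'lxml')

# The CSS class selector matches any tag, so both <a> and <span> artists are included.
artists = [row.text.strip() for row in soup.select('.chart-row__artist')]
print(artists)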

1 Comment

user192085 Over a year ago

Interesting! Thanks for your input. That does seem simpler.
