I'm trying to extract each player's position from a large number of player pages (here's an example for Malcolm Brogdon). I can extract Malcolm Brogdon's position with the following code:

# Import libraries
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup

player_id = 'malcolm-brogdon-1'

url = "https://www.sports-reference.com/cbb/players/{}.html".format(player_id)
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
page_soup = soup(webpage, "html.parser")

# Takes the first <strong> inside the first <p> and reads the text right after it
pos = page_soup.p.find("strong").next_sibling.strip()
pos

However, I want to do this in a more dynamic way: locate "Position:" and then read whatever comes after it. For some players the page is structured slightly differently, and my current code doesn't return the position (e.g. Cat Barber).

I've tried doing something like page_soup.find("strong", text="Position:") but that doesn't seem to work.
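One plausible reason the exact match finds nothing, assuming the label on the page is padded with whitespace: a plain string passed to string= (text= is the older name for the same argument) has to equal the tag's entire string, while a compiled regular expression is matched with search and so tolerates the padding. The HTML below is a made-up stand-in, not copied from the site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the label is surrounded by newlines and spaces.
html = "<p><strong>\n  Position:\n  </strong>\n  Guard\n</p>"
page_soup = BeautifulSoup(html, "html.parser")

# Exact comparison: "Position:" is not equal to "\n  Position:\n  ".
exact = page_soup.find("strong", string="Position:")

# Regex comparison: re.search finds "Position" inside the padded string.
fuzzy = page_soup.find("strong", string=re.compile(r"Position"))

print(exact)
print(fuzzy.next_sibling.strip())
```

If the real page has no such padding, the cause is something else, so it is worth printing repr() of the tag's string to check.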


1 Answer


You can select the element that contains the text "Position:" and then take the next text sibling:

import requests
from bs4 import BeautifulSoup


url = "https://www.sports-reference.com/cbb/players/anthony-cat-barber-1.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# :-soup-contains needs soupsieve 2.1+; older releases spell it :contains (now deprecated)
pos = soup.select_one('strong:-soup-contains("Position")').find_next_sibling(string=True).strip()
print(pos)

Prints:

Guard

EDIT: Another version:

import requests
from bs4 import BeautifulSoup


url = "https://www.sports-reference.com/cbb/players/anthony-cat-barber-1.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

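pos = (
    # The callable may receive None (a tag with no single string), so guard before using "in"
    soup.find("strong", string=lambda t: t is not None and "Position" in t)
    .find_next_sibling(string=True)
    .strip()
)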
print(pos)
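
Since the question is about many players, here is a rough sketch of the same lookup wrapped in a function that returns None instead of raising when a page has no "Position" label. The name extract_position and the sample HTML strings are hypothetical and only illustrate two differently structured pages; for real pages you would pass in the downloaded HTML.

```python
from bs4 import BeautifulSoup


def extract_position(html):
    """Return the text following the 'Position' label, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Guard against None: tags with nested children have no single string.
    label = soup.find(
        "strong", string=lambda s: s is not None and "Position" in s
    )
    if label is None:
        return None
    value = label.find_next_sibling(string=True)
    return value.strip() if value is not None else None


# Made-up pages: the label is the first <strong> in one and a later one in the other.
page_a = "<div><p><strong>Position:</strong> Guard</p></div>"
page_b = (
    '<div><p><strong><a href="#">Nick</a> name</strong></p>'
    "<p><strong>\nPosition:\n</strong>\nGuard\n</p></div>"
)
page_c = "<div><p><strong>School:</strong> Example</p></div>"

for page in (page_a, page_b, page_c):
    print(extract_position(page))
```

Returning None keeps a loop over many player IDs from stopping at the first page that lacks the label.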

4 Comments

When I run this code I get the following error: NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type. Any idea why?
@Christine you are using an old version of BeautifulSoup. Update to the latest release.
@Christine I also added another version you can try (it may work with an old version of bs4).
I updated my version because that's probably just a good idea in general. It works great now! Thanks so much. I'll go ahead and mark this as the answer.
