
I am using BeautifulSoup and Python to scrape a webpage. I have a BS element,

a = soup.find('div', class_='section lot-details')

which returns a series of list items (li elements), as shown below.

<li><strong>Location:</strong> WA - 222 Welshpool Road, Welshpool</li>
<li><strong>Deliver to:</strong> Pickup Only WA</li>

I want to return the text after each <strong> tag:

WA - 222 Welshpool Road, Welshpool
Pickup Only WA

How do I get this out of the BS object? I'm unsure of the regex, and also how this interacts with BeautifulSoup.

  • How does getting div return li? Commented May 19, 2016 at 13:43

2 Answers


The pattern (?:</strong>)(.*)(?:</li>) would do the work: capture group \1, the (.*), holds the text between </strong> and </li>.

Python code sample:

In [1]: import re
In [2]: test = re.compile(r'(?:</strong>)(.*)(?:</li>)')
In [3]: test.findall(input_string)
Out[3]: [' WA - 222 Welshpool Road, Welshpool', ' Pickup Only WA']

You can check it here: https://regex101.com/r/fD0fZ9/1
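One caveat with the greedy (.*): it only works while each li sits on its own line. If several li elements end up on one line, the match runs from the first </strong> to the last </li>. A minimal sketch of a non-greedy variant follows. The sample string is copied from the question and joined onto one line on purpose; in practice input_string would presumably be something like str(a), which is an assumption on my part.

```python
import re

# Sample markup from the question, deliberately placed on a single line.
# In real use this might come from str(a) (assumption, not from the question).
input_string = (
    "<li><strong>Location:</strong> WA - 222 Welshpool Road, Welshpool</li>"
    "<li><strong>Deliver to:</strong> Pickup Only WA</li>"
)

# Non-greedy (.*?) stops at the first </li> after each </strong>,
# so items that share a line are still captured separately.
pattern = re.compile(r'</strong>(.*?)</li>')

# strip() removes the leading space left after the closing </strong> tag.
values = [match.strip() for match in pattern.findall(input_string)]
print(values)
```

The strip() call also drops the leading space that shows up in the output above.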


1 Comment

This works & also gives me a method for other more general cases as well.

You don't really need regex. If you have your li tags in a list:

>>> for li in li_elems:
...     print(li.find('strong').next_sibling.strip())

WA - 222 Welshpool Road, Welshpool
Pickup Only WA

This assumes that there is only one strong element in each li and that the text comes directly after it.

Or, alternatively:

>>> for li in li_elems:
...     print(li.contents[1].strip())

WA - 222 Welshpool Road, Welshpool
Pickup Only WA
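For completeness, here is a rough end-to-end sketch of the next_sibling approach, including one way li_elems could be built from the div in the question. The surrounding HTML (the div and ul wrapper) is an assumption made up for illustration, since the question only shows the li elements; the built-in html.parser is used so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: only the two <li> lines come from the question,
# the <div>/<ul> wrapper is assumed for the sake of a runnable example.
html = """
<div class="section lot-details">
  <ul>
    <li><strong>Location:</strong> WA - 222 Welshpool Road, Welshpool</li>
    <li><strong>Deliver to:</strong> Pickup Only WA</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Same lookup as in the question: match the exact class attribute string.
a = soup.find("div", class_="section lot-details")

# Collect the <li> tags nested inside the div.
li_elems = a.find_all("li")

# The text node right after each <strong> is its next sibling.
texts = [li.find("strong").next_sibling.strip() for li in li_elems]

for text in texts:
    print(text)
```

This should print the same two lines as above. If an li has no strong tag, find returns None and the attribute access fails, so real pages may need a check for that.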
