I am having trouble extracting data from a web page. This is part of my code:

for result in results:
    street = result.find('p', attrs={'class': 'size16'}).text
    records.append(street)
    print(street)

The page's HTML:

    <div class="media-body pt5 pb10">
     <div class="mb15">
        <span class="map-item-city block mb0 colorgreen">City</span>
        <p class="small mb20">&nbsp;</p>
        <p class="size16">street 98<br>phone. 22 721-56-70</p>
     </div>
     <div class="colorblack"><strong>open</strong></div>
     <div class="mb20 size16">Mon.-Fr. 07.30-15.30</div>
     <div class="mb15 ">

Result of my code:

ul. Bema 2phone. (32) 745 72 66-69 Wroclaw None
ul. 1 Maja 22/Vphone. 537-943-969 Olawa <p class="small mb20 colorgreen">Placowka partnerska</p>

I would like to separate out or delete the text after the <br> tag. I only need the street:

    <p class="size16">street 98<br>phone. 22 721-56-70</p>

Can you help me?

2 Answers

Use previous_sibling like this:

from bs4 import BeautifulSoup

html = """
<div class="media-body pt5 pb10">
     <div class="mb15">
        <span class="map-item-city block mb0 colorgreen">Bronisze</span>
        <p class="small mb20">&nbsp;</p>
        <p class="size16">Poznańska 98<br>tel. 22 721-56-70</p>
     </div>
     <div class="colorblack"><strong>Godziny otwarcia</strong></div>
     <div class="mb20 size16">Pn.-Pt. 07.30-15.30</div>
<div class="mb15 ">
"""

result=BeautifulSoup(html, "lxml")

br = result.find('br')
print (br.previous_sibling)

Or if you want to narrow it down a bit:

street = result.find('p', attrs={'class':'size16'}).find('br').previous_sibling
print (street)

Output (in both cases):

Poznańska 98

From the documentation (https://www.crummy.com/software/BeautifulSoup/bs4/doc/):

.next_sibling and .previous_sibling

You can use .next_sibling and .previous_sibling to navigate between page elements that are on the same level of the parse tree:
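When the page holds several records, the same navigation can be applied to each record separately instead of to the whole document. The following is only a sketch: the two-record HTML is made up from the question's sample output, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical two-record page modelled on the markup in the question.
html = """
<div class="mb15">
  <p class="size16">ul. Bema 2<br>phone. (32) 745 72 66-69</p>
</div>
<div class="mb15">
  <p class="size16">ul. 1 Maja 22/V<br>phone. 537-943-969</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

streets = []
for result in soup.find_all("div", class_="mb15"):
    # Search inside this record only, not the whole document.
    br = result.find("p", class_="size16").find("br")
    # previous_sibling is the text node right before <br>;
    # next_sibling would be the phone number after it.
    streets.append(str(br.previous_sibling).strip())

print(streets)
```

Calling find() on each result rather than on the whole soup is what keeps one record's street from being repeated for every other record.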


5 Comments

Thank you for your interest in my problem. Unfortunately, this solution does not work properly. This code clones the street name and mixes the remaining results.
Can you explain some more? Are you calling find() again?
name = result.find('p', attrs={'class':'small mb20 colorgreen'}).text
AttributeError: 'NoneType' object has no attribute 'text'
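That AttributeError usually means find() matched nothing for that record and returned None, so .text was called on None. A possible guard is sketched below. The sample HTML is an assumption based on the output shown in the question, and the match uses the single class "colorgreen" rather than the full class string.

```python
from bs4 import BeautifulSoup

# Hypothetical records: only the second one has the "colorgreen" paragraph.
html = """
<div class="mb15">
  <p class="small mb20">&nbsp;</p>
  <p class="size16">ul. Bema 2<br>phone. (32) 745 72 66-69</p>
</div>
<div class="mb15">
  <p class="small mb20 colorgreen">Placowka partnerska</p>
  <p class="size16">ul. 1 Maja 22/V<br>phone. 537-943-969</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

records = []
for result in soup.find_all("div", class_="mb15"):
    # find() returns None when nothing matches, so check before reading text.
    name_tag = result.find("p", class_="colorgreen")
    name = name_tag.get_text(strip=True) if name_tag is not None else None

    br = result.find("p", class_="size16").find("br")
    street = str(br.previous_sibling).strip()

    records.append((street, name))

print(records)
```

Records without the optional paragraph then get None for the name instead of raising an exception.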
from bs4 import BeautifulSoup

html = """
<div class="media-body pt5 pb10">
     <div class="mb15">
        <span class="map-item-city block mb0 colorgreen">Bronisze</span>
        <p class="small mb20">&nbsp;</p>
        <p class="size16">Poznańska 98<br>tel. 22 721-56-70</p>
     </div>
     <div class="colorblack"><strong>Godziny otwarcia</strong></div>
     <div class="mb20 size16">Pn.-Pt. 07.30-15.30</div>
<div class="mb15 ">
"""

soup=BeautifulSoup(html, "lxml")

# Walk down from the outer container to each <p class="size16">.
for html_tag_div in soup.find_all('div', class_="media-body pt5 pb10"):
    for html_tag_div_1 in html_tag_div.find_all('div', class_="mb15"):
        for html_tag_2 in html_tag_div_1.find_all("p", class_="size16"):
            # Everything before <br> inside the paragraph is the street.
            for html_tag_3 in html_tag_2.find("br").previous_siblings:
                print(html_tag_3.get_text())
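If only the first piece of text in each paragraph is needed, a shorter variant might look like the sketch below. It assumes the street is always the first string inside the size16 paragraph, which holds for the sample markup but may not hold for every page.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the sample markup from the answers.
html = """
<div class="mb15">
  <p class="size16">Poznańska 98<br>tel. 22 721-56-70</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields each text node with whitespace removed;
# next(..., None) takes the first one, or None for an empty paragraph.
streets = [
    next(p.stripped_strings, None)
    for p in soup.find_all("p", class_="size16")
]

print(streets)
```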
