How to identify css inline attribute

Question

The webpage I'm scraping contains many titles, and I need to identify them so I can set a value in my database. The problem is that those titles don't have a specific ID or class.

They follow this pattern:

<p ALIGN="CENTER"><font face="Arial" SIZE="2">
<a name="tituloivcapituloisecaoii"></a><b>
<span style="text-transform: uppercase">Seção II<br>
DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL</span></b></font></p>


<p ALIGN="CENTER"><font face="Arial" SIZE="2"><a name="tituloivcapituloisecaoiii"></a>
<b><span style="text-transform: uppercase">Seção III<br>
DA CÂMARA DOS DEPUTADOS</span></b></font></p>

One attribute that identifies them is the inline style text-transform: uppercase.

How can I check if the p contains one title?

That's my current code:

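from bs4 import BeautifulSoup

# f is the opened HTML file (or a string of markup)
soup = BeautifulSoup(f, 'html.parser')

# Remove anchors and struck-through text before processing
for tag in soup.find_all(['a', 'strike']):
    tag.decompose()

for p in soup.find_all('p'):
    print(p)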

score 2 · Accepted Answer · 2018-12-12 16:26:43Z

Once you have parsed the HTML and selected the tags by type, you can filter them by any distinguishing attribute. Here, the inline style text-transform: uppercase on the span serves that purpose.

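from bs4 import BeautifulSoup

soup = BeautifulSoup(f, 'html.parser')
for p in soup.find_all("p"):
    span = p.span  # first <span> inside this <p>, or None
    # Skip paragraphs with no <span> or no style attribute
    if span is not None and span.get("style") == "text-transform: uppercase":
        title = p.text
        print(title)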

>>>Seção IIDAS ATRIBUIÇÕES DO CONGRESSO NACIONAL

This will find all <p> tags whose first <span> tag has style == "text-transform: uppercase" and print their associated text.
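An exact string comparison on the style attribute breaks as soon as the page writes the declaration with different spacing or letter case. The following is a minimal sketch of a more tolerant check, not part of the original answer: the helper names is_uppercase_style and find_titles and the sample markup in HTML are made up for illustration, and it relies on BeautifulSoup accepting a function as an attribute filter.

```python
from bs4 import BeautifulSoup

# Sample markup: the first <p> is taken from the question, the other two are
# hypothetical non-title paragraphs added for illustration.
HTML = """
<p ALIGN="CENTER"><font face="Arial" SIZE="2">
<a name="tituloivcapituloisecaoii"></a><b>
<span style="text-transform: uppercase">Seção II<br>
DAS ATRIBUIÇÕES DO CONGRESSO NACIONAL</span></b></font></p>
<p>Ordinary paragraph with no span.</p>
<p><span style="color: red">Not a title</span></p>
"""


def is_uppercase_style(style):
    """Return True when an inline style declares text-transform: uppercase."""
    if not style:
        # The attribute is missing (None) or empty.
        return False
    # Drop all whitespace and lowercase, so "TEXT-TRANSFORM:Uppercase" matches too.
    normalized = "".join(style.split()).lower()
    return "text-transform:uppercase" in normalized


def find_titles(html):
    """Collect the text of every <p> that contains an uppercase-styled <span>."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for p in soup.find_all("p"):
        # Search all <span> descendants of the <p>, not only the first one.
        span = p.find("span", style=is_uppercase_style)
        if span is not None:
            # Join the text pieces with a space so the <br> does not glue words together.
            titles.append(span.get_text(" ", strip=True))
    return titles


print(find_titles(HTML))
```

Passing a separator to get_text keeps "Seção II" and the heading text apart, which p.text alone does not guarantee.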

edited Dec 12, 2018 at 16:26

answered Dec 12, 2018 at 15:52

user10597469

6 Comments

mr.abdo Over a year ago

It didn't work. I edited my question to include my current code so you can take a look and see if there's any problem. When I followed your suggestion, nothing was returned.

mr.abdo Over a year ago

Maybe it's happening because text-transform is an attribute of the span.

user10597469 Over a year ago

OK, you are right. Change that to if p.span["style"]=="text-transform: uppercase":. I'll update it in the answer as well.

user10597469 Over a year ago

I just ran a test on the strings you provided and it works. This is a different problem. If you are getting that error, it means that nothing in the rest of your code deals with <p> tags that have no <span> tags inside them. The code above will fix your current problem, but you need to account for the fact that not every <p> tag you search will have a <span> tag when you incorporate this into code that searches an actual page. If you add if p.span != None: as the first line of your for loop, it will filter out the None values.
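The guard described in this comment could look roughly like the sketch below. It is an illustration only: the SAMPLE markup mixes the second title from the question with two hypothetical paragraphs that would trip the unguarded version, and it uses .get("style") so a <span> without a style attribute does not raise KeyError.

```python
from bs4 import BeautifulSoup

# The first <p> is a simplified title from the question; the other two are
# hypothetical paragraphs that lack a <span> or lack a style attribute.
SAMPLE = """
<p><b><span style="text-transform: uppercase">Seção III<br>
DA CÂMARA DOS DEPUTADOS</span></b></p>
<p>Plain paragraph without any span.</p>
<p><span>Span without a style attribute</span></p>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
titles = []
for p in soup.find_all("p"):
    span = p.span  # None when the <p> has no <span> descendant
    if span is None:
        continue
    # .get() returns None instead of raising KeyError when style is missing.
    if span.get("style") == "text-transform: uppercase":
        titles.append(p.get_text(" ", strip=True))

print(titles)
```

Comparing with is None (rather than != None) is the idiomatic Python spelling of the same check.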

user10597469 Over a year ago

Awesome! Good luck on the scraper!
