Extract text with bold content from css selector

Question

I am trying to extract text from forum posts, but the content of the bold element is dropped.

How can I get the full text "Some text to extract bold content?", including the bold part? Currently I only get "Some text to extract ?".

<blockquote class="messageText SelectQuoteContainer ugc baseHtml">
Some text to extract <b>bold content</b>?
</blockquote>

def parse_page(self, response):
    for quote in response.css('article'):
        yield {
            'text': quote.css('blockquote::text').extract()
        }

Granitosaurus · Accepted Answer · 2017-04-13 10:36:34Z

1

You need a space in your CSS selector:

'blockquote ::text'
           ^

The space matters because you want the text of every node descending from blockquote. Without the space, the selector matches only the text nodes directly inside the blockquote node.

edited Apr 13, 2017 at 10:36

answered Apr 13, 2017 at 10:19


5 Comments

anvd Over a year ago

Will the :not() selector stop working once the space is added, as in blockquote:not(.bbCodeBlock) ::text? Apparently yes.

Granitosaurus Over a year ago

@anvd Just tested: it should work, and it does. Tested with 'blockquote:not(.foo) ::text'.

anvd Over a year ago

The markup is a bit more complicated, and it does not work as expected: jsfiddle.net/dwfmLcaj

Granitosaurus Over a year ago

@anvd This is not JavaScript. Scrapy converts all CSS selectors to XPath, so the only CSS selector implementation that matters here is the cssselect package; see github.com/scrapy/cssselect.

anvd Over a year ago

Thanks for the link, but the problem right now is the CSS itself. I don't even know how to select the part of the text that has no element associated with it.

Umair Ayub · Accepted Answer · 2017-04-13 11:10:32Z

1

Use the * selector to select the text of all elements inside an element. Join the fragments first and strip afterwards, so that the spaces between them are kept:

''.join(quote.css('blockquote *::text').extract()).strip()

answered Apr 13, 2017 at 11:10
