3

I have the following html code:

<div class='article'>
<p>Lorem <strong>ipsum</strong> si ammet</p>
</div>

I want to get the text as: Lorem ipsum si ammet, so I tried:

response.css('div.article > p::text').extract()

But I only receive Lorem si ammet; the text inside <strong> is missing.

How can I get both <p> and <strong> texts using CSS selectors?

2
  • Doesn't look like a duplicate to me. This question asks for a way specifically using CSS selectors, while the other one only mentions XPath selectors. Commented Apr 25, 2018 at 22:51
  • Nope, it is not a duplicate. Commented Sep 25, 2018 at 21:26

2 Answers

4

One-liner solution:

"".join(a.strip() for a in response.css("div.article *::text").extract())

div.article * selects every element nested inside div.article, so *::text returns the text nodes of all of them, including the <strong>.

Or, written out as a loop:

text = ""
# the colon at the end of the for line is required
for a in response.css("div.article *::text").extract():
    text += a.strip()

Both approaches are equivalent.
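
One caveat: calling strip() on each fragment also removes the spaces next to the <strong> tag, so for the sample HTML the result would likely be "Loremipsumsi ammet" rather than "Lorem ipsum si ammet". Below is a minimal sketch of one way around that, assuming the selector yields the three fragments shown in the comment. The helper name join_text_fragments is hypothetical and not part of Scrapy.

```python
def join_text_fragments(fragments):
    """Join text nodes first, then collapse runs of whitespace to single spaces."""
    return " ".join("".join(fragments).split())


# Assumed output of response.css("div.article *::text").extract()
# for the sample HTML in the question.
fragments = ["Lorem ", "ipsum", " si ammet"]

print(join_text_fragments(fragments))          # Lorem ipsum si ammet
print("".join(a.strip() for a in fragments))   # Loremipsumsi ammet
```

In a spider this would be called as join_text_fragments(response.css("div.article *::text").extract()). Joining before splitting keeps the original word boundaries and still trims newlines and indentation.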

1

In Scrapy 2.7+ you can do this with the following:

text = response.css('div.article *::text').getall()
text = [t.strip() for t in text]
text = "".join(text)

The getall() method returns a list of strings.
