Select a group of elements and text using CSS selectors

Question

I have an HTML page like this:

<div>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
<a href='link'>
<u class>name</u>
</a>
text
<br>
</div>

I need to select a group like this:

<a href='link'>
<u class>name</u>
</a>
text
<br>

I need to select three values from each group: link, name, and text. Is there any way to select a group like this and extract these particular values from each group in Scrapy, using CSS selectors, XPath, or anything else?

Neha Setia Nagpal · Accepted Answer · 2022-07-11 12:03:37Z

1

Scrapy lets you yield several values from an HTML page together as an item: a Python object (or a plain dict) that holds key-value pairs.

You can extract each value individually and then yield them together as key-value pairs.

To extract the value of an attribute of an element, use ::attr(name), for example a::attr(href).
To extract the text of an element, use ::text.

For example, you can define your parse function in Scrapy like this:

def parse(self, response):
    # href of every <a> inside the target div
    for_link = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8) a::attr(href)').getall()

    # Text of every <u> inside those links
    for_name = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8) a u::text').getall()

    # Text nodes that are direct children of the target div
    for_text = response.css('.row.no-gutters div:nth-child(3) div:nth-child(8)::text').getall()

    # Yield all three lists together as one item
    yield {"link": for_link, "name": for_name, "text": for_text}

Then open your project's items.py file and declare the fields:

# Define here the models for your scraped items
import scrapy


# Define the fields for the Scrapy item in this class.
# YourSpiderItem is a placeholder name; rename it to match your project.
class YourSpiderItem(scrapy.Item):
    # Item key for the <a> href
    for_link = scrapy.Field()

    # Item key for the <u> text
    for_name = scrapy.Field()

    # Item key for the text that follows the link
    for_text = scrapy.Field()

For more details, read this tutorial.

edited Jul 11, 2022 at 12:03

answered Jul 11, 2022 at 8:44

Neha Setia Nagpal



7 Comments

Vishnubly Over a year ago

Hi, in my example, 'text' is not inside a <span>, so I guess this will not work. Can you answer for the case where 'text' is not in a <span>?

Neha Setia Nagpal Over a year ago

Hi @Vishnu, try it now and feel free to ask any further questions :)

Vishnubly Over a year ago

Hi, I am sorry, but the above code does not work. In my case 'name' is inside <a>, but both 'text' and <a> are inside <div>, so to select 'text' I would probably need something like 'div *::text'. That does not work either, because it returns 'name' again and also returns '\n' for some reason, maybe from the <br>.

Neha Setia Nagpal Over a year ago

Can you share the website you are trying to scrape? I will share code accordingly.

Vishnubly Over a year ago

Hi, sorry for the late reply (power outage). I am trying to scrape https://www.mangaupdates.com/series/r4ayzg7/mairimashita-iruma-kun and on this page I want to get the "Related Series" section.


halfer · Accepted Answer · 2023-10-17 22:24:04Z

1

If it's okay to wrap text in a span like so:

<a href='link'>
<u class>name</u>
</a>
<span>text</span>
<br>

Then you can select everything in CSS like so:

a, a + span {}

Or you can style these two separately:

a {}
a + span {}

The + is the adjacent sibling combinator: a + span matches a span that comes immediately after an a element with the same parent.

edited Oct 17, 2023 at 22:24

halfer


answered Jul 11, 2022 at 7:32

Sam


1 Comment

Vishnubly Over a year ago

Sorry @Sam, but I do not own the HTML; I just receive it in the specified format.
