Find Elements Between Div With Selenium in Python

Question

I have the following HTML and want to extract the years and names. Everything I have tried so far has failed:

<div class="Year">
    <span class="date">2019</span>
</div>



<div class="cl2">
    <span class="name">name1</span>
</div>
<div class="cl2">
    <span class="name">name2</span>
</div>
<div class="cl2">
    <span class="name">name3</span>
</div>
<div class="cl2">
    <span class="name">name4</span>
</div>



<div class="Year">
    <span class="date">2020</span>
</div>

<div class="cl2">
    <span class="name">name5</span>
</div>
<div class="cl2">
    <span class="name">name6</span>
</div>

What I want to get is:

2019
name1
name2
name3
name4
2020
name5
name6

I tried the following, using XPath:

from selenium.webdriver.common.by import By

# Print every year first...
years = driver.find_elements(By.XPATH, "//div[@class='Year']")
for year in years:
    print(year.find_element(By.XPATH, ".//span[@class='date']").text)

# ...then every name, so the grouping by year is lost.
names = driver.find_elements(By.XPATH, "//div[@class='cl2']")
for name in names:
    print(name.find_element(By.XPATH, ".//span[@class='name']").text)

I got:

2019
2020
name1
name2
name3
name4
name5
name6

Sers · Accepted Answer · 2020-08-01 19:18:32Z


You can get them using XPath and the preceding axis, which finds the nearest date span before each name:

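from selenium.webdriver.common.by import By

names = dict()
for e in driver.find_elements(By.CLASS_NAME, 'name'):
    name = e.text
    # Parenthesised node-set is in document order, so last() is the closest preceding date.
    year = e.find_element(By.XPATH, "(./preceding::span[@class='date'])[last()]").text
    names[name] = year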

{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}

You can also fetch all date and name elements in document order and track the current year by class:

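from selenium.webdriver.common.by import By

names = dict()
year = None  # most recent year seen while walking the page top to bottom
for e in driver.find_elements(By.CSS_SELECTOR, '.date, .name'):
    classes = e.get_attribute('class').split()
    if 'name' in classes:
        names[e.text] = year
    elif 'date' in classes:
        year = e.text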

{'name1': '2019', 'name2': '2019', 'name3': '2019', 'name4': '2019', 'name5': '2020', 'name6': '2020'}
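The document-order walk above can also be tried without a browser. The following is only a sketch, assuming the markup is available as a string (for example from Selenium's driver.page_source). It uses the standard-library html.parser instead of Selenium, and YearNameParser, group_names_by_year and SAMPLE_HTML are hypothetical names introduced for illustration. It groups names under each year, which matches the layout asked for in the question.

```python
from html.parser import HTMLParser


class YearNameParser(HTMLParser):
    """Walk spans in document order and file each name under the latest year."""

    def __init__(self):
        super().__init__()
        self.groups = {}    # year -> list of names that follow it
        self._kind = None   # "date" or "name" while inside a matching span
        self._year = None   # most recent year seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            if "date" in classes:
                self._kind = "date"
            elif "name" in classes:
                self._kind = "name"

    def handle_data(self, data):
        text = data.strip()
        if not text or self._kind is None:
            return
        if self._kind == "date":
            self._year = text
            self.groups.setdefault(text, [])
        elif self._year is not None:
            # Names that appear before any year are ignored.
            self.groups[self._year].append(text)

    def handle_endtag(self, tag):
        if tag == "span":
            self._kind = None


def group_names_by_year(html):
    parser = YearNameParser()
    parser.feed(html)
    parser.close()
    return parser.groups


# Same structure as the HTML in the question.
SAMPLE_HTML = """
<div class="Year"><span class="date">2019</span></div>
<div class="cl2"><span class="name">name1</span></div>
<div class="cl2"><span class="name">name2</span></div>
<div class="cl2"><span class="name">name3</span></div>
<div class="cl2"><span class="name">name4</span></div>
<div class="Year"><span class="date">2020</span></div>
<div class="cl2"><span class="name">name5</span></div>
<div class="cl2"><span class="name">name6</span></div>
"""

for year, names in group_names_by_year(SAMPLE_HTML).items():
    print(year)
    for name in names:
        print(name)
```

Keying the dictionary by year rather than by name keeps the grouping intact even if the same name appears under two different years.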


1 Comment

dxtr Over a year ago

Hello Sers, I posted another problem related to this same question. Would you please take a look at it, if you don't mind? stackoverflow.com/questions/63215107/…

Tyler Russin · Accepted Answer · 2020-08-01 19:29:56Z

One solution is to work with the HTML saved as a text file rather than querying the page directly. The desired text can then be extracted with ordinary string operations.

First, import the re library, which lets us search the saved HTML text with regular expressions.

Then read the text file and use .split() to break the text into a list of chunks, one per Year div. Next, iterate over the list and use re.search and re.findall to pick out the date and name spans within each chunk.

import re

# Read the saved page source.
with open("html_text.txt", "r") as f:
    html_text = f.read()

# Split at each Year div; the first chunk precedes the first year, so skip it.
text_list = html_text.split('<div class="Year">')

for year in text_list[1:]:
    date = re.search(r'<span class="date">(.+?)</span>', year)
    names = re.findall(r'<span class="name">(.+?)</span>', year)

    print(date.group(1))
    for name in names:
        print(name)

The printed output should look like this:

2019
name1
name2
name3
name4
2020
name5
name6

Hope this helped!!
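The same split-and-search idea also works on a string already in memory, such as the value of Selenium's driver.page_source, so no intermediate file is needed. This is a rough sketch under that assumption. names_by_year and SAMPLE are hypothetical names, and the patterns assume the exact attribute spelling shown in the question.

```python
import re


def names_by_year(html_text):
    """Return a list of (year, [names]) tuples in page order."""
    result = []
    # Every chunk after the first begins just past a Year div's opening tag.
    for chunk in html_text.split('<div class="Year">')[1:]:
        date = re.search(r'<span class="date">(.+?)</span>', chunk)
        if date is None:
            continue  # a Year div without a date span is skipped
        names = re.findall(r'<span class="name">(.+?)</span>', chunk)
        result.append((date.group(1), names))
    return result


# Shortened version of the HTML in the question.
SAMPLE = """
<div class="Year">
    <span class="date">2019</span>
</div>
<div class="cl2">
    <span class="name">name1</span>
</div>
<div class="cl2">
    <span class="name">name2</span>
</div>
<div class="Year">
    <span class="date">2020</span>
</div>
<div class="cl2">
    <span class="name">name5</span>
</div>
"""

for year, names in names_by_year(SAMPLE):
    print(year)
    for name in names:
        print(name)
```

Returning tuples instead of printing directly leaves the caller free to format or store the grouped result.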

Al Martins · Accepted Answer · 2022-01-31 17:39:09Z


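I managed to find the elements between the divs by using .get_attribute("textContent") instead of .text, following a tip from "Get Text from Span returns empty string". Selenium's .text returns only the text that is visibly rendered, so it comes back empty for hidden elements, whereas textContent returns the text regardless of visibility.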
