Extracting nested elements using Selenium

Question

I have code written with BeautifulSoup and am now exploring Selenium, but I cannot figure out how (I hope it is possible) to extract data nested inside some HTML.

This is the bs4 code:

def get_data(link):
    soup1 = getdata(link)
    for one_offer in soup1.find_all('li', {'class': 'clearfix'}):
        # Get sqm:
        raw_sqm = one_offer.find('div', {'class': 'inline-group'})
        get_sqm = raw_sqm.get_text().split(',')[1].split()[0]
        sqm_check_value = if_area_not_speicified(get_sqm)
        sqm_area.append(float(sqm_check_value))

The code above takes the link https://www.imoti.net/bg/obiavi/r/prodava/sofia/?sid=hSrJhL and does the following:

one_offer is one listing block; in the image above, these are the red, green, and blue rectangles. For each block I get the area indicated by the red arrow and append it to a list.

How can I convert this into Selenium code?

So far I have:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

PATH = '/Applications/chromedriver'
driver = webdriver.Chrome(PATH)

driver.get('https://www.imoti.net/bg/obiavi/r/prodava/sofia/?sid=hSrJhL')

variable = []

def testing_values():
    variable.append(driver.find_elements_by_class_name('clearfix'))

testing_values()
print(variable)

After calling testing_values, print(variable) outputs the following list:

[[<selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="5e3d2712-f453-4871-a43e-8d72d40e6a65")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="53a21fd3-495a-41d4-9382-ae61961209ed")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="56d80ac6-bfaa-48de-9e87-1d2f3c9a42a4")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="74362762-087e-4221-a4b7-cbdf10a16400")>, ...]]

*This list contains 30 items; I deleted some of them to keep the example short.

So, I have a list containing some sort of web-elements, but how do I extract the data from each one in order to get the area, similar to the code using bs4?

Loop over the elements and query each one with a relative XPath (./). The area seems to be in a span under header/div/h3, so you can just read .text on that span.
– Arundeep Chohan, Commented Nov 22, 2021 at 20:16
The class clearfix also matches an extra HTML element, a div, in addition to the li blocks.
– Arundeep Chohan, Commented Nov 22, 2021 at 21:09

Arundeep Chohan · Accepted Answer · 2021-11-22 21:10:48Z

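The class name clearfix also matches an extra div, not only the li offer blocks. So select just the li elements, loop through them, query each one with a relative XPath (one that starts with .//), and read the text of the result.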
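To illustrate why selecting the li elements first and then using a path that starts with .// matters, here is a minimal offline sketch using only the standard library. The markup in SAMPLE is a hypothetical, heavily simplified stand-in for the page (the real structure is assumed, not copied), and ElementTree supports only a subset of XPath, so this demonstrates the lookup pattern rather than Selenium itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: an assumption made for illustration only.
SAMPLE = """
<body>
  <div class="clearfix">extra element that is not an offer</div>
  <ul>
    <li class="clearfix">
      <div class="real-estate-text">
        <header><div><h3><span>Offer A</span><span>543 М2</span></h3></div></header>
      </div>
    </li>
    <li class="clearfix">
      <div class="real-estate-text">
        <header><div><h3><span>Offer B</span><span>10 М2</span></h3></div></header>
      </div>
    </li>
  </ul>
</body>
""".strip()

root = ET.fromstring(SAMPLE)

# Step 1: match only <li class="clearfix">; the <div class="clearfix"> is skipped.
offers = root.findall(".//li[@class='clearfix']")

# Step 2: the leading "." makes the search relative to each offer block.
AREA_PATH = ".//div[@class='real-estate-text']/header/div/h3/span[2]"
areas = [offer.find(AREA_PATH).text for offer in offers]

print(len(offers))
print(areas)
```

The same two-step pattern (pick the blocks, then search relative to each block) carries over to Selenium element lookups.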

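from selenium.webdriver.common.by import By

variable = []

def testing_values():
    # find_element_by_* was removed in Selenium 4.3; use find_element(By.XPATH, ...) instead.
    offers = driver.find_elements(By.XPATH, "//li[@class='clearfix']")
    area_path = ".//div[@class='real-estate-text']/header/div/h3/span[2]"
    variable.append([x.find_element(By.XPATH, area_path).text for x in offers])

testing_values()
print(variable)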

Outputs:

[['543 М2', '10 М2', '12 М2', '36 М2', '660 М2', '635 М2', '44 М2', '41 М2', '50 М2', '60 М2', '50 М2', '64 М2', '64 М2', '59 М2', '90 М2', '51 М2', '1053 М2', '72 М2', '66 М2', '78 М2', '65 М2', '52 М2', '75 М2', '68 М2', '62 М2', '72 М2', '90 М2', '78 М2', '74 М2', '57 М2']]
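The bs4 version stored the areas as floats, whereas the strings above still carry the unit. A minimal sketch of that last step, assuming every value has the form "<number> М2". Here parse_area is a hypothetical helper, not part of the original code, and returning None for unparseable text is an assumption standing in for the question's own if_area_not_speicified check.

```python
def parse_area(text):
    """Return the leading number of a string such as '543 М2' as a float.

    Hypothetical helper: returns None when the text has no leading number.
    """
    parts = text.split()
    try:
        return float(parts[0])
    except (IndexError, ValueError):
        return None


# A few values taken from the output above.
raw_areas = ['543 М2', '10 М2', '12 М2']

sqm_area = [parse_area(value) for value in raw_areas]
print(sqm_area)  # [543.0, 10.0, 12.0]
```

Because the answer's code appends a list to variable, the scraped strings live in variable[0]; passing that inner list through parse_area would give the flat list of floats.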


Thanks for that. I need to watch some tutorials on XPath.
– tsetsko, over a year ago
