
I have code written with BeautifulSoup and am now exploring Selenium, but I cannot figure out how (I hope it is possible) to extract data nested inside some HTML.

This is the bs4 code:

def get_data(link):
    soup1 = getdata(link)
    for one_offer in soup1.find_all('li', {'class': 'clearfix'}):
        # Get sqm:
        raw_sqm = one_offer.find('div', {'class': 'inline-group'})
        get_sqm = raw_sqm.get_text().split(',')[1].split()[0]
        sqm_check_value = if_area_not_speicified(get_sqm)
        sqm_area.append(float(sqm_check_value))

The code above takes the link https://www.imoti.net/bg/obiavi/r/prodava/sofia/?sid=hSrJhL and from that page I do the following: [screenshot of the listings page]

one_offer is one block. In the screenshot above, these are the sections outlined with the red, green and blue rectangles. For each block I then take the area (marked with the red arrow) and append it to a list.

How do I convert this into Selenium code?

So far I have:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

PATH = '/Applications/chromedriver'
driver = webdriver.Chrome(PATH)

driver.get('https://www.imoti.net/bg/obiavi/r/prodava/sofia/?sid=hSrJhL')

variable = []

def testing_values():
    variable.append(driver.find_elements_by_class_name('clearfix'))

testing_values()
print(variable)

After calling testing_values, printing variable gives the following list:

[[<selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="5e3d2712-f453-4871-a43e-8d72d40e6a65")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="53a21fd3-495a-41d4-9382-ae61961209ed")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="56d80ac6-bfaa-48de-9e87-1d2f3c9a42a4")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2", element="74362762-087e-4221-a4b7-cbdf10a16400")>, <selenium.webdriver.remote.webelement.WebElement (session="45a761354e96082ad7cee4c299682bd2"]

*This list contains 30 items; I deleted some of them to keep the example short.

So I have a list of WebElement objects, but how do I extract the data from each one to get the area, as the bs4 code does?

  • Loop over the elements and use a relative XPath (./) on each one. The value seems to be in a header, under /div/h3/span, where you can just read .text from the entire span. Commented Nov 22, 2021 at 20:16
  • You also have an extra HTML element with class clearfix, which is a div. Commented Nov 22, 2021 at 21:09

1 Answer


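The page has an extra div with class clearfix, so matching on the class name alone also picks up that div. Restrict the match to the li elements, loop through them, and run a relative XPath (starting with .//) on each one to get its text value.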

variable = []

def testing_values():
    variable.append([x.find_element_by_xpath(".//div[@class='real-estate-text']/header/div/h3/span[2]").text for x in driver.find_elements_by_xpath("//li[@class='clearfix']")])

testing_values()
print(variable)

Outputs:

[['543 М2', '10 М2', '12 М2', '36 М2', '660 М2', '635 М2', '44 М2', '41 М2', '50 М2', '60 М2', '50 М2', '64 М2', '64 М2', '59 М2', '90 М2', '51 М2', '1053 М2', '72 М2', '66 М2', '78 М2', '65 М2', '52 М2', '75 М2', '68 М2', '62 М2', '72 М2', '90 М2', '78 М2', '74 М2', '57 М2']]
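If you are on a newer Selenium 4 release, note that the find_element_by_* helpers are no longer available there; the same lookup goes through find_element / find_elements with a locator strategy instead. Below is a rough sketch of that form which also converts the strings to floats, as the bs4 version did. The names parse_area and collect_areas are made up for illustration, and the strategy is passed as the plain string "xpath" (the value that By.XPATH holds) only so that the snippet has no imports of its own. It assumes every offer has an area in the "number, space, unit" shape shown above.

```python
# Locator strategy: this is the string behind selenium's By.XPATH.
# In real code you would normally import By and write By.XPATH.
XPATH = "xpath"

# Same expressions as in the answer above.
OFFER_XPATH = "//li[@class='clearfix']"
AREA_XPATH = ".//div[@class='real-estate-text']/header/div/h3/span[2]"


def parse_area(text):
    """Turn a string such as '543 М2' into 543.0 by dropping the unit."""
    return float(text.split()[0])


def collect_areas(driver):
    """Return the area of every offer on the currently loaded page."""
    # Absolute XPath: only the <li> offers, not the extra <div class="clearfix">.
    offers = driver.find_elements(XPATH, OFFER_XPATH)
    # Relative XPath (leading .//): searched inside each offer only.
    return [
        parse_area(offer.find_element(XPATH, AREA_XPATH).text)
        for offer in offers
    ]
```

With the driver from the question, collect_areas(driver) would give one flat list of floats rather than a list inside a list. Offers with no area listed would still need a check like the if_area_not_speicified helper in the bs4 code.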

1 Comment

Thanks for that. I need to watch some tutorials on XPath.
