I've been stuck on this for ages. I'm trying to build a scraper that collects the listings on this website, and I cannot for the life of me get the URL of each listing. Can you please help?

I've tried numerous ways to locate the element. The latest attempt uses the absolute XPath (locating by class always failed as well).

The code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
import time

PATH = "/Users/csongordoma/Documents/chromedriver"
driver = webdriver.Chrome(PATH)
driver.get('https://ingatlan.com/lista/elado+lakas+budapest')

data = {}
df = pd.DataFrame(columns=['Price', 'Address', 'Size', 'Rooms', 'URL'])

listings = driver.find_elements_by_css_selector('div.listing__card')
for listing in listings:
    data['Price'] = listing.find_elements_by_css_selector('div.price')[0].text
    data['Address'] = listing.find_elements_by_css_selector('div.listing__address')[0].text
#    data['Size'] = listing.find_elements_by_css_selector('div.listing__parameter listing__data--area-size')[0].text
    data['URL'] = listing.find_elements_by_xpath('/html[1]/body[1]/div[1]/div[2]/div[4]/div[1]/main[1]/div[1]/div[1]/div[1]/a[3]')[0].text
    df = df.append(data, ignore_index=True)

print(len(listings))
print(data)

#   driver.find_element_by_xpath("//a[. = 'Következő oldal']").click()

driver.quit()

The error message:

Traceback (most recent call last):
  File "hello.py", line 18, in <module>
    data['URL'] = listing.find_elements_by_xpath('/html[1]/body[1]/div[1]/div[2]/div[4]/div[1]/main[1]/div[1]/div[1]/div[1]/a[3]')[0].text
IndexError: list index out of range

Many thanks!

  • It seems to be the second link of the listing, a[2], not a[3]. Also, use a relative path rather than an absolute XPath, and use get_attribute('href') instead of text. Commented Nov 26, 2020 at 23:52
  • Your find_elements call is returning no matching elements. Fix the XPath. Commented Nov 27, 2020 at 3:18

1 Answer

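Something like the line below should work. It takes the second link, a[2], inside each listing element and reads its href attribute. The leading dot keeps the XPath relative to the listing; without it, //a[2] is evaluated against the whole page.

data['URL'] = listing.find_element_by_xpath('.//a[2]').get_attribute('href')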
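
To illustrate how scoping the lookup to each listing changes the result, here is a minimal sketch that runs without a browser. It uses Python's standard-library xml.etree.ElementTree as a stand-in for Selenium, and the PAGE markup (link order, href values, prices) is a hypothetical simplification, not the real structure of the site. The second list imitates a document-wide search by always querying from the root, which is roughly what an XPath starting with // does when called on a Selenium element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (an assumption for illustration only);
# the real page is likely structured differently.
PAGE = """
<main>
  <div class="listing__card">
    <a href="/img/1">photo</a>
    <a href="/listing/1">details</a>
    <div class="price">50 M Ft</div>
  </div>
  <div class="listing__card">
    <a href="/img/2">photo</a>
    <a href="/listing/2">details</a>
    <div class="price">62 M Ft</div>
  </div>
</main>
"""

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='listing__card']")

# Relative lookup: search inside each card, so every card yields its own link.
relative_urls = [card.find(".//a[2]").get("href") for card in cards]

# Document-wide lookup: always search from the root, so every iteration
# returns the first match on the page, whichever card we are "in".
document_wide_urls = [root.find(".//a[2]").get("href") for _ in cards]

print(relative_urls)       # ['/listing/1', '/listing/2']
print(document_wide_urls)  # ['/listing/1', '/listing/1']
```

If the real page has a different link order, the index in a[2] would need adjusting, but the relative form of the path stays the same.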