
I am trying to scrape prices from a real estate website, namely this one, so I made a list of scraped links and wrote a script to get the prices from all of those links. I tried googling and asking around but could not find a decent answer. I just want to get the price values from a list of links and store them in a way that can be converted into a CSV file later on, with house name, location, and price as headers along with their respective data. The output I am getting is shown in the attached screenshot; the last list, with a lot of prices, is what I want. My code is as follows:

from selenium import webdriver
import pandas as pd

PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe" # always keep chromedriver.exe inside scripts to save hours of debugging
driver = webdriver.Chrome(PATH) # pretty important part
driver.get("https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a")
driver.implicitly_wait(10)
data_extract = pd.read_csv(r'F:\github projects\homie.csv') # reading the csv file which contains 8 links
de = data_extract['Links'].tolist() # converting the csv column to a list so that it can be iterated
data = [] # empty list to store the extracted prices after scraping the links from homie.csv
for url in de: # de has all the links I want to iterate over and scrape prices from
    driver.get(url)
    prices = driver.find_elements_by_xpath("//div[@id='app']/div[1]/div[2]/div[1]/div[2]/div/p[1]")
    for price in prices: # after finding the xpath, get the prices
        data.append(price.text)
    print(data) # printing in the console just to check what kind of data I obtained

Any help will be appreciated. The output I am expecting is something like this: [[price of house inside link 0], [price of house inside link 1], and so on]. The links in homie.csv are as follows:

Links
https://www.nepalhomes.com/detail/bungalow-house-for-sale-at-mandikhatar
https://www.nepalhomes.com/detail/brand-new-house-for-sale-in-baluwakhani
https://www.nepalhomes.com/detail/bungalow-house-for-sale-in-bhangal-budhanilkantha
https://www.nepalhomes.com/detail/commercial-house-for-sale-in-mandikhatar
https://www.nepalhomes.com/detail/attractive-house-on-sale-in-budhanilkantha
https://www.nepalhomes.com/detail/house-on-sale-at-bafal
https://www.nepalhomes.com/detail/house-on-sale-in-madigaun-sunakothi
https://www.nepalhomes.com/detail/house-on-sale-in-chhaling-bhaktapur

5 Answers


There is no need to use Selenium to get the data you need. That page loads its data from an API endpoint.

The API endpoint:

https://www.nepalhomes.com/api/property/public/data?&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a

You can make a request directly to that API endpoint using the requests module and get your data.

This code will print all the prices:

import requests

url = 'https://www.nepalhomes.com/api/property/public/data?&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a'

r = requests.get(url)
info = r.json()

for i in info['data']:
    print([i['basic']['title'],i['price']['value']])
['House on sale at Kapan near Karuna Hospital ', 15500000]
['House on sale at Banasthali', 70000000]
['Bungalow house for sale at Mandikhatar', 38000000]
['Brand new house for sale in Baluwakhani', 38000000]
['Bungalow house for sale in Bhangal, Budhanilkantha', 29000000]
['Commercial house for sale in Mandikhatar', 27500000]
['Attractive house on sale in Budhanilkantha', 55000000]
['House on sale at Bafal', 45000000]
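Since the question's end goal is a CSV file with headers, the parsed JSON can be written out directly with the csv module. A minimal sketch, using only the title and price fields shown in the output above (the helper name, file name, and header labels are illustrative):

```python
import csv

def write_prices_csv(records, path):
    """Write one row per property from the API payload, following the
    ['basic']['title'] / ['price']['value'] structure used above."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["House name", "Price"])  # headers for the later CSV
        for item in records:
            writer.writerow([item["basic"]["title"], item["price"]["value"]])

# usage with the JSON fetched above:
# write_prices_csv(info["data"], "homie_prices.csv")
```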

4 Comments

This is very helpful, but my aim with this project is to learn Selenium and web scraping, and I also need the prices in a list like this: [1550000, 7000000000, 3800000]
How did you know that the data can be extracted from an API? Is there a way to find that out?
Open Chrome DevTools -> Network tab. Reload the page and you'll see all the requests that the page makes in there.
Of course, you can try scraping with Selenium (like you did) to learn. I just gave you an easier way to do it.

I see several problems here:

  1. I couldn't see any elements matching the text-3xl font-bold leading-none text-black class names on the https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a web page.
  2. Even if there were such elements - for multiple class names you should use a CSS selector or XPath, so instead of

find_elements_by_class_name('text-3xl font-bold leading-none text-black')

it should be

find_elements_by_css_selector('.text-3xl.font-bold.leading-none.text-black')

  3. The find_elements method returns a list of web elements, so to get the texts from these elements you have to iterate over the list and get the text from each one, like the following:

prices = driver.find_elements_by_css_selector('.text-3xl.font-bold.leading-none.text-black')
for price in prices:
    data.append(price.text)

UPD
With this locator it works correctly for me:

prices = driver.find_elements_by_xpath("//p[@class='text-xl leading-none text-black']/p[1]")
for price in prices:
    data.append(price.text)
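To get the [[price of link 0], [price of link 1], ...] shape the question asks for, the per-link results can be appended as sub-lists instead of one flat list. A hypothetical sketch (the helper name is mine; the driver and the locator from this answer are passed in):

```python
def collect_prices(driver, urls, xpath):
    """Visit each url and gather the matched elements' texts
    into one sub-list per url, instead of one flat list."""
    all_prices = []
    for url in urls:
        driver.get(url)
        all_prices.append([el.text for el in driver.find_elements_by_xpath(xpath)])
    return all_prices

# usage with the list built from homie.csv:
# data = collect_prices(driver, de, "//p[@class='text-xl leading-none text-black']/p[1]")
```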

8 Comments

I absolutely agree with giving more priority to XPath, id and CSS selectors rather than class names, but your code displays the same output I already got, with prices filled in... meaning I still get a ladder like the image I sent, when I actually want just one list of all the prices, like this: [155555, 3000000, 4000000]
Could you please share a screenshot of your output? I tried the following code and I got no values with column-wise square brackets... you can totally ignore my question if you think it's not good enough.
Your question is totally OK. The problem is that I never run any answer I post here with Python (several hundreds in the last months) since I have no Python with Selenium installed on my computer :)
I understand, and once again I appreciate you being patient with my comments.
Happy I could help you

Tried with the below XPath, and it retrieved the price.

price_list, nameprice_list = [], []
houses = driver.find_elements_by_xpath("//div[contains(@class,'table-list')]/a")
for house in houses:
    name = house.find_element_by_tag_name("h2").text
    address = house.find_element_by_xpath(".//p[contains(@class,'opacity-75')]").text
    price = house.find_element_by_xpath(".//p[contains(@class,'text-xl')]/p").text.replace('Rs. ', '')
    price_list.append(price)
    nameprice_list.append((name, price))
    print("{}: {}".format(name, price))

And output:

House on sale at Kapan near Karuna Hospital: Kapan, Budhanilkantha Municipality,1,55,00,000
House on sale at Banasthali: Banasthali, Kathmandu Metropolitan City,7,00,00,000
...
[('House on sale at Kapan near Karuna Hospital', '1,55,00,000'), ('House on sale at Banasthali', '7,00,00,000'), ('Bungalow house for sale at Mandikhatar', '3,80,00,000'), ('Brand new house for sale in Baluwakhani', '3,80,00,000'), ('Bungalow house for sale in Bhangal, Budhanilkantha', '2,90,00,000'), ('Commercial house for sale in Mandikhatar', '2,75,00,000'), ('Attractive house on sale in Budhanilkantha', '5,50,00,000'), ('House on sale at Bafal', '4,50,00,000')]
['1,55,00,000', '7,00,00,000', '3,80,00,000', '3,80,00,000', '2,90,00,000', '2,75,00,000', '5,50,00,000', '4,50,00,000']
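Since the question's larger goal is a CSV file, nameprice_list can be handed straight to pandas once it holds the (name, price) pairs. A sketch, with a sample pair taken from the output above (the column labels and file name are illustrative):

```python
import pandas as pd

# sample pairs in the same shape as nameprice_list built in the loop above
nameprice_list = [('House on sale at Bafal', '4,50,00,000')]

df = pd.DataFrame(nameprice_list, columns=["Name", "Price"])
df.to_csv("houses.csv", index=False)  # column labels become the CSV header row
```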

7 Comments

I think you meant price instead of prize. I also wanted the location, name and price in a list, something like this [1,55,00,000, 7,00,00,000, 3,80,00,000] for just the prices, and something like this [house on sale at kapan near karuna hospital, 1,55,00,000] for name and price, so that I can make a .csv file later on.
@Recurfor - Yes, it's price. We just need to append the data to a list in the for loop. Updated the answer accordingly.
I honestly didn't get the output in a list... I copied your code and pasted it, but the output was not in a list...
nameprice_list is a list of tuples (name and price). Change that line to nameprice_list.append([name, price]) and it will be a list of lists.
@Recurfor - Update the question with the code where you are trying to add the details to the list. We might be able to see what's happening.

At first look, only 8 prices are visible; if you just want to scrape them using Selenium:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a")
wait = WebDriverWait(driver, 20)
for price in driver.find_elements(By.XPATH, "//p[contains(@class,'leading')]/p[1]"):
    print(price.text.split('.')[1])

This will print all the prices, without the "Rs." prefix.
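The printed values still contain grouping commas; to get plain integers like the [15500000, ...] list requested in the comments above, a small conversion helper (the name is mine) can be applied to each scraped string:

```python
def price_to_int(text):
    """Turn a scraped price such as 'Rs. 1,55,00,000' into the integer 15500000."""
    return int(text.replace("Rs.", "").replace(",", "").strip())

# e.g. [price_to_int(p.text) for p in prices] gives a list of plain integers
```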

1 Comment

I actually want to scrape the data from my homie.csv file... will it help if I attach the csv file? Yes, I deliberately scraped 8 links to test whether I can do it with a few links before scraping 1k links.

The print statement should be outside the for loops to avoid staircase printing of the output:

from selenium import webdriver
import pandas as pd

PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe" # always keep chromedriver.exe inside scripts to save hours of debugging
driver = webdriver.Chrome(PATH) # pretty important part
driver.get("https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a")
driver.implicitly_wait(10)
data_extract = pd.read_csv(r'F:\github projects\homie.csv')
de = data_extract['Links'].tolist()
data = []
for url in de:
    driver.get(url)
    prices = driver.find_elements_by_xpath("//div[@id='app']/div[1]/div[2]/div[1]/div[2]/div/p[1]")
    for price in prices: # after finding the xpath, get the prices
        data.append(price.text)
print(data) # printed once, outside the loops

 
