
I am trying to scrape prices from a real estate website, namely this one, so I made a list of scraped links and wrote a script to get the prices from all of those links. I tried googling and asking around but could not find a decent answer. I just want to get the price values from a list of links and store them in a way that can be converted into a CSV file later on, with house name, location, and price as headers along with their respective data. The output I am getting is shown in the attached screenshot; the last list, with a lot of prices, is what I want. My code is as follows:

from selenium import webdriver
import pandas as pd

PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe" # always keep chromedriver.exe inside scripts to save hours of debugging
driver = webdriver.Chrome(PATH) # pretty important part
driver.get("https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a")
driver.implicitly_wait(10)
data_extract = pd.read_csv(r'F:\github projects\homie.csv') # reading the csv file which contains 8 links
de = data_extract['Links'].tolist() # converting the csv column to a list so that it can be iterated
data = [] # empty list to store the extracted prices after scraping the links from homie.csv
for url in de: # de has all the links I want to iterate over and scrape prices from
    driver.get(url)
    prices = driver.find_elements_by_xpath("//div[@id='app']/div[1]/div[2]/div[1]/div[2]/div/p[1]")
    for price in prices: # after finding the xpath, get the prices
        data.append(price.text)
    print(data) # printing in the console just to check what kind of data I obtained

Any help will be appreciated. The output I am expecting is something like this: [[price of house inside link 0], [price of house inside link 1], and so on]. The links in homie.csv are as follows:

Links
https://www.nepalhomes.com/detail/bungalow-house-for-sale-at-mandikhatar
https://www.nepalhomes.com/detail/brand-new-house-for-sale-in-baluwakhani
https://www.nepalhomes.com/detail/bungalow-house-for-sale-in-bhangal-budhanilkantha
https://www.nepalhomes.com/detail/commercial-house-for-sale-in-mandikhatar
https://www.nepalhomes.com/detail/attractive-house-on-sale-in-budhanilkantha
https://www.nepalhomes.com/detail/house-on-sale-at-bafal
https://www.nepalhomes.com/detail/house-on-sale-in-madigaun-sunakothi
https://www.nepalhomes.com/detail/house-on-sale-in-chhaling-bhaktapur

5 Answers


There is no need to use Selenium to get the data you need. That page loads its data from an API endpoint.

The API endpoint:

https://www.nepalhomes.com/api/property/public/data?&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a

You can make a request directly to that API endpoint using the requests module and get your data.

This code will print all the prices:

import requests

url = 'https://www.nepalhomes.com/api/property/public/data?&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a'

r = requests.get(url)
info = r.json()

for i in info['data']:
    print([i['basic']['title'],i['price']['value']])
['House on sale at Kapan near Karuna Hospital ', 15500000]
['House on sale at Banasthali', 70000000]
['Bungalow house for sale at Mandikhatar', 38000000]
['Brand new house for sale in Baluwakhani', 38000000]
['Bungalow house for sale in Bhangal, Budhanilkantha', 29000000]
['Commercial house for sale in Mandikhatar', 27500000]
['Attractive house on sale in Budhanilkantha', 55000000]
['House on sale at Bafal', 45000000]
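Since the question's end goal is a CSV file with headers, the parsed JSON can be written out directly with the csv module. A minimal sketch, using only the title and price fields shown in the output above (the helper name, file name, and header labels are illustrative):

```python
import csv

def write_prices_csv(records, path):
    """Write one row per property from the API payload, following the
    ['basic']['title'] / ['price']['value'] structure used above."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["House name", "Price"])  # headers for the later CSV
        for item in records:
            writer.writerow([item["basic"]["title"], item["price"]["value"]])

# usage with the JSON fetched above:
# write_prices_csv(info["data"], "homie_prices.csv")
```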

4 Comments

This is very helpful, but my aim with this project is to learn Selenium and web scraping, and I also need the prices in a list like this: [1550000, 7000000000, 3800000]
How did you know that the data can be extracted from an API? Is there a way to find that out?
Open Chrome DevTools -> Network tab. Reload the page and you'll see all the requests that the page makes in there.
Of course, you can try scraping with Selenium (like you did) to learn. I just gave you an easier way to do it.

I see several problems here:

  1. I couldn't see any elements matching the text-3xl font-bold leading-none text-black class names on the https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a web page.
  2. Even if there were such elements - for multiple class names you should use a CSS selector or XPath, so instead of

find_elements_by_class_name('text-3xl font-bold leading-none text-black')

it should be

find_elements_by_css_selector('.text-3xl.font-bold.leading-none.text-black')

  3. The find_elements method returns a list of web elements, so to get the texts from these elements you have to iterate over the list and get the text from each one, like the following:

prices = driver.find_elements_by_css_selector('.text-3xl.font-bold.leading-none.text-black')
for price in prices:
    data.append(price.text)

UPD
With this locator it works correctly for me:

prices = driver.find_elements_by_xpath("//p[@class='text-xl leading-none text-black']/p[1]")
for price in prices:
    data.append(price.text)
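To get the [[price of link 0], [price of link 1], ...] shape the question asks for, the per-link results can be appended as sub-lists instead of one flat list. A hypothetical sketch (the helper name is mine; the driver and the locator from this answer are passed in):

```python
def collect_prices(driver, urls, xpath):
    """Visit each url and gather the matched elements' texts
    into one sub-list per url, instead of one flat list."""
    all_prices = []
    for url in urls:
        driver.get(url)
        all_prices.append([el.text for el in driver.find_elements_by_xpath(xpath)])
    return all_prices

# usage with the list built from homie.csv:
# data = collect_prices(driver, de, "//p[@class='text-xl leading-none text-black']/p[1]")
```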

8 Comments

I absolutely agree with giving more priority to XPath, id and CSS selectors rather than class names, but your code displays the same output I already got, with prices filled in... meaning I still get a ladder like the image I sent, when I actually want just one list of all the prices, like this: [155555, 3000000, 4000000]
Could you please share a screenshot of your output? I tried the following code and I got no values with column-wise square brackets... you can totally ignore my question if you think it's not good enough.
Your question is totally OK. The problem is that I never run any answer I post here with Python (several hundreds in the last months) since I have no Python with Selenium installed on my computer :)
I understand, and once again I appreciate you being patient with my comments.
Happy I could help you

Tried with the below XPath, and it retrieved the price.

price_list, nameprice_list = [], []
houses = driver.find_elements_by_xpath("//div[contains(@class,'table-list')]/a")
for house in houses:
    name = house.find_element_by_tag_name("h2").text
    address = house.find_element_by_xpath(".//p[contains(@class,'opacity-75')]").text
    price = house.find_element_by_xpath(".//p[contains(@class,'text-xl')]/p").text.replace('Rs. ', '')
    price_list.append(price)
    nameprice_list.append((name, price))
    print("{}: {}".format(name, price))

And output:

House on sale at Kapan near Karuna Hospital: Kapan, Budhanilkantha Municipality,1,55,00,000
House on sale at Banasthali: Banasthali, Kathmandu Metropolitan City,7,00,00,000
...
[('House on sale at Kapan near Karuna Hospital', '1,55,00,000'), ('House on sale at Banasthali', '7,00,00,000'), ('Bungalow house for sale at Mandikhatar', '3,80,00,000'), ('Brand new house for sale in Baluwakhani', '3,80,00,000'), ('Bungalow house for sale in Bhangal, Budhanilkantha', '2,90,00,000'), ('Commercial house for sale in Mandikhatar', '2,75,00,000'), ('Attractive house on sale in Budhanilkantha', '5,50,00,000'), ('House on sale at Bafal', '4,50,00,000')]
['1,55,00,000', '7,00,00,000', '3,80,00,000', '3,80,00,000', '2,90,00,000', '2,75,00,000', '5,50,00,000', '4,50,00,000']
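Since the question's larger goal is a CSV file, nameprice_list can be handed straight to pandas once it holds the (name, price) pairs. A sketch, with a sample pair taken from the output above (the column labels and file name are illustrative):

```python
import pandas as pd

# sample pairs in the same shape as nameprice_list built in the loop above
nameprice_list = [('House on sale at Bafal', '4,50,00,000')]

df = pd.DataFrame(nameprice_list, columns=["Name", "Price"])
df.to_csv("houses.csv", index=False)  # column labels become the CSV header row
```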

7 Comments

I think you meant price instead of prize. I also wanted the location, name and price in a list, something like this [1,55,00,000, 7,00,00,000, 3,80,00,000] for just the prices, and something like this [house on sale at kapan near karuna hospital, 1,55,00,000] for name and price, so that I can make a .csv file later on.
@Recurfor - Yes, it's price. We just need to append the data to a list in the for loop. Updated the answer accordingly.
I honestly didn't get the output in a list... I copied your code and pasted it, but the output was not in a list...
nameprice_list is a list of tuples (name and price). Change that line to nameprice_list.append([name, price]) and it will be a list of lists.
@Recurfor - Update the question with the code where you are trying to add the details to the list. We might be able to see what's happening.

At first look, only 8 prices are visible; if you just want to scrape them using Selenium:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a")
wait = WebDriverWait(driver, 20)
for price in driver.find_elements(By.XPATH, "//p[contains(@class,'leading')]/p[1]"):
    print(price.text.split('.')[1])

This will print all the prices, without the "Rs." prefix.
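The printed values still contain grouping commas; to get plain integers like the [15500000, ...] list requested in the comments above, a small conversion helper (the name is mine) can be applied to each scraped string:

```python
def price_to_int(text):
    """Turn a scraped price such as 'Rs. 1,55,00,000' into the integer 15500000."""
    return int(text.replace("Rs.", "").replace(",", "").strip())

# e.g. [price_to_int(p.text) for p in prices] gives a list of plain integers
```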

1 Comment

I actually want to scrape the data from my homie.csv file... will it help if I attach the csv file? Yes, I deliberately scraped 8 links to test whether I can do it with a few links before scraping 1k links.

The print statement should be outside the for loops to avoid staircase printing of the output:

from selenium import webdriver
import pandas as pd

PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe" # always keep chromedriver.exe inside scripts to save hours of debugging
driver = webdriver.Chrome(PATH) # pretty important part
driver.get("https://www.nepalhomes.com/list/&sort=1&find_property_purpose=5db2bdb42485621618ecdae6&find_property_category=5d660cb27682d03f547a6c4a")
driver.implicitly_wait(10)
data_extract = pd.read_csv(r'F:\github projects\homie.csv')
de = data_extract['Links'].tolist()
data = []
for url in de:
    driver.get(url)
    prices = driver.find_elements_by_xpath("//div[@id='app']/div[1]/div[2]/div[1]/div[2]/div/p[1]")
    for price in prices: # after finding the xpath, get the prices
        data.append(price.text)
print(data) # printed once, outside the loops

 
