Below is the link to a page from which I want to extract the text of some link buttons, but I'm unable to do it. After the website opens, I select an option from the "Choose a Product" dropdown. Suppose I choose the first option, "Acrylic Coatings"; three types then appear: "Primers", "Intermediates", and "Finishes". I want to extract their text, which I haven't been able to do.

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome('~/chromedriver.exe')

driver.get('http://www.asianpaintsppg.com/applications/protective_products.aspx')
lst_name = ['Acrylic Coatings','Glass Flake Coatings']

for i in lst_name:
    print(i)
    driver.find_element_by_xpath("//select[@name='txtProduct']/option[text()="+"'"+str(i)+"'"+"]").click()
    page = requests.get("http://www.asianpaintsppg.com/applications/protective_products.aspx")
    soup = BeautifulSoup(page.content, 'html.parser')
    for div in soup.findAll('table', attrs={'id':'dataLstSubCat'}):
      print(div.find('a')['href'])

But I get empty values here. Any help would be appreciated.

3 Answers

2

You can get the subcategories without using selenium. Try using POST requests, as shown below.

import requests
from bs4 import BeautifulSoup

url = "http://www.asianpaintsppg.com/applications/protective_products.aspx"

with requests.Session() as s:
    r = s.get(url)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['txtProduct'] = '2' #This is the dropdown number
    res = s.post(url,data=payload)
    sauce = BeautifulSoup(res.text,"lxml")
    subcat = [item.text for item in sauce.select("[id^='dataLstSubCat_']")]
    print(subcat)

Output you may get:

['Primers', 'Intermediates', 'Finishes']
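To illustrate what the payload step does, here is a minimal offline sketch. It assumes the page is an ASP.NET form whose hidden inputs (such as `__VIEWSTATE`) must be echoed back with the POST; the sample HTML, its field values, and the helper names `InputCollector` and `build_payload` are made up for illustration and are not taken from the real page. It uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra packages.

```python
from html.parser import HTMLParser

# Hypothetical, trimmed-down stand-in for the page's form (all values are made up).
SAMPLE_FORM = """
<form method="post" action="protective_products.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
  <input type="text" name="txtSearch">
  <select name="txtProduct">
    <option value="1">Select</option>
    <option value="2">Some Product</option>
  </select>
</form>
"""


class InputCollector(HTMLParser):
    """Collects name/value pairs from every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if "name" in attrs:
            # A missing value attribute becomes an empty string,
            # mirroring i.get('value', '') in the answer's code.
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_payload(html, product_value):
    """Echo back the form's inputs and set the dropdown choice."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload["txtProduct"] = product_value  # the dropdown option to request
    return payload


payload = build_payload(SAMPLE_FORM, "2")
print(payload)
```

The resulting dictionary is what would be passed as `data=` to `s.post(url, data=payload)`; the server then responds with the page as it would look after that dropdown option had been selected.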

2 Comments

Much better there :-)
@SIM, could you please explain how your code works? It would be appreciated. Thanks.
1

You want .text, not href, and also a wait condition to allow the page to update. Select the links with this CSS selector:

#dataLstSubCat a

Then extract .text in a loop or list comprehension:

items = [item.text for item in soup.select('#dataLstSubCat a')]

You can do the whole thing with selenium: you need a wait condition to ensure the content is present, and an additional wait condition for the text to change after the first iteration. I use time.sleep for the latter, which is suboptimal.

items = [item.text for item in WebDriverWait(driver,5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#dataLstSubCat a")))]

Additional imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

You could probably do the whole thing with an initial GET followed by POST requests, as it looks like the page uses __doPostBack (.aspx), where the value selected in the dropdown is posted back to return the sub-items.


from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time

driver = webdriver.Chrome()  # assumes chromedriver is on PATH; otherwise pass its location
driver.get('http://www.asianpaintsppg.com/applications/protective_products.aspx')

lst_name = ['Acrylic Coatings','Glass Flake Coatings']

for i in lst_name:
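    # Select the dropdown option whose visible text matches the product name
    driver.find_element(By.XPATH, f"//select[@name='txtProduct']/option[text()='{i}']").click()
    # Wait up to 5 seconds for the sub-category links to be present, then read their text
    items = [item.text for item in WebDriverWait(driver,5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#dataLstSubCat a")))]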
    print(items)
    time.sleep(2)
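As a possible alternative to a fixed time.sleep, the "wait for the text to change" idea can be sketched as a small polling helper. This is only an illustration under stated assumptions: `wait_for_change` is a hypothetical helper, not a selenium API, and the fake responses below are made-up data standing in for the page (the real sub-categories of each product are not known here).

```python
import time


def wait_for_change(fetch, previous, timeout=5.0, poll=0.1):
    """Call fetch() repeatedly until it returns a non-empty value
    that differs from `previous`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        current = fetch()
        if current and current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("content did not change within the timeout")
        time.sleep(poll)


# Made-up stand-in for the page: the first two reads still show the old
# list, and the third read shows the refreshed one.
old_items = ["Primers", "Intermediates", "Finishes"]
fake_reads = iter([old_items, old_items, ["Primers", "Finishes"]])

result = wait_for_change(lambda: next(fake_reads), previous=old_items,
                         timeout=2.0, poll=0.01)
print(result)
```

With selenium, `fetch` could be something like `lambda: [a.text for a in driver.find_elements(By.CSS_SELECTOR, "#dataLstSubCat a")]`, with `previous` holding the list from the prior iteration. Two caveats: if two products really do share identical sub-categories, this check would time out, and a read that happens while the page is refreshing may raise a stale-element error that would need handling.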

8 Comments

It gives [] (an empty list).
Did you use the wait condition?
No, I have not used any wait condition. How do I implement this in my code?
Try that first, as shown above; you may be trying to access the elements before the DOM has been updated.
Your code gives this output: "Acrylic Coatings ['Primers', 'Intermediates', 'Finishes'] Glass Flake Coatings ['Primers', 'Intermediates', 'Finishes']". That is right for "Acrylic Coatings", but the output for "Glass Flake Coatings" is wrong.
0

Try the following code. It prints the text of each sub-category link.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver = webdriver.Chrome('~/chromedriver.exe')
driver.get('http://www.asianpaintsppg.com/applications/protective_products.aspx')
lst_name = ['Acrylic Coatings','Glass Flake Coatings']

for i in lst_name:

    driver.find_element(By.XPATH, f"//select[@name='txtProduct']/option[text()='{i}']").click()
    elements=WebDriverWait(driver, 10).until(expected_conditions.presence_of_all_elements_located((By.XPATH, '//table[@id="dataLstSubCat"]//tr//td//a[starts-with(@id,"dataLstSubCat_LnkBtnSubCat_")]')))
    for ele in elements:
        print(ele.text)

4 Comments

This is not the output I'm expecting; please refer to the question.
@deepesh: Sorry about that. Please try the updated code.
This gives "Primers Intermediates Finishes" as output for the first item in lst_name and also "Primers Intermediates Finishes" for the second item, which is not actually the case.
This is because it is in a loop: if you look at your code, you are selecting a dropdown value each time, and it gives the results for it.
