
I'm trying to extract the text of the element with class="a-size-based-plus a-color-base" in the following HTML, using Selenium WebDriver.

[Screenshot: the text to scrape is underlined in blue]

My code structure is the following:

from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.expected_conditions import presence_of_element_located

import os
import re  # regular expressions (part of the Python standard library)
import time
import numpy as np
import pandas as pd
from difflib import SequenceMatcher
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

-----HERE is some unrelated code-----

# Find Data
    i = 0
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    wait = WebDriverWait(driver, 20)
    wait.until(EC.element_to_be_clickable(
        (By.CLASS_NAME, 'xtaqv-root')))
    wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'extension-rank')))
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-src="price"]')))
    time.sleep(5)

    for element in driver.find_elements_by_class_name('xtaqv-root'):   
        # Ratio of similarity
        try:
            item_name = element.find_element_by_tag_name("h2").text
            ratio = SequenceMatcher(None, item_name, key).ratio()
        except:
            item_name = np.nan
            ratio = 0
            pass
        try:
            link = element.find_element_by_css_selector('[data-src="price"]')
            href = link.get_attribute('href')
        except:         
            href = np.nan
        try:
            brand = element.find_element_by_css_selector('.a-size-based-plus.a-color-base')
            brand = brand.text
        except:         
            brand = np.nan  

The last try-except in the code is the most important.

  • You haven't said what issue/error you are facing. Commented Feb 26, 2020 at 12:06

1 Answer


Looking at the HTML, I see a typo in the locator on this line:

brand = element.find_element_by_css_selector('.a-size-based-plus.a-color-base')

The class name is a-size-base-plus, not a-size-based-plus. Try this:

brand = element.find_element_by_css_selector('.a-size-base-plus.a-color-base')

Hope this helps.
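A typo like this is easy to miss because the bare except in the question silently turns the failed lookup into NaN. As a rough sketch (not part of the original fix), the class names in a compound selector can be compared against the element's actual class attribute using difflib, which the question already imports. The helper name missing_classes is hypothetical, and the class attribute string below is assumed from the corrected selector; in Selenium it would come from element.get_attribute("class").

```python
from difflib import get_close_matches


def missing_classes(selector, class_attr):
    """Map each class in a compound selector such as '.a.b' that is absent
    from class_attr to its closest match among the classes that are present."""
    wanted = [name for name in selector.split(".") if name]
    present = class_attr.split()
    return {
        name: get_close_matches(name, present, n=1)
        for name in wanted
        if name not in present
    }


# Assumed class attribute; with Selenium: element.get_attribute("class")
report = missing_classes(
    ".a-size-based-plus.a-color-base",
    "a-size-base-plus a-color-base",
)
print(report)  # {'a-size-based-plus': ['a-size-base-plus']}
```

An empty dict means every class in the selector is present on the element, so the problem lies elsewhere (for example, the element is not a descendant of the one being searched).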


2 Comments

Haha, thank you @DebanjanB :) I hope it was the issue.
It worked!! I also made a small amendment: brand = element.find_element_by_css_selector('.a-size-base-plus.a-color-base').text
