I need to capture a list of elements ("TEXT TO CAPTURE 1", "TEXT TO CAPTURE 2", ...) while scraping a web page with Selenium and Python. The HTML of the page is the following:

<div class="contenedor" style="overflow:auto; padding: 6px;">
    <div style="width: 75px;">
        <p class="line1">
            <a href="http://www.somelink1.com/"><img src="https://www.somelink2.com" class="yborder" alt="Name"></a>
        </p>
        <p class="line1" style="align: center;">              
            <a href="www.somelink3.com" class="gensmall">TEXT TO CAPTURE 1</a>
        </p>
    </div>
    <div style="width: 75px;">
        <p class="line1">
            <a href="www.somelink4.com"><img src="hwww.somelink5.com" class="yborder" alt="Dana Vespoli"></a>
        </p>
        <p class="line1" style="align: center;">              
            <a href="www.somelink6.com" class="gensmall">TEXT TO CAPTURE 2</a>
        </p>
    </div>

    ... more <div> blocks of the same form ...

</div>

The number of elements changes from day to day, so I do not know in advance how many there will be when I open the page.

I can get only the first element with this:

driver.find_element_by_xpath("//p[contains(@class, 'line1')]/following::a")

Thanks for help

2 Answers

1

Instead of find_element_by_xpath, use one of the find_elements methods, which return a list of all matching elements rather than only the first one.

Also, instead of XPath, you can select by the class gensmall to get the text (provided this class is present on all of the a tags you want).

Check this out

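# No space between "a." and the class name in the CSS selector.
# Selenium 4.3+ removed find_elements_by_*; there, use driver.find_elements(By.CSS_SELECTOR, 'a.gensmall')
list_of_elements = driver.find_elements_by_css_selector('a.gensmall')
# Iterate over the elements themselves; len() returns an int, which is not iterable
for element in list_of_elements:
    print(element.text)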

Let me know if this works.
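
The same idea can be wrapped in a small helper that returns the texts as a list, however many elements the page has that day. This is only a sketch: capture_texts is a hypothetical name, and it uses the Selenium 4 style call find_elements(by, value), passing the string "css selector" (the value of By.CSS_SELECTOR) so the snippet has no imports of its own.

```python
def capture_texts(driver, selector="a.gensmall"):
    """Return the stripped text of every element matching the CSS selector."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    # find_elements (plural) returns a list, which is empty when nothing matches,
    # so this works for any number of <div> blocks on the page.
    elements = driver.find_elements("css selector", selector)
    return [element.text.strip() for element in elements]
```

With a live session you would call capture_texts(driver) after loading the page. An empty list means the selector matched nothing, which is worth checking before assuming the page has no entries.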


1 Comment

find_elements_by_class_name('gensmall')
1

To extract the texts (e.g. TEXT TO CAPTURE 1, TEXT TO CAPTURE 2, etc.), use WebDriverWait with visibility_of_all_elements_located() so that the elements are visible before you read them. Either of the following locator strategies works:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.contenedor p.line1>a.gensmall")))])
    
  • Using XPATH:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='contenedor']//p[@class='line1']/a[@class='gensmall']")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
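
To make it clearer what the wait is doing, here is a hand-rolled polling loop that illustrates the idea. It is not WebDriverWait and not Selenium's implementation, just a sketch: wait_for_visible_texts, timeout and poll are hypothetical names, and the driver is only assumed to expose find_elements(by, value) and elements with is_displayed() and text, as Selenium 4 does.

```python
import time


def wait_for_visible_texts(driver, selector, timeout=5.0, poll=0.5):
    """Poll until every element matching the selector is displayed, then return their texts."""
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR.
        elements = driver.find_elements("css selector", selector)
        # Succeed only when at least one element matched and all of them are displayed.
        if elements and all(element.is_displayed() for element in elements):
            return [element.text for element in elements]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no visible elements for {selector!r} within {timeout}s")
        time.sleep(poll)
```

In real code the WebDriverWait one-liners above are the better choice. The loop only shows why a plain find_elements call right after the page loads can come back empty while the waited version does not.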
    
