Python Selenium - Get Link from Within a Class

Question

I am attempting to scrape the href from the following HTML, but I need the text in the second td with class data to identify which link it is:

<tr>
<td class="data">
    <a target="_new" title="Title" href="https://somesite.com/file_to_scrape.pdf">Scraped Class</a>
<br>
</td>
<td class="data">Text to Identify Above Link</td>
<td class="data">Not relevant text</td>
</tr>

The first thing I do is pull back a list of all elements with the class data:

ls_class = driver.find_elements_by_class_name("data")

but when I loop through:

for clas in ls_class:
   print(clas.text)
   print(clas.get_attribute('href'))

The printout is:

Scraped Class
None
Text to Identify Above Link
None
Not relevant text
None

How can I get the nested href when one is present inside an element with class data?

Prophet · Accepted Answer · 2021-08-12 20:52:31Z

1

Instead of collecting every element with the class data:

ls_class = driver.find_elements_by_class_name("data")

you can select the links nested inside those cells directly:

elements = driver.find_elements_by_xpath("//td[@class='data']//a")
for element in elements:
   print(element.text)
   print(element.get_attribute('href'))

Update: I think you can get the desired element directly with this code, which finds the row containing the identifying text and then the link inside that row:

element = driver.find_element_by_xpath("//tr[.//td[@class='data'][text()='Text to Identify Above Link']]//td[@class='data']//a[@href]")
print(element.get_attribute('href'))
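The row-wise idea discussed in this thread can be sketched as a small helper that pairs each link with the text in the cell that follows it. This is only a sketch under assumptions: it uses the Selenium 4 style `find_elements(By.XPATH, ...)` rather than the `find_elements_by_*` helpers shown above, the function name `links_by_label` is hypothetical, and it assumes the identifying text always sits in the td.data cell immediately after the one holding the link. The `By` fallback is there only so the sketch can be read and exercised without Selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
except ImportError:
    # Fallback for reading the sketch without Selenium installed;
    # "xpath" is the string value of By.XPATH.
    class By:
        XPATH = "xpath"


def links_by_label(driver):
    """Map each identifying text to the href found in the preceding cell.

    Hypothetical helper: assumes the link is in one td.data cell and the
    identifying text is in the next td.data cell of the same row.
    """
    result = {}
    # Only rows that have a link inside one of their td.data cells.
    rows = driver.find_elements(By.XPATH, "//tr[td[@class='data']//a[@href]]")
    for row in rows:
        cells = row.find_elements(By.XPATH, "./td[@class='data']")
        # Walk adjacent cell pairs: (possible link cell, following cell).
        for link_cell, label_cell in zip(cells, cells[1:]):
            anchors = link_cell.find_elements(By.XPATH, ".//a[@href]")
            if anchors:
                label = label_cell.text.strip()
                result[label] = anchors[0].get_attribute("href")
                break  # one link per row is assumed
    return result
```

With the example HTML from the question, `links_by_label(driver)` is intended to return a dictionary keyed by Text to Identify Above Link, so the href can be looked up by the known text instead of by position in a flat list.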

edited Aug 12, 2021 at 20:52

answered Aug 12, 2021 at 20:14

Prophet


13 Comments

EliSquared Over a year ago

When I do this, it doesn't return the second class, which has the text I need to identify the prior link. I only get the elements with an href.

Prophet Over a year ago

One moment, maybe I misunderstood you. Is all you want to get the https://somesite.com/file_to_scrape.pdf value (this value is unknown, so we have to retrieve it), where this a element is inside a td element whose next sibling is a td with the known text Text to Identify Above Link? Correct?

EliSquared Over a year ago

Yes, I need to get both the href from the first class and the text from the second class (Text to Identify Above Link); otherwise I don't know what the link is for. I tried find_elements_by_xpath("//td[@class='data']"), but that just gets me the same output as what I originally had.

Prophet Over a year ago

In the example HTML you provided the second td with the Text to Identify Above Link text has no a with href inside it.

EliSquared Over a year ago

Because there is no href in the second data class. I need to get the first and second data class consecutively in a list of elements and then when I have identified the link from the text in the second data class, I want to extract the href from the first data class, if that makes sense.


EliSquared · Accepted Answer · 2021-08-16 12:52:37Z

0

I got it to work using a solution posted here:

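from selenium.common.exceptions import NoSuchElementException

ls_class = driver.find_elements_by_xpath("//td[@class='data']")

for clas in ls_class:
    print(clas.text)
    try:
        # Look for a link nested inside this cell.
        print(clas.find_element_by_css_selector('a').get_attribute('href'))
    except NoSuchElementException:
        # This cell has no <a> child.
        print("No Link")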

Now my output is:

Scraped Class
https://somesite.com/file_to_scrape.pdf
Text to Identify Above Link
No Link
Not relevant text
No Link
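As a possible alternative to the try/except above, the plural `find_elements` call returns an empty list when nothing matches, so no exception handling is needed. The sketch below assumes the Selenium 4 style `find_elements(By.CSS_SELECTOR, ...)`; the helper name `href_or_none` is hypothetical, and the `By` fallback exists only so the sketch can be read without Selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
except ImportError:
    # Fallback for reading the sketch without Selenium installed;
    # "css selector" is the string value of By.CSS_SELECTOR.
    class By:
        CSS_SELECTOR = "css selector"


def href_or_none(cell):
    """Return the href of the first link inside `cell`, or None if it has none."""
    # find_elements returns [] instead of raising when there is no match.
    anchors = cell.find_elements(By.CSS_SELECTOR, "a[href]")
    return anchors[0].get_attribute("href") if anchors else None
```

Inside the loop, `print(href_or_none(clas) or "No Link")` would then stand in for the try/except block.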

edited Aug 16, 2021 at 12:52

answered Aug 12, 2021 at 21:24

EliSquared
