I have an HTML file containing links and would like to distinguish links that wrap an image from links that contain only text. I am using Python 3 and Selenium with the Firefox or PhantomJS browser. The goal is an automated procedure that goes through several hundred HTML files and finds the links that contain images.

What I do: I first focus on a single HTML file. As a first step I collect the names of all images in that file, where I know that one image is a link and the others are not. I store the names in IMAGENAME.

Then I try to determine whether one or more of the images is a link. I use:

driver.find_element_by_xpath('(//a[/img[contains(text(),"'+IMAGENAME[j]+'")]])')
or 
driver.find_element_by_xpath('//a[/img[contains(@src,IMAGENAME[j])]]') 
or 
driver.find_element_by_xpath('//a/img[contains(@src,IMAGENAME[j])]') 
with j = 0 (image is not a link) and j = 1 (image is a link).

In each case I get the same error message, which tells me that the way I call the image must be wrong:

NoSuchElementException: Message: no such element: Unable to locate element:...

When I leave out the //a part and use only //img, I get all images without an error message.

What am I doing wrong in calling the image in a link? Is there another way I can do this?

  • I'm not sure I understand your goal correctly, but //a[img] will return links that contain images and //a[not(img)] will return links without images, i.e. text links. Note that if you want to use the variable j in an XPath, you should write 'xpath expression with %s variable' % j rather than 'xpath expression with j variable'. Commented Jun 13, 2017 at 14:58
  • If I remember correctly, //a stands for anchor, whereas you might be looking for href? Commented Jun 13, 2017 at 15:01
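
To illustrate the //a[img] idea from the first comment without starting a browser, here is a minimal sketch using Python's standard-library ElementTree on a small, made-up page. The sample markup and the variable names are assumptions for illustration only. ElementTree supports just a subset of XPath (no not()), so the text links are found by exclusion instead of with //a[not(img)]:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; real pages may need a proper HTML parser.
SAMPLE = """
<html><body>
  <a href="/home"><img src="img/logo.png" /></a>
  <a href="/about">About us</a>
  <img src="img/banner.jpg" />
</body></html>
"""

root = ET.fromstring(SAMPLE)

all_links = root.findall(".//a")
# [img] keeps only <a> elements that have an <img> as a direct child.
image_links = root.findall(".//a[img]")
# ElementTree has no not(), so text links are the remaining <a> elements.
text_links = [a for a in all_links if a not in image_links]

image_hrefs = [a.get("href") for a in image_links]
text_hrefs = [a.get("href") for a in text_links]
print(image_hrefs)
print(text_hrefs)
```

In Selenium the same predicate should work directly as an XPath, for example "//a[img]" for image links and "//a[not(img)]" for the rest.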

1 Answer

If you just need the src attribute of every image that sits inside a link, you could use something like:

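# Assumes driver is an already created Selenium WebDriver with the page loaded.
# Collect every <img> nested anywhere inside an <a> element.
imageLinks = driver.find_elements_by_xpath("//a//img")
imageNames = []
for element in imageLinks:
    imageNames.append(element.get_attribute("src"))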
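
As for the expressions in the question, two things look off to me. Inside a predicate, a leading slash as in //a[/img[...]] makes the path absolute, so it looks for img at the document root instead of below the link. And IMAGENAME[j] written inside the quoted string is never replaced by the Python value; it has to be formatted into the string first. A minimal sketch of building such an XPath, where the helper name and the sample file names are hypothetical:

```python
def links_with_image_xpath(image_name):
    """Build an XPath for <a> elements containing an <img> whose src includes image_name."""
    # Assumption: image_name contains no double quote characters.
    # ".//img" is relative to the <a>, so the image may be nested at any depth.
    return '//a[.//img[contains(@src, "%s")]]' % image_name


IMAGENAME = ["logo.png", "banner.jpg"]  # hypothetical file names
xpaths = [links_with_image_xpath(name) for name in IMAGENAME]
for xpath in xpaths:
    print(xpath)
```

Each string could then be passed to driver.find_elements_by_xpath(xpath), which returns an empty list when nothing matches instead of raising NoSuchElementException. In newer Selenium releases the equivalent call is driver.find_elements(By.XPATH, xpath).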

3 Comments

Hi! Thank you! That's great! I added ':' after the for loop and used 'append' instead of 'add'.
Right on! Glad it worked. Sorry about the 'add'. I'm not as familiar with python as I should be.
still working in 2022 :)
