Python Selenium only getting first row when iterating over table

Question

I am trying to extract the most recent headlines from the following news site: http://news.sina.com.cn/hotnews/

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time

# IDs of the tabs that need to be hovered over on the site
buttons_ids = ['Tab21', 'Tab22', 'Tab32']

# IDs of the relevant subsections
con_ids = ['Con11']

# start the webdriver, go to the site, hover over each tab
driver = webdriver.Chrome()
driver.get("http://news.sina.com.cn/hotnews/")
time.sleep(3)
for button_id in buttons_ids:
    button = driver.find_element_by_id(button_id)
    ActionChains(driver).move_to_element(button).perform()

Then I iterate through each section I am interested in and, within each section, through all the headlines, which are rows in an HTML table. However, every iteration returns the first row:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        headline = driver.find_element_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]")
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

I also tried the following approach by essentially saving the table as a list and then iterating through the rows:

for con_id in con_ids:
    table = driver.find_elements_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr")
    for headline in table:
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

In the second case I get exactly as many results as there are headlines in the section, so it apparently picks up the correct number of rows. However, it still returns only the first row on every iteration. I know a similar question has been asked here ("Selenium Python iterate over a table of rows it is stopping at the first row"), but I am still unable to figure out where I am going wrong.

Ian Lesperance · Accepted Answer · 2018-02-16 04:52:14Z

3

In XPath, queries that begin with // will search relative to the document root; so even though you're calling find_element_by_xpath() on the correct container element, you're breaking out of that scope, thereby performing the same global search and yielding the same result every time.

To constrain your query to descendants of the current element, begin it with .//, for example:

text = headline.find_element_by_xpath(".//td[2]/a")
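
The scoping rule can be tried without a browser. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree rather than Selenium; the markup is a made-up stand-in for the Con11 table, not the real page. Note that ElementTree only implements a subset of XPath and refuses a leading // on an element outright, so this shows only the row-relative form.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the table inside <div id="Con11">.
HTML = """
<div id="Con11">
  <table>
    <tbody>
      <tr><th>No.</th><th>Headline</th><th>Comments</th></tr>
      <tr><td>1</td><td><a href="http://example.com/a">First</a></td><td><a href="#">10</a></td></tr>
      <tr><td>2</td><td><a href="http://example.com/b">Second</a></td><td><a href="#">20</a></td></tr>
    </tbody>
  </table>
</div>
""".strip()

root = ET.fromstring(HTML)
rows = root.findall("./table/tbody/tr")

headlines = []
for row in rows:
    # The leading "." anchors the search at the current row.
    link = row.find(".//td[2]/a")
    if link is None:
        # The header row has <th> cells only, so there is nothing to read.
        continue
    headlines.append((link.text, link.get("href")))

print(headlines)
```

Each row yields its own link because every lookup starts at that row instead of at the top of the document.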


2 Comments

Sebastian Over a year ago

Thanks, Ian, indeed it works if I begin the query like this. I am making this the accepted answer due to the explanation. But Pradeep's updated code works as well.

Ian Lesperance Over a year ago

That's because he updated it to include a . at the beginning of the query. 😉

Pradeep hebbar · Answer · 2018-02-16 05:26:16Z

1

Try this:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        print("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        headline = driver.find_element_by_xpath("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        value = headline.find_element_by_xpath(".//td[2]/a")
        print(value.get_attribute("innerText").encode('utf-8'))

I am able to get the headlines with the above code.
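
For readers on newer Selenium 4 releases, where the find_element_by_* helpers are no longer available, the same row-relative lookup can be written with find_elements(by, value). This is only a sketch: read_rows is a hypothetical helper name, the string "xpath" is the value of By.XPATH passed directly so the snippet needs no imports, and it has not been run against the live site.

```python
def read_rows(rows, by="xpath"):
    """Collect (headline, href, comment count) from each table row.

    `rows` is assumed to be the list of <tr> elements returned by
    driver.find_elements("xpath", "//div[@id='Con11']/table/tbody/tr").
    """
    results = []
    for row in rows:
        # The leading "." keeps each query inside the current row.
        links = row.find_elements(by, ".//td[2]/a")
        counts = row.find_elements(by, ".//td[3]/a")
        if not links:
            # Header rows or rows without a link are skipped.
            continue
        link = links[0]
        count = counts[0].get_attribute("innerText") if counts else None
        results.append(
            (link.get_attribute("innerText"), link.get_attribute("href"), count)
        )
    return results
```

Using find_elements instead of find_element means a row without a matching cell gives an empty list rather than raising NoSuchElementException.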

edited Feb 16, 2018 at 5:26

answered Feb 15, 2018 at 18:18


3 Comments

Sebastian Over a year ago

Thanks for the suggestion. You said it worked for you? Did you get 10 different headlines? Unfortunately, when I run your code it produces exactly the same output as mine: it prints the first headline 10 times. Somehow it always selects the first row, even when I explicitly pass it the index of another one.

Pradeep hebbar Over a year ago

@Sebastian I have edited my answer. Can you try it now?

Pradeep hebbar Over a year ago

@Sebastian I am able to get all 10 headlines with the updated code above. Please have a look at it.

Sebastian · Answer · 2018-02-15 20:49:03Z

0

I was able to solve it by specifying the entire XPath in one go like this:

headline = driver.find_element_by_xpath("(//*[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]/td[2]/a)")
print(headline.get_attribute("innerText"))
print(headline.get_attribute("href"))

rather than splitting it into two parts. My only explanation for why the split version prints the first row repeatedly is that some odd JavaScript on the page prevents proper iteration when the lookup is split in two. Or my first version had a syntax error that I am not aware of. If anyone has a better explanation, I'd be glad to hear it!
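
Following the explanation in the accepted answer, the single-query version most likely works because the whole path, row index included, is evaluated once from the document root, so there is no second lookup that could escape the row. A small sketch of building that path (headline_link_xpath is a hypothetical helper name, not part of Selenium):

```python
def headline_link_xpath(con_id, news_id):
    """Return an absolute XPath to the headline link in row `news_id`."""
    return f"//*[@id='{con_id}']/table/tbody/tr[{news_id}]/td[2]/a"


# Example: the link in the second row of the Con11 section.
print(headline_link_xpath("Con11", 2))
```

The resulting string can be passed to the driver-level XPath lookup as in the snippet above.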
