I am scraping a website using Selenium in Python. The XPath finds the 20 elements that contain the search results. However, text is available only for the first 6 elements; the rest return empty strings. This is true for every page of the results.

The XPath used:

results = driver.find_elements_by_xpath("//li[contains(@class, 'search-result search-result__occluded-item ember-view')]")

The XPath matches 20 elements in Chrome.

Text inside the results:

[tt.text for tt in results]

Anonymized output:

['Abcddwedwada',
 'Asefdasdfaca',
 'Asdaafcascac',
 'Asdadaacjkhi',
 'Sfskjfbsfvbkd',
 'Fjsbfksjnsvas',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '']

I have also tried extracting the id of each of the 20 elements and calling driver.find_element_by_id on it, but I still get empty strings after the first 6 elements.

2 Answers

Try this:

[str(tt.text) for tt in results if str(tt.text) != '']

or

[tt.text for tt in results if len(tt.text) > 0]

3 Comments

This filters out the results with empty strings.
@mrbot What is the type of the empty string ''? unicode or string?
The type of the empty string is str.

My guess at the cause: when you open the page there are 20 entries (<li> elements inside a <ul>), but the content of only 6 of them is displayed. The content of the other 14 entries is probably generated dynamically from XHR requests and may appear only after scrolling down.

So you might need to scroll down to the last element in the list:

from selenium.webdriver.support.ui import WebDriverWait as wait

# Wait up to 10 seconds until 20 results match the not(text()='') predicate
wait(driver, 10).until(lambda d: len(d.find_elements_by_xpath("//li[contains(@class, 'search-result search-result__occluded-item ember-view') and not(text()='')]")) == 20)
results = driver.find_elements_by_xpath("//li[contains(@class, 'search-result search-result__occluded-item ember-view')]")
# Reading this property scrolls the last result into view as a side effect
results[-1].location_once_scrolled_into_view
[tt.text for tt in results]

Try it and let me know the results.
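
If scrolling to the last item alone is not enough, one variation worth trying is to scroll every result into view in turn and poll until its text appears. This is only a minimal sketch, and it assumes the page fills in an item's content once that item enters the viewport, which is a guess about this site. The names collect_texts, timeout and poll are hypothetical helpers and are not part of Selenium. The only driver calls used are execute_script and the element's text property:

```python
import time


def collect_texts(driver, elements, timeout=5.0, poll=0.25):
    """Scroll each element into view and return its text once non-empty.

    Assumption: the page renders an item's content only after the item
    enters the viewport. Items still empty after `timeout` seconds
    contribute '' to the returned list.
    """
    texts = []
    for el in elements:
        # Standard DOM method; Selenium passes `el` in as arguments[0].
        driver.execute_script("arguments[0].scrollIntoView(true);", el)
        deadline = time.monotonic() + timeout
        text = el.text
        # Poll until the text is filled in or the deadline passes.
        while not text and time.monotonic() < deadline:
            time.sleep(poll)
            text = el.text
        texts.append(text)
    return texts
```

With the results list from above it would be called as collect_texts(driver, results). Polling per element instead of waiting once for the whole list means a single slow item cannot hide the others, and any entry that is still '' afterwards shows which item never loaded.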

5 Comments

It didn't work. I had already thought of that and tried: driver.execute_script("window.scrollTo(0, Y);")
Could this have anything to do with using pyvirtualdisplay?
All 20 elements return True for is_displayed().
Try the code from the updated answer and let me know if it still doesn't work as expected.
The new wait statement returned True, and the results still have empty strings after the 6th result.
