I am trying to use Selenium to scrape the first paragraph of Wikipedia pages using CSS selectors.

When I run this code, it seems to select elements only from the original page,

https://en.wikipedia.org

and not from the page I searched for, in this case 'cats'.

Any help with this would be awesome!


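from selenium import webdriver

# Raw string so the backslashes in the Windows path are taken literally
browser = webdriver.Firefox(executable_path=r'D:\Import Files that I also want backed up\Jupyter Notebooks\Python Projects\Selenium\driverss\geckodriver.exe')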
browser.get('https://en.wikipedia.org')

search_elem = browser.find_element_by_css_selector('#searchInput')

search_elem.send_keys('cats')
search_elem.submit()


results_elem = browser.find_element_by_css_selector('p')

print(results_elem.text)

output:

Adventure Time is an American fantasy animated television series created .....

  • What is your expected output? Commented Apr 5, 2020 at 20:25
  • I want to print the first paragraph of the 'cat' page. But when I use the CSS selector, I am still only scraping the first page, 'en.wikipedia.org', even though the browser is on the 'cat' page. Essentially, I want to be able to scrape a web page after searching for a topic using Selenium. Commented Apr 5, 2020 at 21:03

1 Answer

To get the first paragraph's text from the wiki page, use WebDriverWait() with visibility_of_element_located() and the following CSS selector.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

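# Selenium 3 style constructor; the raw string keeps the backslashes in the Windows path literal
browser = webdriver.Firefox(executable_path=r'D:\Import Files that I also want backed up\Jupyter Notebooks\Python Projects\Selenium\driverss\geckodriver.exe')
browser.get('https://en.wikipedia.org')

# Type the query into the search box and submit the form
search_elem = browser.find_element(By.CSS_SELECTOR, '#searchInput')
search_elem.send_keys('cats')
search_elem.submit()

# Wait up to 10 seconds for the target paragraph to become visible, then print it
results_elem = WebDriverWait(browser, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.mw-parser-output p:nth-of-type(3)'))
)
print(results_elem.text)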
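
The positional selector p:nth-of-type(3) depends on how many paragraphs happen to come before the lead text on a given article. As a rough alternative, one could collect every paragraph under div.mw-parser-output and keep the first one whose text is not blank. The sketch below is only an illustration: first_nonempty_text is a hypothetical helper, not part of Selenium, and it assumes nothing more than that each element exposes a .text attribute, as Selenium's WebElement does. Stand-in objects with placeholder text are used so it runs without a browser.

```python
from types import SimpleNamespace


def first_nonempty_text(elements):
    """Return the stripped text of the first element whose .text is not blank.

    Hypothetical helper: `elements` is any iterable of objects with a `.text`
    attribute, e.g. the list returned by Selenium's find_elements().
    Returns None when every element is blank.
    """
    for element in elements:
        text = (element.text or "").strip()
        if text:
            return text
    return None


# Stand-ins for WebElement objects (placeholder text, not real page content).
fake_paragraphs = [
    SimpleNamespace(text=""),
    SimpleNamespace(text="  \n"),
    SimpleNamespace(text="Lead paragraph text."),
]
lead = first_nonempty_text(fake_paragraphs)
print(lead)
```

With a real driver this would presumably be called as first_nonempty_text(browser.find_elements(By.CSS_SELECTOR, 'div.mw-parser-output p')), and only once the article page has actually loaded.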

2 Comments

This works if I remove the code that uses the search bar. But when I reach the cat page by searching for 'cats', the CSS selector still matches content from the first page visited, 'en.wikipedia.org'.
If your page is taking more time to load, add a time.sleep(5) after submitting the search. Let me know how this goes.
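
A fixed time.sleep(5) can be either longer than needed or too short. Since the symptom here is that the selector runs against the page that was loaded before the search, another option is to wait until the browser has actually left that page. Selenium provides expected_conditions.url_changes for use with WebDriverWait. The sketch below hand-rolls the same polling idea so that it can run without a browser. wait_for_url_change and FakeDriver are hypothetical names, the URLs are placeholders, and the only thing assumed of the driver is its current_url property.

```python
import time


def wait_for_url_change(driver, old_url, timeout=10.0, poll=0.5):
    """Poll driver.current_url until it differs from old_url.

    Hypothetical helper. Returns the new URL, or raises TimeoutError if the
    URL is still old_url once the timeout has expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = driver.current_url
        if current != old_url:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL still {old_url!r} after {timeout} s")
        time.sleep(poll)


class FakeDriver:
    """Stand-in for a WebDriver: reports the old URL for the first few reads."""

    def __init__(self, old_url, new_url, reads_before_change):
        self._old_url = old_url
        self._new_url = new_url
        self._reads_left = reads_before_change

    @property
    def current_url(self):
        if self._reads_left > 0:
            self._reads_left -= 1
            return self._old_url
        return self._new_url


start_url = "https://en.wikipedia.org"
driver = FakeDriver(start_url, "https://en.wikipedia.org/wiki/Cat", reads_before_change=2)
new_url = wait_for_url_change(driver, start_url, timeout=2.0, poll=0.01)
print(new_url)
```

With a real browser, one would read old_url = browser.current_url just before search_elem.submit(), wait for the change, and only then look up the paragraph. The built-in equivalent would be WebDriverWait(browser, 10).until(EC.url_changes(old_url)).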
