
I have a list of URLs that I need to scrape data from. The website refuses the connection when I open each URL in a new driver, so I decided to open each URL in a new tab instead (the website allows this). This is the code I am using:

from selenium import webdriver
import time
from lxml import html

driver = webdriver.Chrome()
driver.get('https://www.google.com/')

file = open('f:\\listofurls.txt', 'r')

for aa in file:
    aa = aa.strip()
    driver.execute_script("window.open('{}');".format(aa))
    soup = html.fromstring(driver.page_source)
    name = soup.xpath('//div[@class="name"]//text()')
    title = soup.xpath('//div[@class="title"]//text()')
    print(name, title)
    time.sleep(3)

But the problem is that all the URLs open at once instead of one after another.

  • Is opening drivers one by one allowed? I mean opening one driver, closing it, and then opening another one. Will the website refuse that, or is it OK? Commented Nov 13, 2019 at 12:34

2 Answers


You can try this code:

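from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
from lxml import html

driver = webdriver.Chrome()
driver.get('https://www.google.com/')

with open('f:\\listofurls.txt', 'r') as file:
    for aa in file:
        aa = aa.strip()
        if not aa:
            continue  # skip blank lines

        # Try to open a tab (use Keys.CONTROL + 't' on other OSs).
        # Note: some drivers ignore browser shortcuts sent this way;
        # driver.get() below then simply reuses the current tab.
        driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.COMMAND + 't')

        # Load the page; with the default settings get() waits for it to load
        driver.get(aa)

        # Extract the data
        soup = html.fromstring(driver.page_source)
        name = soup.xpath('//div[@class="name"]//text()')
        title = soup.xpath('//div[@class="title"]//text()')
        print(name, title)
        time.sleep(3)

# Close the browser and end the session
driver.quit()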

1 Comment

Dear Hamza lachi, thank you very much, it is working

I think you have to strip before the loop like this:

from selenium import webdriver
import time
from lxml import html

driver = webdriver.Chrome()
driver.get('https://www.google.com/')

# Strip every line up front; a file object itself has no strip() method
with open('f:\\listofurls.txt', 'r') as file:
    urls = [line.strip() for line in file]

for url in urls:
    driver.execute_script("window.open('{}');".format(url))
    soup = html.fromstring(driver.page_source)
    name = soup.xpath('//div[@class="name"]//text()')
    title = soup.xpath('//div[@class="title"]//text()')
    print(name, title)
    time.sleep(3)

1 Comment

Not working. I think 'strip' is not the issue here; the new tab should be opened only after the data has been extracted from the current tab.
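
For what it's worth, here is a rough sketch of the "one tab at a time" idea from the comment above. As far as I know, window.open() only asks the browser for a tab and does not move the driver's focus, so driver.page_source keeps reading the first tab unless you switch to the new one. The names scrape_in_tabs, extract and main are made up for this example, and opening a blank tab first and then calling driver.get() inside it is my own choice, not something from the question. I have not tried it against the asker's site.

```python
def scrape_in_tabs(driver, urls, extract):
    """Visit each URL in its own tab, one at a time.

    `driver` is a Selenium WebDriver; `extract` is any function that
    takes the page source and returns the scraped data (hypothetical names).
    """
    home = driver.current_window_handle
    results = []
    for url in urls:
        # Open a blank tab and identify it as the handle that is new.
        before = set(driver.window_handles)
        driver.execute_script("window.open('about:blank');")
        new_tab = (set(driver.window_handles) - before).pop()

        # Focus does not move by itself, so switch explicitly.
        driver.switch_to.window(new_tab)
        driver.get(url)
        results.append(extract(driver.page_source))

        # Close only this tab, then go back to the first one.
        driver.close()
        driver.switch_to.window(home)
    return results


def main():
    # Imports are kept here so the helper above has no hard dependencies.
    from lxml import html
    from selenium import webdriver

    def extract(source):
        tree = html.fromstring(source)
        name = tree.xpath('//div[@class="name"]//text()')
        title = tree.xpath('//div[@class="title"]//text()')
        return name, title

    with open('f:\\listofurls.txt', 'r') as f:
        urls = [line.strip() for line in f if line.strip()]

    driver = webdriver.Chrome()
    driver.get('https://www.google.com/')
    try:
        for name, title in scrape_in_tabs(driver, urls, extract):
            print(name, title)
    finally:
        driver.quit()
```

Calling main() runs it against the file from the question. Because each tab is closed before the next one is opened, only two tabs are ever open at once, and the time.sleep(3) from the original can be added back inside the loop if the site needs a pause between requests.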
