Hi everyone, I am trying to use Selenium with threads. My code is:
import threading as th
import time
import base64
import mysql.connector as mysql
import requests
from bs4 import BeautifulSoup
from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options
from functions import *
options = Options()
prefs = {'profile.default_content_setting_values': {'images': 2,'popups': 2, 'geolocation': 2,
'notifications': 2, 'auto_select_certificate': 2, 'fullscreen': 2,
'mouselock': 2, 'mixed_script': 2, 'media_stream': 2,
'media_stream_mic': 2, 'media_stream_camera': 2, 'protocol_handlers': 2,
'ppapi_broker': 2, 'automatic_downloads': 2, 'midi_sysex': 2,
'push_messaging': 2, 'ssl_cert_decisions': 2, 'metro_switch_to_desktop': 2,
'protected_media_identifier': 2, 'app_banner': 2, 'site_engagement': 2,
'durable_storage': 2}}
print('Crawling process started')
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(executable_path='chromedriver.exe', options=options)
driver.set_page_load_timeout(50000)
urls='https://google.com https://youtube.com'
def getinf(url_):
    driver.get(url_)
    soup=BeautifulSoup(driver.page_source, 'html5lib')
    print(soup.select('title'))
for url in urls.split():
    t=th.Thread(target=getinf, args=(url,))
    t.start()
When the script runs, the tabs are not opened at the same time, as I expected from using threads. Instead, the URLs are loaded one by one, and only the title of the last URL (https://youtube.com) is printed. When I try multiprocessing, the program crashes repeatedly. I am building a web crawler, and some websites (like Twitter) require JavaScript to show their content, so I can't use requests or urllib either. What could be the solution? Suggestions for other libraries are welcome.
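One direction that may be worth trying, assuming the cause is that both threads share the single module-level `driver` (a WebDriver instance drives one browser session, so concurrent `get` calls end up in the same window): give each task its own driver. The sketch below is not a confirmed fix. The names `make_chrome_driver`, `get_title`, and `crawl` are hypothetical. It uses plain `selenium` rather than `seleniumwire`, and it assumes a Selenium setup where `webdriver.Chrome(options=...)` can locate chromedriver on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chrome_driver():
    """Hypothetical factory: build a fresh Chrome driver for one task."""
    # Imported lazily so the threading logic can be tried without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # Assumption: chromedriver is discoverable (on PATH or via Selenium Manager).
    return webdriver.Chrome(options=options)


def get_title(url, driver_factory=make_chrome_driver):
    """Load url in a driver owned only by this call and return the page title."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if the page load raised.
        driver.quit()


def crawl(urls, driver_factory=make_chrome_driver, max_workers=2):
    """Fetch the titles of all urls concurrently, one driver per url."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        titles = pool.map(lambda u: get_title(u, driver_factory), urls)
        return dict(zip(urls, titles))


# Example call (starts real browsers, so it is left commented out):
# print(crawl('https://google.com https://youtube.com'.split()))
```

Starting a browser per URL is expensive, so for a larger crawl it would probably make more sense to keep one long-lived driver per worker thread and reuse it across URLs. The function returns the titles instead of printing them from inside the threads, which keeps the output from interleaving.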