I am trying to build an automated web scraper, and I have spent hours watching YouTube videos and reading posts here. I'm new to programming (I started one month ago) and new to this community.
I'm using VS Code as my IDE, and I followed the format of this Python and Selenium code, which actually worked as a web scraper:
from selenium import webdriver
import time
from selenium.webdriver.support.select import Select
# write the CSV header row once, before scraping
with open('job_scraping_multipe_pages.csv', 'w') as file:
    file.write("Job_title, Location, Salary, Contract_type, Job_description \n")
driver= webdriver.Chrome()
driver.get('https://www.jobsite.co.uk/')
driver.maximize_window()
time.sleep(1)
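try:
    # look up the cookie button inside the try, so a missing banner is skipped instead of crashing
    cookie= driver.find_element_by_xpath('//button[@class="accept-button-new"]')
    cookie.click()
except Exception:
    pass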
job_title=driver.find_element_by_id('keywords')
job_title.click()
job_title.send_keys('Software Engineer')
time.sleep(1)
location=driver.find_element_by_id('location')
location.click()
location.send_keys('Manchester')
time.sleep(1)
dropdown=driver.find_element_by_id('Radius')
radius=Select(dropdown)
radius.select_by_visible_text('30 miles')
time.sleep(1)
search=driver.find_element_by_xpath('//input[@value="Search"]')
search.click()
time.sleep(2)
for k in range(3):  # scrape three result pages
    titles=driver.find_elements_by_xpath('//div[@class="job-title"]/a/h2')
    location=driver.find_elements_by_xpath('//li[@class="location"]/span')
    salary=driver.find_elements_by_xpath('//li[@title="salary"]')
    contract_type=driver.find_elements_by_xpath('//li[@class="job-type"]/span')
    job_details=driver.find_elements_by_xpath('//div[@title="job details"]/p')
    # each variable above is a list of WebElements, so [i].text works here
    with open('job_scraping_multipe_pages.csv', 'a') as file:
        for i in range(len(titles)):
            file.write(titles[i].text + "," + location[i].text + "," + salary[i].text + "," + contract_type[i].text + ","+
                       job_details[i].text + "\n")
    # go to the next page of results
    next_button=driver.find_element_by_xpath('//a[@aria-label="Next"]')
    next_button.click()
driver.close()
It worked. I then tried to replicate the results on another website. Instead of clicking the 'Next' button, I found a way to make the number at the end of the URL increase by 1. My problems come from the last part of the code, which gives me AttributeError: 'str' object has no attribute 'text'. Here is the Python and Selenium code for the website I was targeting (https://angelmatch.io/pitch_decks/5285):
from selenium import webdriver
import time
from selenium.webdriver.support.select import Select
driver = webdriver.Chrome()
# write the CSV header row once, before scraping
with open('pitchDeckResults2.csv', 'w' ) as file:
    file.write("Startup_Name, Startup_Description, Link_Deck_URL, Startup_Website, Pitch_Deck_PDF, Industries, Amount_Raised, Funding_Round, Year \n")
for k in range(5285, 5287, 1):
    # build the URL for pitch deck number k
    linkDeck = "https://angelmatch.io/pitch_decks/" + str(k)
    driver.get(linkDeck)
    driver.maximize_window()
    time.sleep(2)
    startupName = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[1]')
    startupDescription = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[3]/p[2]')
    startupWebsite = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[3]/a')
    pitchDeckPDF = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/button/a')
    industries = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[2]')
    amountRaised = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[1]/b')
    fundingRound = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[1]')
    year = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[2]/b')
    with open('pitchDeckResults2.csv', 'a') as file:
        for i in range(len(startupName)):
            file.write(startupName[i].text + "," + startupDescription[i].text + "," + linkDeck[i].text + "," + startupWebsite[i].text + "," + pitchDeckPDF[i].text + "," + industries[i].text + "," + amountRaised[i].text + "," + fundingRound[i].text + "," + year[i].text +"\n")
    time.sleep(1)
driver.close()
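The code itself suggests where the error comes from. `linkDeck` is the URL string built at the top of the loop, not a list returned by `find_elements_by_xpath`. Indexing a string returns a one-character string (`linkDeck[0]` is `'h'`), and `str` has no `.text` attribute. The sketch below reproduces the error without a browser. `FakeElement` is a hypothetical stand-in for a Selenium WebElement, not part of Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; it only has .text."""

    def __init__(self, text):
        self.text = text


# find_elements_by_xpath returns a list of elements, so [i].text works on it
startup_name = [FakeElement("Startup 5285")]
# linkDeck in the question is built by string concatenation: a plain str
link_deck = "https://angelmatch.io/pitch_decks/" + str(5285)

print(startup_name[0].text)  # Startup 5285
print(link_deck[0])          # h  (indexing a str gives one character)

try:
    link_deck[0].text  # same shape as linkDeck[i].text in the question
except AttributeError as exc:
    error_message = str(exc)
else:
    error_message = ""

print(error_message)  # 'str' object has no attribute 'text'

# Using the URL string directly avoids the error
row = [startup_name[0].text, link_deck]
print(row)
```

In the `file.write(...)` line, writing `linkDeck` where the code has `linkDeck[i].text` should remove this particular AttributeError.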
I'd appreciate any help! I'm trying to get the data into a CSV file using this technique.
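Here is a minimal sketch of the CSV side, assuming each XPath matches at most one element per pitch-deck page. Under that assumption, `for i in range(len(startupName))` runs at most once, so it is simpler to take the first match (or an empty string when `find_elements` returns an empty list) and write one row per page. Three further points:

* The `csv` module quotes fields that contain commas, which plain `","` concatenation does not.
* `range(5285, 5287)` visits only 5285 and 5286, because the stop value is excluded.
* `first_text`, `deck_urls`, `build_row`, `fake_find_all` and `FakeElement` are hypothetical helpers, and the page values are placeholders rather than real site data.

```python
import csv
import io

BASE_URL = "https://angelmatch.io/pitch_decks/"
HEADER = ["Startup_Name", "Startup_Description", "Link_Deck_URL", "Startup_Website",
          "Pitch_Deck_PDF", "Industries", "Amount_Raised", "Funding_Round", "Year"]


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only .text is used."""

    def __init__(self, text):
        self.text = text


def first_text(elements, default=""):
    """Stripped text of the first element in a find_elements result, or `default` if empty."""
    return elements[0].text.strip() if elements else default


def deck_urls(first_id, last_id):
    """Yield deck URLs for first_id..last_id inclusive (range() excludes its stop value)."""
    for deck_id in range(first_id, last_id + 1):
        yield BASE_URL + str(deck_id)


def build_row(link_deck, found):
    """One CSV row per page; `found` maps each non-URL column to a find_elements list."""
    return [
        first_text(found["startup_name"]),
        first_text(found["startup_description"]),
        link_deck,  # already a string: no [i] and no .text
        first_text(found["startup_website"]),
        first_text(found["pitch_deck_pdf"]),
        first_text(found["industries"]),
        first_text(found["amount_raised"]),
        first_text(found["funding_round"]),
        first_text(found["year"]),
    ]


def fake_find_all(url):
    """Hypothetical replacement for driver.get(url) plus the find_elements calls."""
    deck_id = url.rsplit("/", 1)[-1]
    return {
        "startup_name": [FakeElement(f"Startup {deck_id}")],
        "startup_description": [FakeElement("Description, with a comma")],
        "startup_website": [],  # simulate an XPath that matched nothing
        "pitch_deck_pdf": [FakeElement("PDF")],
        "industries": [FakeElement("Industry")],
        "amount_raised": [FakeElement("Amount")],
        "funding_round": [FakeElement("Round")],
        "year": [FakeElement("Year")],
    }


# For a real file: open("pitchDeckResults2.csv", "w", newline="", encoding="utf-8")
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(HEADER)  # header written once
for url in deck_urls(5285, 5286):
    writer.writerow(build_row(url, fake_find_all(url)))
print(out.getvalue())
```

In a real run, `fake_find_all` would be replaced by `driver.get(url)` followed by the XPath lookups from the question. On Selenium 4.3 and later, use `driver.find_elements(By.XPATH, ...)`, because the `find_elements_by_xpath` helpers were removed. For the website and PDF columns, `element.get_attribute("href")` may be closer to what you want than the link text.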