
I am trying to build an automated web scraper, and I have spent hours watching YouTube videos and reading posts here. I'm new to programming (I started a month ago) and new to this community...

So, using VS Code as my IDE, I followed the format of this code (Python and Selenium), which actually worked as a web scraper:


from selenium import webdriver
import time
from selenium.webdriver.support.select import Select

with open('job_scraping_multipe_pages.csv', 'w') as file:
    file.write("Job_title, Location, Salary, Contract_type, Job_description \n")
    
driver= webdriver.Chrome()
driver.get('https://www.jobsite.co.uk/')

driver.maximize_window()
time.sleep(1)

cookie= driver.find_element_by_xpath('//button[@class="accept-button-new"]')
try:
    cookie.click()
except:
    pass 

job_title=driver.find_element_by_id('keywords')
job_title.click()
job_title.send_keys('Software Engineer')
time.sleep(1)

location=driver.find_element_by_id('location')
location.click()
location.send_keys('Manchester')
time.sleep(1)

dropdown=driver.find_element_by_id('Radius')
radius=Select(dropdown)
radius.select_by_visible_text('30 miles')
time.sleep(1)

search=driver.find_element_by_xpath('//input[@value="Search"]')
search.click()
time.sleep(2)

for k in range(3):
    titles=driver.find_elements_by_xpath('//div[@class="job-title"]/a/h2')
    location=driver.find_elements_by_xpath('//li[@class="location"]/span')
    salary=driver.find_elements_by_xpath('//li[@title="salary"]')
    contract_type=driver.find_elements_by_xpath('//li[@class="job-type"]/span')
    job_details=driver.find_elements_by_xpath('//div[@title="job details"]/p')

    with open('job_scraping_multipe_pages.csv', 'a') as file:
        for i in range(len(titles)):
            file.write(titles[i].text + "," + location[i].text + "," + salary[i].text + "," + contract_type[i].text + ","+
                      job_details[i].text + "\n")

        
        next=driver.find_element_by_xpath('//a[@aria-label="Next"]')
        next.click()
    file.close()
driver.close()

It worked. I then tried to replicate the results for another website. Instead of clicking the 'Next' button, I found a way to increment the trailing number of the URL by 1. But my problems came in the last part of the code, which gives me AttributeError: 'str' object has no attribute 'text'. Here is the code for the website I was targeting (https://angelmatch.io/pitch_decks/5285), in Python and Selenium:


from selenium import webdriver
import time
from selenium.webdriver.support.select import Select

driver = webdriver.Chrome()


with open('pitchDeckResults2.csv', 'w' ) as file:
    file.write("Startup_Name, Startup_Description, Link_Deck_URL, Startup_Website, Pitch_Deck_PDF, Industries, Amount_Raised, Funding_Round, Year /n")




    for k in range(5285, 5287, 1):
        
        linkDeck = "https://angelmatch.io/pitch_decks/" + str(k)        

        driver.get(linkDeck)
        driver.maximize_window
        time.sleep(2)

        startupName = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[1]')
        startupDescription = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[3]/p[2]')
        startupWebsite = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[3]/a')
        pitchDeckPDF = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/button/a')
        industries = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[2]')
        amountRaised = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[1]/b')
        fundingRound = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[1]')
        year = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[2]/b')

        

        with open('pitchDeckResults2.csv', 'a') as file:
            for i in range(len(startupName)):
                file.write(startupName[i].text + "," + startupDescription[i].text + "," + linkDeck[i].text + "," + startupWebsite[i].text + "," + pitchDeckPDF[i].text + "," + industries[i].text + "," + amountRaised[i].text + "," + fundingRound[i].text + "," + year[i].text +"\n")

            time.sleep(1)

        file.close()

driver.close()

I'd appreciate any help! I am trying to get the data into a CSV using this technique.

Comment (Mar 17, 2021): I have updated the code, check it out.

1 Answer

You're doing great, honestly. The only problem, and the reason the error appears, is that you're trying to access the .text attribute on a value of type str, and Python's str type has no .text attribute. Moreover, you're indexing it with [i], which can also raise a 'list index out of range' IndexError. What did you mean to put in place of linkDeck[i].text — perhaps the page URL itself, or the page title?
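To see the error in isolation (no Selenium needed), here is a minimal sketch: .text exists on Selenium WebElement objects, but indexing a plain string gives you a one-character string, which has no such attribute:

```python
# linkDeck is a plain Python string, exactly as in the question.
linkDeck = "https://angelmatch.io/pitch_decks/5285"

# Indexing a string returns a single character (itself a str)...
first_char = linkDeck[0]  # 'h'

# ...and str has no .text attribute, hence the AttributeError:
try:
    _ = linkDeck[0].text
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'text'
```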

By the way, you don't need to call file.close() when you use a with open() statement. It's a context manager, which closes the file for you automatically when execution leaves the block.
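A quick sketch of that behavior (using a throwaway temp file rather than your CSV):

```python
import os
import tempfile

# with open(...) closes the file automatically when the block exits,
# so an explicit file.close() afterwards is redundant.
path = os.path.join(tempfile.gettempdir(), "demo_ctx.txt")

with open(path, "w") as f:
    f.write("hello\n")

print(f.closed)  # True — already closed by the context manager
os.remove(path)
```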

I also dropped the broken maximize_window call (you were missing the parentheses anyway), removed the second file opening, and wrote linkDeck directly as a plain string:

import time

from selenium import webdriver

driver = webdriver.Chrome()
delimiter = ';'  # semicolon, so commas inside descriptions don't break columns
with open('pitchDeckResults2.csv', 'w+') as _file:
    header = ['Startup_Name', 'Startup_Description', 'Link_Deck_URL', 'Startup_Website', 'Pitch_Deck_PDF',
              'Industries', 'Amount_Raised', 'Funding_Round', 'Year']
    _file.write(delimiter.join(header) + '\n')
    for k in range(5285, 5287):
        linkDeck = "https://angelmatch.io/pitch_decks/" + str(k)

        driver.get(linkDeck)
        time.sleep(1)

        # find_element_by_xpath (singular) returns one WebElement, so .text works directly
        startupName = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[1]')
        startupDescription = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[3]/p[2]')
        startupWebsite = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[3]/a')
        pitchDeckPDF = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/button/a')
        industries = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[2]')
        amountRaised = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[1]/b')
        fundingRound = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/a[1]')
        year = driver.find_element_by_xpath('/html/body/div[1]/div[2]/div[3]/div[1]/div/p[2]/b')

        # linkDeck is already a string, so it goes into the row as-is (no .text)
        row = [startupName.text, startupDescription.text, linkDeck, startupWebsite.text, pitchDeckPDF.text,
               industries.text, amountRaised.text, fundingRound.text, year.text]
        _file.write(delimiter.join(row) + '\n')

driver.close()
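Since scraped descriptions can themselves contain the delimiter, an alternative worth knowing (not used above) is Python's built-in csv module, which quotes such fields automatically. A minimal sketch with hypothetical stand-in rows:

```python
import csv

# Hypothetical rows standing in for the scraped values; csv.writer quotes
# any field containing the delimiter, so embedded commas survive intact.
header = ["Startup_Name", "Startup_Description", "Link_Deck_URL"]
rows = [["Acme", "We build rockets, fast", "https://angelmatch.io/pitch_decks/5285"]]

with open("pitchDeckResults2.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# Reading it back, the embedded comma is still one field:
with open("pitchDeckResults2.csv", newline="") as f:
    back = list(csv.reader(f))
print(back[1][1])  # We build rockets, fast
```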

If I've missed something, let me know.


11 Comments

I am trying to create a CSV file with nine columns, titled "Startup_Name, Startup_Description ... Funding_Round, Year." What boggles me is that the original code I am trying to replicate works. As you noted, it may be that startupName = driver.find_elements_by_xpath('/html/body/div[1]/div[2]/div[2]/div/div/div[1]') ... does not actually get a text element from the website I am scraping. linkDeck in file.write stands for the page URL, which increases by 1 on every next page; the file.write section writes the scraped data under the respective titles. Appreciate it!
@JGNT you did 98% of it; I just added the tricky bits. If I'm not mistaken, w+ just adds the ability to create the file if it doesn't exist. I'd suggest using more relative selectors, and using something like ";".join(your list of elements) instead of chaining + "," +.
@JGNT where do you use Excel, sorry?
@JGNT you can set the delimiter at the step of writing the data; when you read it back, you just read line by line and split on the delimiter.
