Storing list into CSV file from webscraping via selenium?

Question

Warning: new to python and programming

Objective: Scrape all job links from this page and place into a txt/csv/json/XML file: https://www.indeed.ca/jobs?q=title%3Aengineer&l=Vancouver%2C+BC

Code:

from selenium import webdriver
import csv
browser = webdriver.Firefox()
browser.get('https://www.indeed.ca/jobs?q=engineer&l=Vancouver%2C+BC&sort=date')
jobs = browser.find_elements_by_partial_link_text('Engineer')
for job in jobs:
    print(job.get_attribute("href"))
with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow(jobs)

It works great when it prints the results, but it doesn't store the links in the CSV file. Also, I plan to scrape more than one page, so what would be the best way to write to the CSV file so that new links are appended rather than overwriting the existing ones?

have you tried using bs4.BeautifulSoup(link)

– wishmaster, Commented Dec 4, 2018 at 3:54
Haven't tried bs4; I'm very new to this. I will give it a shot.

– Alanna Mueller, Commented Dec 4, 2018 at 4:09

ewwink · Accepted Answer · 2018-12-04 05:34:13Z

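The links are not written to the CSV as expected because jobs, the argument in wr.writerow(jobs), is a list of WebElement objects rather than strings. Extract the href from each element first: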

with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow([j.get_attribute("href") for j in jobs])
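
The question also asks how to add links from further pages without overwriting the file. A minimal sketch, assuming the links have already been extracted as strings; append_links is a hypothetical helper name, not part of csv or Selenium:

```python
import csv


def append_links(path, links):
    """Append one link per row to the CSV file at `path` (hypothetical helper)."""
    # Mode "a" appends to the file instead of truncating it;
    # newline="" lets the csv module control line endings.
    with open(path, "a", newline="") as result_file:
        writer = csv.writer(result_file)
        # Each link becomes its own single-column row.
        writer.writerows([link] for link in links)
```

Calling append_links("output.csv", [j.get_attribute("href") for j in jobs]) once per scraped page would then grow the file one row per link. Writing one link per row, rather than all links in a single row, is a design choice made here and may be easier to read back later.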

Red Cricket · Accepted Answer · 2018-12-04 05:25:28Z

This looks strange: for jobs in jobs:. Are you sure you didn't mean to write for job in jobs:? That is probably your problem. The loop variable rebinds the name jobs, so after the loop it refers to the last element instead of the list.

Take a look at this example:

>>> numbers = [1,2,3,4]
>>> numbers
[1, 2, 3, 4]
>>> type(numbers)
<class 'list'>
>>> for numbers in numbers:
...     print(numbers)
...
1
2
3
4
>>> numbers
4
>>> type(numbers)
<class 'int'>

It isn't the print that turns numbers into an int; the for loop rebinds the name on each iteration. Observe:

>>> numbers = [1,2,3,4]
>>> type(numbers)
<class 'list'>
>>> for numbers in numbers:
...     print(":)")
...
:)
:)
:)
:)
>>> type(numbers)
<class 'int'>
>>> numbers
4
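
For contrast, a minimal sketch of the same loop with a distinct loop variable (number is just an illustrative name), which leaves the list bound to its original name:

```python
numbers = [1, 2, 3, 4]

# A separate loop variable means the name `numbers` is never rebound.
for number in numbers:
    print(number)

# `numbers` still refers to the whole list after the loop.
print(type(numbers).__name__)  # prints: list
```

The same applies to the scraping code: as long as the loop reads for job in jobs:, jobs is still the full list of elements when it is passed to the CSV writer.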

edited Dec 4, 2018 at 5:25

answered Dec 4, 2018 at 3:57

3 Comments

Alanna Mueller Over a year ago

I tried your suggestion, but the terminal is still running. The loop was there to get the href attribute from the results of browser.find_elements_by_partial_link_text. It prints the links just fine; the problem comes with storing them.

Alanna Mueller Over a year ago

So what's happening here is that the list turned into an int after the print... Thanks for the lead. I will try to make it work and re-read your first comment.

Red Cricket Over a year ago

It is not the print that causes numbers to become an int; it is the for loop.
