Warning: I am new to Python and to programming.

Objective: Scrape all job links from this page and place them into a txt/csv/json/XML file: https://www.indeed.ca/jobs?q=title%3Aengineer&l=Vancouver%2C+BC

Code:

from selenium import webdriver
import csv
browser = webdriver.Firefox()
browser.get('https://www.indeed.ca/jobs?q=engineer&l=Vancouver%2C+BC&sort=date')
jobs = browser.find_elements_by_partial_link_text('Engineer')
for job in jobs:
    print(job.get_attribute("href"))
with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow(jobs)

Printing the links works fine, but nothing is stored in the csv file. I also plan to scrape more than one page, so what is the best way to write to the csv file so that new links are appended rather than overwriting the existing ones?

  • Have you tried using bs4.BeautifulSoup(link)? Commented Dec 4, 2018 at 3:54
  • I haven't tried bs4; I'm very new to this. I will give it a shot. Commented Dec 4, 2018 at 4:09

2 Answers

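Nothing is written to the csv because jobs, as passed to wr.writerow(jobs), is not valid input: it holds Selenium element objects, not the link strings. Extract the href from each element first: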

with open("output.csv",'w') as resultFile:
    wr = csv.writer(resultFile)
    wr.writerow([j.get_attribute("href") for j in jobs])
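
For the second part of the question (adding links from later pages without overwriting earlier ones), here is a minimal sketch. It assumes the links have already been extracted as plain strings; append_links is a hypothetical helper name, not part of Selenium or the csv module. Opening the file in "a" mode appends instead of truncating, and writing one link per row keeps the file easy to read back.

```python
import csv


def append_links(path, links):
    """Append each link as its own row; rows already in the file are kept.

    Hypothetical helper for illustration, not from the original post.
    """
    # Mode "a" appends rather than overwriting; newline="" is what the
    # csv module documentation recommends for files passed to csv.writer.
    with open(path, "a", newline="") as result_file:
        writer = csv.writer(result_file)
        for link in links:
            writer.writerow([link])
```

With the code from the question, this could be called once per scraped page as append_links("output.csv", [j.get_attribute("href") for j in jobs]).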

This looks strange: for jobs in jobs:. Are you sure you didn't mean to write for job in jobs:? That is probably your problem: you are stomping on your jobs iterable.

Take a look at this example:

>>> numbers = [1,2,3,4]
>>> numbers
[1, 2, 3, 4]
>>> type(numbers)
<type 'list'>
>>> for numbers in numbers:
...     print numbers
...
1
2
3
4
>>> numbers
4
>>> type(numbers)
<type 'int'>

It isn't the print numbers that is turning numbers into an int. Observe:

>>> numbers = [1,2,3,4]
>>> type(numbers)
<class 'list'>
>>> for numbers in numbers:
...     print(":)")
...
:)
:)
:)
:)
>>> type(numbers)
<class 'int'>
>>> numbers
4
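
As a script rather than an interpreter session, the same effect can be sketched side by side with the fix. The variable names kind_after_distinct and kind_after_reuse are only for illustration; the point is that a distinct loop variable leaves the list name alone, while reusing the name rebinds it to each element in turn.

```python
numbers = [1, 2, 3, 4]

# A distinct loop variable leaves the name "numbers" bound to the list.
for number in numbers:
    pass
kind_after_distinct = type(numbers).__name__

# Reusing the list's own name rebinds it on every iteration,
# so after the loop it refers to the last element.
for numbers in numbers:
    pass
kind_after_reuse = type(numbers).__name__

print(kind_after_distinct, kind_after_reuse)  # list int
```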

3 Comments

I tried your suggestion but the terminal is still running. That trick was to get the href attribute from the browser_find_elements_by_partial_link. It seems to print the links just fine, problem comes with storing them.
so what's happening here, the list turned to int after print... thanks for the lead, i will try to make it work and re-read your first comment
It is not the print that causes numbers to become an int; it is the for loop.