Scrape data using a CSS selector (Python, BS4)

Question

I am scraping data using a CSS selector for the first time.

I have a problem scraping the content of an anchor.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})
link = post.select("#category-2 > article > ul > li:nth-child(1) > a[href]")

title = post.find("span", {"class": "title"}).get_text()
company = post.find("span", {"class": "company"}).get_text()
location = post.find("span", {"class": "region company"}).get_text()
link = post.select("#category-2 > article > ul > li:nth-child(1) > a[href]")

print({"title": title, "company": company, "location": location, "link": f"https://weworkremotely.com/{link}"})

I want to scrape the content of the anchor to build a link to each post, so I used a[href].

But it doesn't work. Instead, it scrapes the contents of all the subcategories.

What do I have to change to scrape just the content of the anchor?

"Scrap" means to discard. The term you are looking for is "scrape", as in "screen scrape".
– JonSG, Commented Feb 1, 2022 at 1:38

QHarr · Accepted Answer · 2022-02-01 05:56:39Z

Assuming you have correctly selected the jobs of interest from all the jobs listed, you need a loop. Inside the loop, extract the first href attribute containing the substring -jobs, i.e. post.select_one('[href*=-jobs]')['href']:

import requests
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})

for post in posts:
    print('https://weworkremotely.com' + post.select_one('a[href*=-jobs]')['href'])
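
To try the loop without hitting the network, here is a minimal sketch that runs against an inline HTML fragment. The markup is a made-up approximation of the listing structure, not the site's actual HTML, and `extract_links` is a hypothetical helper name. It quotes the attribute value in the selector and uses `urljoin` from the standard library instead of string concatenation; both are substitutions for illustration, not part of the answer above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://weworkremotely.com"

# Hypothetical markup, loosely modelled on a job listing page.
SAMPLE_HTML = """
<section id="category-2">
  <article>
    <ul>
      <li class="feature">
        <a href="/company/acme">Acme</a>
        <a href="/remote-jobs/acme-ruby-developer">
          <span class="title">Ruby Developer</span>
        </a>
      </li>
      <li class="feature">
        <a href="/remote-jobs/globex-rails-engineer">
          <span class="title">Rails Engineer</span>
        </a>
      </li>
    </ul>
  </article>
</section>
"""


def extract_links(html, base_url=BASE_URL):
    """Return one absolute job URL per featured listing."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for post in soup.find_all("li", {"class": "feature"}):
        # select_one returns the first matching tag (or None), not a list.
        anchor = post.select_one('a[href*="-jobs"]')
        if anchor is not None:
            links.append(urljoin(base_url, anchor["href"]))
    return links


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

The key difference from the question's code is that `select` returns a list of tags, whereas `select_one(...)['href']` returns the attribute string of a single tag, which is what you want when building a URL.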

To get all the listings on the page, switch to:

posts = wwr_soup.select('li:has(.tooltip)')
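
As a rough illustration of the difference between the two selections, the sketch below compares them on a small inline fragment. The markup is invented for the example (in particular, which elements carry the `tooltip` class on the real site is an assumption), so treat it as a demonstration of `:has()` rather than a description of the page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one featured listing, one regular listing,
# and one non-listing item without a .tooltip descendant.
SAMPLE_HTML = """
<ul>
  <li class="feature">
    <div class="tooltip">Apply</div>
    <a href="/remote-jobs/acme-ruby-developer">Ruby Developer</a>
  </li>
  <li>
    <div class="tooltip">Apply</div>
    <a href="/remote-jobs/globex-rails-engineer">Rails Engineer</a>
  </li>
  <li class="view-all">
    <a href="/categories/programming">View all</a>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matches only <li> elements that have the class "feature".
featured = soup.find_all("li", {"class": "feature"})

# Matches every <li> that contains a descendant with class "tooltip".
all_listings = soup.select("li:has(.tooltip)")

featured_links = [li.select_one('a[href*="-jobs"]')["href"] for li in featured]
all_links = [li.select_one('a[href*="-jobs"]')["href"] for li in all_listings]

print(len(featured), len(all_listings))
```

Here the class-based search finds one item while `li:has(.tooltip)` finds both listings and still skips the "view all" item.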

edited Feb 1, 2022 at 5:56

answered Feb 1, 2022 at 5:48

QHarr

3 Comments

Seowoo Jang Over a year ago

Thanks for your advice. I have a question: I can get all the data with posts = wwr_soup.find_all("li", {"class": "feature"}). Why do I need to change it?

QHarr Over a year ago

feature doesn't get all the listings, only the featured ones.

Seowoo Jang Over a year ago

I understand. Thank you so much.
