
I am scraping data with CSS selectors for the first time, and I am having trouble scraping the content of an anchor.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})

for post in posts:
    title = post.find("span", {"class": "title"}).get_text()
    company = post.find("span", {"class": "company"}).get_text()
    location = post.find("span", {"class": "region company"}).get_text()
    # This is the selector in question: it returns a list of matches, not a single href
    link = post.select("#category-2 > article > ul > li:nth-child(1) > a[href]")

    print({"title": title, "company": company, "location": location, "link": f"https://weworkremotely.com/{link}"})

I want to scrape the anchor's href so that I can build a link to each post, so I used a[href] in the selector.

But it doesn't work: instead of the link, it scrapes the contents of the whole subcategory.

What do I have to change to scrape just the anchor's href?

  • "scrap" is to discard. The term you are looking for is "scrape" as in "screen scrape". Commented Feb 1, 2022 at 1:38

1 Answer

Assuming you have correctly selected the jobs of interest from all the jobs listed, you need to loop over the posts and, for each one, extract the href attribute of the first anchor whose href contains the substring -jobs, i.e. post.select_one('a[href*=-jobs]')['href']:

import requests
from bs4 import BeautifulSoup

url = "https://weworkremotely.com/remote-jobs/search?utf8=✓&term=ruby"
wwr_result = requests.get(url)
wwr_soup = BeautifulSoup(wwr_result.text, "html.parser")
posts = wwr_soup.find_all("li", {"class": "feature"})

for post in posts:
    print('https://weworkremotely.com' + post.select_one('a[href*=-jobs]')['href'])
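
The same loop can also collect the title and company next to the link. The following is only a sketch run against a small inline HTML string, so that it works offline: the markup is a hypothetical stand-in for the real page (whose structure may differ), and `extract_jobs` and `BASE_URL` are names introduced here for illustration. It uses `select_one`, which returns a single tag (or `None`) rather than the list that `select` returns, and `urllib.parse.urljoin` in place of string concatenation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on a job listing; the real page may differ.
HTML = """
<ul>
  <li class="feature">
    <a href="/company/acme">logo</a>
    <a href="/remote-jobs/acme-ruby-developer">
      <span class="company">Acme</span>
      <span class="title">Ruby Developer</span>
    </a>
  </li>
  <li class="feature">
    <a href="/remote-jobs/globex-rails-engineer">
      <span class="company">Globex</span>
      <span class="title">Rails Engineer</span>
    </a>
  </li>
</ul>
"""

BASE_URL = "https://weworkremotely.com"  # assumed site root


def extract_jobs(html):
    """Return one dict per featured post with its title, company and absolute link."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for post in soup.find_all("li", {"class": "feature"}):
        # select_one returns the first matching tag, or None if nothing matches
        anchor = post.select_one('a[href*="-jobs"]')
        if anchor is None:
            continue
        jobs.append({
            "title": post.find("span", {"class": "title"}).get_text(strip=True),
            "company": post.find("span", {"class": "company"}).get_text(strip=True),
            # urljoin copes with a leading slash on the relative href
            "link": urljoin(BASE_URL, anchor["href"]),
        })
    return jobs


jobs = extract_jobs(HTML)
for job in jobs:
    print(job)
```

Against the live site, `HTML` would be replaced by `wwr_result.text`; the `None` check guards against posts that contain no matching anchor.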

To get all the listings on the page, not only the featured ones, switch to:

posts = wwr_soup.select('li:has(.tooltip)')
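
To illustrate the difference between the two selections, here is a small offline sketch. The HTML is made up for the example and only assumes that every listing contains an element with class tooltip while only some carry the feature class; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two listings (one featured) and one non-listing item.
HTML = """
<ul>
  <li class="feature"><span class="tooltip">Featured</span><a href="/remote-jobs/a">A</a></li>
  <li><span class="tooltip">Tip</span><a href="/remote-jobs/b">B</a></li>
  <li class="view-all"><a href="/categories/x">View all</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Only list items that carry the "feature" class
featured = soup.find_all("li", {"class": "feature"})

# Every list item that contains a descendant with class "tooltip"
all_listings = soup.select("li:has(.tooltip)")

featured_links = [li.select_one("a")["href"] for li in featured]
all_links = [li.select_one("a")["href"] for li in all_listings]

print(featured_links)
print(all_links)
```

The `:has()` pseudo-class is handled by the soupsieve package that Beautiful Soup uses for `select` and `select_one`, so it needs a reasonably recent bs4 install.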

3 Comments

Thanks for your advice. I have a question: I can already get all the data with posts = wwr_soup.find_all("li", {"class": "feature"}). Why do I need to change it?
feature doesn't get all the listings, only the featured ones.
I understand. Thank you so much.
