
The link below contains data that I need to scrape: https://jobsearch.svc.dhigroupinc.com/v1/efc/jobs/search?page=1&facets=*&countryCode2=SG&pageSize=10&currencyCode=SGD

In the preview shown when inspecting the website, I can see that the data is available, but it is hidden when I open the link directly. (Screenshot: preview of data.)

However, it displays only: {"message":"Forbidden"}

Is there any way I can retrieve the JSON data that I need, like the sample below?

{"data":[{"id":"307ocL4mnUnNJT5V","title":"KYC Analyst","jobLocation":{"city":"Singapore",...........

Here is the network header data, if needed.

1) Data for network-headers

2) Data for network-headers

I've used Selenium to retrieve the data I want, but if I could retrieve the JSON data directly, I could skip Selenium and just use simple requests instead. Any ideas?

  • I'll be on hold here to provide more data, because I'm not sure what you guys may need. Just let me know. Thank you in advance! Commented Jul 31, 2018 at 1:32
  • @G_M sorry, I've updated the question with the full URL above. Please check. Thank you. Commented Jul 31, 2018 at 1:45
  • @G_M yes, that's why I'm wondering if there is a way to do it. The preview is just from inspecting an element of the website I'm trying to scrape, but I'm not sure where the data is located. Maybe I could just give you the website link so you can check it out: eFinancialCareers official website - efinancialcareers.sg/search/…*&page=1&pageSize=10 Commented Jul 31, 2018 at 2:05

1 Answer

The only thing you seem to be missing is the API key. I'm not sure how often (if at all) it changes, but I was able to make the call successfully simply by adding the x-api-key header to the request.

import json

import requests

base_url = 'https://jobsearch.svc.dhigroupinc.com/v1/efc/jobs/search'
params = {
    'page': 1,
    'facets': '*',
    'countryCode2': 'SG',
    'pageSize': 10,
    'currencyCode': 'SGD',
}
headers = {
    'x-api-key': 'zvDFWwKGZ07cpXWV37lpO5MTEzXbHgyL4rKXb39C'
}

r = requests.get(base_url, headers=headers, params=params)
r.raise_for_status()

# json.dumps only for pretty printing, r.json() is all you need
print(json.dumps(r.json(), indent=2))
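
As a possible extension, the request above could be wrapped in a small helper that reports a rejected key explicitly, since the key may change at some point. This is only a minimal sketch and not part of the original answer: `fetch_jobs` and `ApiKeyRejected` are hypothetical names, and it assumes that a 403 response means the `x-api-key` header was missing or is no longer accepted. The `session` argument is anything with a `requests`-style `get` method, such as `requests.Session()`.

```python
API_URL = "https://jobsearch.svc.dhigroupinc.com/v1/efc/jobs/search"


class ApiKeyRejected(Exception):
    """Hypothetical error: the API answered 403 (assumed to mean a bad or missing key)."""


def fetch_jobs(session, api_key, page=1, page_size=10):
    """Fetch one page of search results and return the decoded JSON.

    `session` is any object with a requests-style get(url, headers=..., params=...).
    """
    params = {
        "page": page,
        "facets": "*",
        "countryCode2": "SG",
        "pageSize": page_size,
        "currencyCode": "SGD",
    }
    response = session.get(API_URL, headers={"x-api-key": api_key}, params=params)
    if response.status_code == 403:
        # Assumption: 403 corresponds to the {"message": "Forbidden"} reply
        # seen when the request is sent without a valid x-api-key.
        raise ApiKeyRejected("x-api-key missing or no longer accepted")
    response.raise_for_status()  # surface any other HTTP error
    return response.json()
```

With `requests` installed, this could be called as `fetch_jobs(requests.Session(), api_key)`, where `api_key` is the value copied from the browser's request headers.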



1 Comment

@HanJinn No problem, that last link you provided was exactly what I needed. You can see the API key in the request headers. You might also want to pass a different user-agent in the headers dict so that it isn't quite as obvious that the GET request is coming from Python. You could even pass the referer and origin headers as well. I copied all of the headers and then deleted fields one by one to find the minimum set of headers I could pass and still get the JSON response. In this case, it seemed to be just the x-api-key.
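
The delete-one-at-a-time procedure described in this comment could be automated roughly as follows. This is a sketch under assumptions rather than the commenter's actual code: `minimal_headers` and `still_works` are hypothetical names, and `still_works` is a callable you supply that sends the request with the given headers and returns True when the JSON response still comes back.

```python
def minimal_headers(all_headers, still_works):
    """Greedily drop every header the request can succeed without.

    all_headers: dict of headers copied from the browser's network tab.
    still_works: callable taking a headers dict and returning True if the
                 request still succeeds with only those headers.
    """
    needed = dict(all_headers)
    for name in list(all_headers):
        # Try the request without this one header.
        trial = {k: v for k, v in needed.items() if k != name}
        if still_works(trial):
            needed = trial  # header was not required, keep it removed
    return needed
```

With `requests`, the check could be something like `lambda h: requests.get(base_url, headers=h, params=params).ok`. Each trial sends a real request, so this is best run once with a small page size.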
