
I'm trying to get the data from https://www.ecfr.gov/cgi-bin/ECFR?page=browse using the requests module in Python.

Somehow I keep getting HTTP 403 Forbidden.

import requests

url = "https://www.ecfr.gov/cgi-bin/ECFR?page=browse"

# Headers copied from the browser's developer tools.
# "Host" is left out: requests sets it from the URL, and the copied value
# ("httpbin.org") did not match ecfr.gov. The copied "X-Amzn-Trace-Id"
# (a tracing header added by httpbin) is dropped as well.
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "max-age=0",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36",
}

r = requests.get(url, headers=header)
print(r.status_code)  # 403 in my case

I have also tried sending only the User-Agent, as well as every header I see in the browser's developer tools.

I have tried free proxies, rotating User-Agent headers, cookies, and everything else I could get my hands on, but the website can still somehow tell that the requests are not coming from a real browser.

The HTML response shows that the website is asking me to complete a CAPTCHA.

Is there any way I can skip that?

  • That is the whole point of CAPTCHAs: they are designed to be unskippable. Look for an API or something similar; there is no reliable way other than that. Commented Jun 24, 2020 at 13:38
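Following that suggestion: the current eCFR site documents a public developer API, which avoids scraping the HTML pages altogether. Below is a minimal sketch using only the standard library, assuming the versioner endpoint https://www.ecfr.gov/api/versioner/v1/titles.json and a JSON response with a "titles" list whose entries have "number" and "name" fields. Both the path and the field names are assumptions to check against eCFR's developer documentation; the function names are illustrative.

```python
import json
import urllib.request

# Assumed endpoint (eCFR developer API, versioner service); verify before relying on it.
ECFR_TITLES_URL = "https://www.ecfr.gov/api/versioner/v1/titles.json"


def parse_titles(payload: dict) -> list[tuple[int, str]]:
    """Extract (title number, title name) pairs from a titles.json payload.

    The "titles", "number" and "name" keys are assumptions about the response shape.
    """
    return [(t["number"], t["name"]) for t in payload.get("titles", [])]


def fetch_titles(url: str = ECFR_TITLES_URL, timeout: float = 30.0) -> list[tuple[int, str]]:
    """Download the titles list from the (assumed) API endpoint and parse it."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)  # json.load accepts the bytes returned by resp.read()
    return parse_titles(payload)
```

Calling fetch_titles() returns a list of (number, name) pairs. The parsing lives in parse_titles so it can be checked without network access.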

1 Answer

Inspecting the HTTP requests, I found a Cloudflare server trace in the response.

Cloudflare (with its ScrapeShield feature) is well known for its scraping protection and configurable security levels. Read more here.
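If you want to confirm from Python that you are hitting the same Cloudflare protection, a rough heuristic is to look at the response headers and body. Below is a minimal sketch, assuming that Cloudflare-served responses carry a "Server: cloudflare" and/or a "CF-RAY" header, and that a blocked request either returns 403/503 or serves a page mentioning a CAPTCHA. None of these signals is guaranteed, and the function name is just illustrative.

```python
from collections.abc import Mapping


def looks_like_cloudflare_challenge(status_code: int, headers: Mapping[str, str], body: str) -> bool:
    """Heuristic: was this response served by Cloudflare as a block/CAPTCHA page?

    Assumes Cloudflare sets "Server: cloudflare" and/or a "CF-RAY" header.
    """
    # Header names are case-insensitive in HTTP, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    served_by_cloudflare = (
        lowered.get("server", "").lower() == "cloudflare" or "cf-ray" in lowered
    )
    # A blocking status code or a CAPTCHA mention in the HTML counts as a challenge.
    blocked = status_code in (403, 503) or "captcha" in body.lower()
    return served_by_cloudflare and blocked
```

With the request from the question, that would be looks_like_cloudflare_challenge(r.status_code, r.headers, r.text).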

"Is there any way I can skip that?"

There are two ways out:

  1. Plug in a CAPTCHA-solving service. That is not easy if you are using plain Python only.

  2. Use browser automation, so that ScrapeShield thinks a real user is browsing the website. This takes considerably more resources and time (including development time). See a scraping speed comparison table of headless Chromium automation vs. bare HTTP requests.
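For the second option, here is a minimal sketch using Playwright (my choice of tool here; Selenium would work along the same lines). It assumes Playwright and its Chromium build are installed (pip install playwright, then playwright install chromium); the function name is hypothetical. There is no guarantee Cloudflare will let an automated browser through: headless=False is closer to a real user session, and a CAPTCHA may still be shown.

```python
def fetch_with_browser(url: str, headless: bool = True, timeout_ms: int = 60_000) -> str:
    """Load url in a real Chromium instance and return the rendered HTML.

    Playwright is imported lazily so this module can be imported without it.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            # Navigate like a normal user and wait for the page's load event.
            page.goto(url, timeout=timeout_ms, wait_until="load")
            return page.content()
        finally:
            browser.close()
```

Each call starts a full browser process, which is where the extra resources and time compared with bare HTTP requests come from; for many pages, reuse one browser and open several pages in it instead.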
