0

I'm trying to scrape a website that provides individual access to court cases in New Jersey county courts. I'm having a lot of trouble figuring out how to start though. I've scraped quite a few websites before but I've usually been able to start by adapting the URL to pass through the search parameters. However, when I access this data the URL does not change so I'm at a bit of a loss.

Additionally, there is a test for me to prove that I am not a Robot (which occasionally turns into a ReCaptcha).

On the website linked above, say, for example, the inputs would be:

Case County==Bergen, Docket Type==Landlord Tenant (LT), Docket Number==000001, and Docket Year==19.

I would then like to be able to extract the Defendant Name or anything from the subsequent page.

Does anyone have any advice on how I should proceed with this?

Thanks in advance

1 Answer 1

1

Websites which "require input" can be scraped using Selenium, which evaluates the javascript: your python code then executes the page more as a "user" (click here, type there). It's slow.

Alternatively, if you look at the page details, you may see what happens to input, and simply execute the resulting GET or POST url properly formed (For example, Forms, often, will do a POST with the parameters: Look at the code and figure out what parameters get posted and to what URL, and then in python, execute that POST code -- you'll probably need a cookiejar to maintain session info.

HOWEVER As a website maintainer, my advice to you is to not attempt to scrape this site: it doesn't want to be scraped & repeated attempts only escalate defensive activities on the part of the website owner. You may also be violating usage policy, state and/or federal laws.

Instead, look for an alternative API, or alternative source. (NJ Courts may have an alternative API, designed for computer usage: send them an email!)

Sign up to request clarification or add additional context in comments.

1 Comment

Hi pbuck, Thanks for this advice, both technical and strategic. I'm going to reach out to NJ and see if they have an API as a result, and agree that it's likely not wise to try and scrape the site.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.