I am building a spider in Perl and have run into a problem:

The site I want to spider uses JavaScript for age verification, and I don't know how to get past it in Perl.

The script looks like this:

<script type="text/javascript">
function set_age_verified(){
    new Request({
        method: "post",
        url: "/user/set_age_verified"
    }).send();
    $('age_verification').setStyles({visibility: 'hidden', display: 'none'});
    $('page_after_verification').setStyles({visibility: 'visible', display: 'block'});
    return false;
}
</script>

And here is the onclick event:

<a href="#" onclick="return set_age_verified();"><img src="http://example.com/age-verification-enter.gif" alt="ENTER"></a>

4 Answers

2

The function has two effects. One is to POST a request to the URL "/user/set_age_verified" and the other is to alter the display visibility of some HTML.

Your spider can safely ignore the second effect. The first, however, goes to the server and presumably sets a cookie or a server-side session variable that the server checks on later requests.

You do not have to actually run the JavaScript, so long as the server sees the same POST request.

The answer is for your Perl script to detect pages that contain this JavaScript and to call a Perl function that POSTs to the age verification URL.

Any cookie or similar token that comes back has to be stored and sent with later requests, though your HTTP library may take care of this for you.
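The approach above could be sketched roughly as follows with LWP::UserAgent. This is a minimal sketch, not a tested solution for the actual site: it assumes the server records the verification in a session cookie, that the POST needs no body, and that no extra request headers are checked (if the server does look at headers the browser sends with an AJAX request, such as X-Requested-With, they would have to be added). The sub names and the example.com start URL are made up for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Path copied from the site's set_age_verified() JavaScript function.
my $VERIFY_PATH = '/user/set_age_verified';

# Return 1 if the page contains a call to (or definition of) the
# age-verification function, 0 otherwise.
sub needs_age_verification {
    my ($html) = @_;
    return $html =~ /\bset_age_verified\s*\(/ ? 1 : 0;
}

# Resolve the verification path against the URL of the page being crawled.
sub age_verification_url {
    my ($page_url) = @_;
    return URI->new_abs($VERIFY_PATH, $page_url)->as_string;
}

# Fetch a page. If the age gate is present, replay the POST that the
# JavaScript would have sent, then fetch the page once more.
# Assumption: $ua has a cookie jar, so any cookie set by the POST is
# sent back automatically on the second GET.
sub fetch_with_age_check {
    my ($ua, $page_url) = @_;

    my $res = $ua->get($page_url);
    die "GET $page_url failed: " . $res->status_line unless $res->is_success;

    if (needs_age_verification($res->decoded_content // '')) {
        my $post = $ua->post(age_verification_url($page_url));
        die "Age verification failed: " . $post->status_line
            unless $post->is_success;

        $res = $ua->get($page_url);
        die "GET $page_url failed: " . $res->status_line
            unless $res->is_success;
    }
    return $res->decoded_content // '';
}

# Example use (hypothetical start URL):
#   my $ua   = LWP::UserAgent->new(cookie_jar => {});  # in-memory cookie jar
#   my $html = fetch_with_age_check($ua, 'http://example.com/');
```

Passing `cookie_jar => {}` gives the user agent an in-memory cookie jar, which is what lets the library "take care of this for you". Note that the page may still contain the script after verification, so the check is only run once per fetch rather than in a loop.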

1

What Perl modules are you using? WWW::Mechanize has an AJAX plugin, although it hasn't been updated in a while. I guess you could also look at something like WWW::Selenium.

But I bet that AJAX request is going to inject some HTML that requires the user to input some data, then submit a form. Pretty tricky to cover all bases for that general case...

1

Take a look at the WWW::Mechanize::Firefox module. It allows you to handle some JavaScript.

1

Also, in Firefox, HTTPHeaders is your best friend.

Turn it on, manually click whatever you need to in order for the JavaScript to run and submit to the server, then go back to the HTTPHeaders window. It will show you exactly what that JavaScript event sent to the server (GET or POST plus the data, even over HTTPS), as well as the server's response.

1 Comment

An alternative to HTTPHeaders that I like is the "Web Scraping Proxy" from AT&T (google it). You set it up as a proxy in your browser, then navigate to the info you want to scrape. It logs all HTTP traffic in the form of Perl code that generates the identical HTTP requests and responses.
