I'm stuck on a problem and need some help. I'm using Node.js to crawl a list of websites, and some of them fail with a parse error. For example, http://www.fz-juelich.de/portal/DE/Home/home_node.html gives: Parse Error, HPE_INVALID_HEADER_TOKEN

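const request = require('request');

// uri, timeout and domain are set elsewhere in the crawler
request.get({
    url: uri,
    timeout: timeout,
    headers: {
        referer: domain
    }
}, (error, response, body) => {
    if (error) {
        console.log(error);
        return; // body is undefined when the request fails
    }
    console.log(body);
});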

However, curl -i --raw http://www.fz-juelich.de/portal/DE/Home/home_node.html works perfectly:

HTTP/1.1 404 Not Found
Server: Apache-Coyote/1.1
Cache-Control: no-cache
JSESSIONID=E594677A6CCA13BE0338E1D00A729C34; Path=/cae:
Content-Type: text/html;charset=utf-8
Content-Language: de
Set-Cookie: JSESSIONID=E594677A6CCA13BE0338E1D00A729C34; Path=/
Content-Length: 19677

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" >

I can also view this website in Chrome.

Any ideas where I should dig to get rid of these errors?
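
One hint may already be in the raw response above: the line starting with JSESSIONID= has no valid "name: value" shape, and a strict parser could plausibly reject it with HPE_INVALID_HEADER_TOKEN. The sketch below is only a diagnostic under that assumption. It checks each header line's field name against the token characters allowed by RFC 7230; findInvalidHeaderLines and sample are hypothetical names, and this is not how Node's parser is actually implemented.

```javascript
// Characters RFC 7230 allows in a header field name ("token").
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// Hypothetical helper: returns the header lines whose field name
// (the text before the first colon) is missing or not a valid token.
function findInvalidHeaderLines(rawHead) {
  const lines = rawHead.split(/\r?\n/);
  const invalid = [];
  // Index 0 is the status line; the first blank line ends the headers.
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i];
    if (line === '') break;
    const colon = line.indexOf(':');
    const name = colon === -1 ? '' : line.slice(0, colon);
    if (!TOKEN.test(name)) invalid.push(line);
  }
  return invalid;
}

// Header lines copied from the curl output in the question.
const sample = [
  'HTTP/1.1 404 Not Found',
  'Server: Apache-Coyote/1.1',
  'Cache-Control: no-cache',
  'JSESSIONID=E594677A6CCA13BE0338E1D00A729C34; Path=/cae:',
  'Content-Type: text/html;charset=utf-8',
  '',
].join('\r\n');

// Logs an array containing only the JSESSIONID line.
console.log(findInvalidHeaderLines(sample));
```

Feeding it the output of curl -i for a failing URL should show quickly whether a malformed header line is the likely culprit.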

  • No ideas? I'm starting to think about using some third-party C/C++ extension. Commented Oct 15, 2014 at 17:25

2 Answers

Quoting the header property names resolved it for me:

request.post(url,{
    headers: {
      'Authorization': 'Basic onEnAGrosEncodedBase64',
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    form: {
      'grant_type': 'client_credentials'
    }
 })

I hope that helps someone ;)

By the end of this journey, I no longer use Node.js for crawling and parsing.

A Go crawler fits much better here: the HTTP library is more flexible, and it is easier to write truly concurrent code.

2 Comments

Yes, Node is very picky about HTTP headers. I had the identical issue mentioned above: the source website was sending a Link header with HTML inside, which was crashing Node. To get around it, I wrote a separate script that cURLed the data I needed and then called it from my Node scripts.
I also struggled with this problem but insisted on using Node. Fortunately, I found a library, node-libcurl, which does a fantastic job and is also backed by Insomnia (a free alternative to Postman), so it will be around for a while :)
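
The curl workaround from the first comment can also be done without a separate script by spawning curl directly from Node. This is only a minimal sketch, assuming curl is installed and on the PATH. buildCurlArgs and fetchWithCurl are hypothetical helper names, and the timeout and buffer limits are arbitrary choices rather than anything taken from the comment.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: builds the curl argument list for one page fetch.
function buildCurlArgs(url, { referer, timeoutSec = 30 } = {}) {
  const args = [
    '--silent',      // no progress meter
    '--show-error',  // still report failures on stderr
    '--location',    // follow redirects
    '--max-time', String(timeoutSec),
  ];
  if (referer) args.push('--referer', referer);
  args.push(url);
  return args;
}

// Hypothetical helper: resolves with the response body as a string.
// execFile passes arguments without a shell, so the URL is not shell-interpreted.
function fetchWithCurl(url, options = {}) {
  return new Promise((resolve, reject) => {
    execFile(
      'curl',
      buildCurlArgs(url, options),
      { maxBuffer: 10 * 1024 * 1024 }, // allow bodies up to 10 MiB
      (error, stdout) => {
        if (error) return reject(error);
        resolve(stdout);
      }
    );
  });
}

// Usage (performs a real network request, so it is left commented out):
// fetchWithCurl('http://www.fz-juelich.de/portal/DE/Home/home_node.html', { timeoutSec: 10 })
//   .then((body) => console.log(body.length))
//   .catch((err) => console.error(err.message));
```

Since curl does the HTTP parsing here, Node never sees the raw headers. The cost is one extra process per request, which may matter for a large crawl.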
