0

Using JavaScript and jQuery, I'm crawling a list of links on a web page. Each link points to either another web page or a large binary file (PPT, etc.).

How do I detect whether the content behind a link is a web page ('text/html') or not? I'm pretty sure it involves inspecting the HTTP response headers using $.ajax, and I know there are some similar questions, but I can't find an example that fits this particular case.

3
  • Possible duplicate of jquery how to check response type for ajax call Commented Nov 15, 2017 at 15:50
  • You cannot, until you actually visit the URL and observe the Content-Type header value. Commented Nov 15, 2017 at 15:52
  • Just check my solution Commented Nov 15, 2017 at 15:53

3 Answers

3

You can check the extension of the URL, which is the lightest method. Or you can try an AJAX solution:

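var url = 'someurl';
var xhttp = new XMLHttpRequest();
// HEAD asks for the response headers only, so the body is not downloaded.
xhttp.open('HEAD', url);
xhttp.onreadystatechange = function () {
  // DONE means the request has finished, whether it succeeded or failed.
  if (this.readyState === this.DONE) {
    console.log(this.status);
    // Returns null if the header is missing or not readable.
    console.log(this.getResponseHeader("Content-Type"));
  }
};
xhttp.send();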
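
Since the question mentions $.ajax, the same HEAD request can be sketched with jQuery. This is a minimal sketch, not a definitive implementation: the helper name fetchContentType and its callback are hypothetical, and the jQuery object is passed in as a parameter only to keep the snippet self-contained.

```javascript
// Hypothetical helper: sends a HEAD request through jQuery and passes the
// Content-Type header value (or null on failure) to the callback.
function fetchContentType(jq, url, callback) {
  jq.ajax({ type: 'HEAD', url: url })
    .done(function (data, textStatus, jqXHR) {
      // jqXHR exposes the response headers of the finished request.
      callback(jqXHR.getResponseHeader('Content-Type'));
    })
    .fail(function () {
      // Network error, blocked cross-origin request, or HTTP error status.
      callback(null);
    });
}
```

On a page that has jQuery loaded, it could be called as fetchContentType($, link.href, function (type) { console.log(type); }).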

2 Comments

Thanks @Mateusz-Kudej, that was exactly the code I was looking for!
I'm glad to help you ;) @El-Jus
2

You can't reliably infer the type from the URL. It may contain an extension such as exe or html, but it doesn't have to, and even if it does, the extension is no guarantee of the actual content.

The closest you can get without completely downloading and examining the file is probably to send a HEAD HTTP request to the URL. This should return the response headers without the body, and those headers should include Content-Type. It all depends on the implementation and configuration of the backend, though, so there is no guarantee that the request will be answered correctly, or answered at all.
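
As a rough sketch of that approach, the snippet below sends a HEAD request with fetch and interprets the Content-Type value. The function names isHtmlContentType and looksLikeWebPage are made up for illustration, and treating application/xhtml+xml as a web page is an assumption. A result of null means the type could not be determined, for example because the server rejected HEAD, omitted the header, or the cross-origin request was not permitted.

```javascript
// Hypothetical helper: true if a Content-Type value denotes an HTML page.
function isHtmlContentType(headerValue) {
  if (typeof headerValue !== 'string') return false;
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mediaType = headerValue.split(';')[0].trim().toLowerCase();
  return mediaType === 'text/html' || mediaType === 'application/xhtml+xml';
}

// Hypothetical helper: resolves to true, false, or null (undetermined).
async function looksLikeWebPage(url) {
  try {
    const response = await fetch(url, { method: 'HEAD' });
    if (!response.ok) return null; // server refused or errored on HEAD
    const contentType = response.headers.get('Content-Type');
    return contentType === null ? null : isHtmlContentType(contentType);
  } catch (err) {
    return null; // network failure or request blocked by the browser
  }
}
```

When the result is null, a crawler still has to decide whether to skip the link or fall back to another check.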

1 Comment

Thanks @Timo for the guidance on assuming a file extension is genuine, much appreciated.
1

If you have the file names, you can use filename.split('.').pop(), which returns the extension of the file.
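
Applied to a full URL rather than a bare file name, split('.').pop() also picks up query strings and host names (for instance, 'https://example.com/page' yields 'com/page'). A slightly more careful sketch parses the URL first and looks only at the last path segment. The function name extensionFromUrl and the placeholder default base are assumptions for illustration; in a browser you would likely pass document.baseURI as the base.

```javascript
// Hypothetical helper: returns the lower-cased extension of the last path
// segment of a URL, or '' when that segment has no extension.
function extensionFromUrl(link, base = 'http://localhost/') {
  // The URL parser separates the path from the query string and fragment.
  const { pathname } = new URL(link, base);
  const lastSegment = pathname.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  // dot > 0 ignores segments with no dot and dotfiles such as ".htaccess".
  return dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : '';
}
```

This is still only a guess at the content type: a URL without an extension returns '', and an extension that is present says nothing certain about what the server actually sends.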

2 Comments

Yeah, what could go wrong? Joking aside, what if the link URL does not actually contain the file name? For example, some CMSs don't expose the actual file name in the URL.
Not always, that's my point. It's not bad code per se, it's just an incomplete solution for an incomplete requirement.
