So I have a crawler that uses something like this:
#if ".mp3" in baseUrl[0] or ".pdf" in baseUrl[0]:
if baseUrl[0][-4] == "." and ".htm" not in baseUrl[0]:
raise Exception
html = requests.get(baseUrl[0], timeout=3).text
This works pretty well. The problem is that if a file like an .mp4 or .m4a gets into the crawler instead of an HTML page, the script hangs, and on Linux it will eventually just print:
Killed
Is there a more efficient way to catch these non-HTML pages?
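
For reference, would something like checking the Content-Type header before reading the body be the right direction? A rough sketch of what I have in mind (using stream=True so requests doesn't download the whole body up front; baseUrl[0] is the same URL as above):

import requests

# Stream the response so the body isn't fetched immediately,
# then inspect the Content-Type header before touching .text.
resp = requests.get(baseUrl[0], timeout=3, stream=True)
content_type = resp.headers.get("Content-Type", "")
if "text/html" not in content_type:
    resp.close()  # drop the connection without downloading the file
    raise Exception("not an HTML page: " + content_type)
html = resp.text  # only read the body once we know it's HTML

I'm not sure if this is idiomatic or whether it handles servers that send a wrong or missing Content-Type, though.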