I'm trying to create a small web crawler application. I wrote this code:
    def isGood(URL):
        try:
            cURL = URL + text.patch
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(3)
            sock.connect((URL, 80))
            header = "GET %s HTTP/1.1\r\n" % text.patch
            header += "User-Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36\r\n"
            header += "Accept: */*\r\n"
            header += "Host: %s\r\n\r\n" % URL
            sock.send(header)
            data = sock.recv(1024)
            html = ""
            for x in range(10):
                html = html + data
                data = sock.recv(1024)
                if len(data) == 0:
                    break
            sock.close()
            if str(text.keyword) in html:
                print '+ ' + cURL
                logfile = open(text.output, 'a')
                logfile.write('%s\n' % (cURL))
                logfile.close()
                list_good.append(cURL)
        except:
            pass
The code works, but it is very, very slow: after each connection the socket should be closed, but it isn't, so it waits for the timeout. How can I speed this up? I'm running it with multiple threads!
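The slowdown comes from `recv` blocking until the 3-second timeout whenever the server keeps the connection open. A minimal sketch of a fix (shown in Python 3 syntax; the `host`/`path` parameters stand in for the `text.*` globals from the question, which are assumed to be defined elsewhere): send `Connection: close` so the server closes the connection when the response is complete, and guarantee `sock.close()` in a `finally` block so the socket is released even on error.

```python
import socket

def fetch(host, path="/", port=80, timeout=3):
    """Fetch a page over a raw socket, always closing the socket afterwards."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        request = ("GET %s HTTP/1.1\r\n"
                   "Host: %s\r\n"
                   "Connection: close\r\n\r\n" % (path, host))
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            # Read until the server closes the connection; because we sent
            # "Connection: close", recv() returns b"" at end of response
            # instead of blocking until the timeout fires.
            data = sock.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        sock.close()  # runs on success, exception, or timeout alike
```

With `Connection: close`, the read loop ends as soon as the response is complete rather than after the full timeout, and the `finally` ensures no socket is left lingering across threads.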
Use urllib. And why are you calling recv 10 times? urllib also uses socket, uses negligible resources and, IMHO, would be better suited to what you're trying to do. You should also think about what happens if your webpage is bigger than 10*1024 bytes.
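The urllib suggestion might look like this in modern Python (using `urllib.request`; the `is_good` name and the `url`/`keyword` parameters are illustrative stand-ins for the question's `text.*` globals):

```python
from urllib.request import Request, urlopen
from urllib.error import URLError

def is_good(url, keyword, timeout=3):
    """Return True if the page at url contains keyword (a rough sketch)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        # urlopen handles chunking, reads the whole body regardless of
        # size, and the context manager closes the connection for us.
        with urlopen(req, timeout=timeout) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except URLError:
        return False
    return keyword in html
```

This removes both problems at once: no fixed `range(10)` read loop that truncates larger pages, and no socket left open waiting for the timeout.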