I am testing code from a current issue of 2600 magazine for a wordlist generator based on a set of Google searches. I get an invalid syntax error on this line:
results.extend(re.findall("<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)",data.read()))
I am new to regex, so I did some research on the basics of re. It seemed fairly easy, but I still didn't understand the /%201D. I searched for it and found that it's the hex form of a character code. I am still stuck on making this work. Here is the rest of the code. The line I'm having a problem with is line 36.
This is the function:
import re, sys, os, urllib

### custom useragent ###
class AppURLopener(urllib.FancyURLopener):
    version = "Mozilla/5.0(compatable;MSIE 9.0; Windows NT 6.1; Trident/5.0)"

urllib._urlopener = AppURLopener()
uopen = urllib.urlopen
uencode = urllib.urlencode

def google(query, numget=10, verbose=0):
    numget = int(numget)
    start = 0
    results = []
    if verbose == 2:
        print("[+]Getting " + str(numget) + " results")
    while len(results) < numget:
        print("[+]" + str(len(results)) + " so far...")
        data = uopen("https://www.google.com/search?q="+query+"&star="+str(start))
        if data.code != 200:
            print("Error " + str(data.code))
            break
        results.extend(re.findall("<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)",data.read()))
        print(data.read())
        start += 10
    if verbose == 2:
        print("[+] Got " + str(numget) + " results")
    return results[:numget]
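
For reference, the syntax error can be reproduced in isolation: the pattern is written as a double-quoted string, so the unescaped " right after href= ends the string literal early. Below is a minimal sketch of one way the pattern could be written so that it parses. It rests on guesses: that /%201D is a mangled double quote from the printed listing (U+201D is the right curly quotation mark), and that the pattern is meant to capture the href value of each link. The sample_html string is made up for illustration and is not real Google markup.

```python
import re

# Hypothetical sample markup, invented for this sketch; real search result
# HTML is different and changes over time.
sample_html = (
    '<a href="http://example.com/a" class=1>A</a> '
    '<a href="http://example.com/b" class=s>B</a>'
)

# A single-quoted raw string lets the double quotes inside the pattern
# appear without escaping, which avoids the "invalid syntax" error.
# Assumption: each "/%201D" in the listing stood for a plain double quote.
# The class=(?:1|s) part is copied from the listing as printed.
link_pattern = re.compile(r'<a href="([^"]*)" class=(?:1|s)')

# findall returns only the captured group, i.e. the href values.
links = link_pattern.findall(sample_html)
print(links)
```

If the guess about /%201D is wrong, the same quoting approach should still apply: either use single quotes around the pattern or escape each inner double quote with a backslash.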