python regex invalid syntax

Question

I am testing code from a current issue of 2600 magazine: a wordlist generator built from a set of Google searches. I get an invalid syntax error from this line:

    results.extend(re.findall("<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)",data.read()))

I am new to regex, so I read up on the basics of the re module. It seemed fairly easy, but I still don't understand the /%201D part. I searched for it and found that it's a hex-encoded character code. I am still stuck on making this work. Here is the rest of the code. The line I'm having a problem with is line 36 (the results.extend(...) call inside the while loop).

This is the function:

import re, sys, os, urllib
### custom useragent   ###
class AppURLopener(urllib.FancyURLopener):
    version = "Mozilla/5.0(compatable;MSIE 9.0; Windows NT 6.1; Trident/5.0)"

urllib._urlopener = AppURLopener()
uopen   = urllib.urlopen
uencode = urllib.urlencode

def google(query, numget=10, verbose=0):     
    numget = int(numget)
    start = 0
    results = []

    if verbose == 2:
            print("[+]Getting " + str(numget) + " results")

            while len(results) < numget:
                    print("[+]" + str(len(results)) + " so far...")
                    data = uopen("https://www.google.com/search?q="+query+"&star="+str(start))

                    if data.code != 200:
                            print("Error " + str(data.code))
                            break

                    results.extend(re.findall("<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)",data.read()))
                    print(data.read())
                    start += 10

                    if verbose == 2:
                            print("[+] Got " + str(numget) + " results")

                    return results[:numget]
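Apart from the quoting problem, the function as posted has a few structural issues that will show up once it parses. The while loop and the return are indented under if verbose == 2:, so nothing is fetched unless verbose is 2, and the function returns after the first page. The query parameter is spelled star instead of start. And print(data.read()) runs after the regex call has already read the response, so it prints an empty string. Below is a minimal Python 3 sketch of the loop with those points addressed (the original targets Python 2's urllib). To keep it testable offline, the page download is passed in as a function; fetch_page and fake_fetch are hypothetical names, not part of the original, and fetch_page is assumed to raise on an HTTP error instead of the data.code check. Adding quote_plus for the query and stopping on a page with no matches are also changes of mine. Whether the pattern matches anything in real Google result pages is not guaranteed.

```python
import re
from urllib.parse import quote_plus

# Pattern from the answer, written with single quotes so the inner double quotes need no escaping.
LINK_RE = re.compile('<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)')


def google(query, fetch_page, numget=10, verbose=0):
    """Collect up to `numget` regex matches across result pages.

    `fetch_page(url)` is a hypothetical callable that returns the page HTML as a str
    (and raises on an HTTP error); it stands in for the original urlopen call.
    """
    numget = int(numget)
    start = 0
    results = []
    if verbose == 2:
        print("[+] Getting " + str(numget) + " results")

    # The loop runs regardless of verbosity.
    while len(results) < numget:
        if verbose == 2:
            print("[+] " + str(len(results)) + " so far...")
        url = "https://www.google.com/search?q=" + quote_plus(query) + "&start=" + str(start)
        html = fetch_page(url)          # read the body once and reuse it
        found = LINK_RE.findall(html)
        if not found:                   # stop instead of looping forever on a page with no matches
            break
        results.extend(found)
        start += 10

    if verbose == 2:
        print("[+] Got " + str(len(results)) + " results")
    return results[:numget]             # return only after the loop finishes


# Offline usage with a stand-in fetcher (hypothetical HTML, not real Google output).
def fake_fetch(url):
    return '<a href="/%201Dalpha/%201D">class=s <a href="/%201Dbeta/%201D">class=1'


print(google("2600 magazine", fake_fetch, numget=2))  # ['alpha', 'beta']
```

Passing the fetcher in keeps the parsing and pagination logic separate from the network call, so the loop can be checked without hitting Google.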

Move the essential parts of code here, please. (vaultah, commented May 18, 2014 at 15:36)

Pavel · Accepted Answer · 2014-05-18 15:38:30Z

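First, you need to escape the double quotes inside the pattern (the one after <a href= and the one before >), because an unescaped " ends the Python string literal: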

    "<a href=\"/%201D([^/%201D]*)/%201D\">class=(?:1|s)"
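One way to confirm the diagnosis, as a Python 3 sketch (the helper name parses is made up for this example), is to compile the offending line as text without running it. With bare inner quotes, the string literal ends right after <a href=, so Python next sees /%201D... and reports a syntax error. Escaping the two inner quotes fixes it, and delimiting the pattern with single quotes produces exactly the same pattern string without any escaping.

```python
import re

# The offending line from the question, held as text so it is compiled but never executed.
broken = 'results.extend(re.findall("<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)",data.read()))'
# The same line with the two inner double quotes escaped, as in the answer.
fixed = 'results.extend(re.findall("<a href=\\"/%201D([^/%201D]*)/%201D\\">class=(?:1|s)",data.read()))'


def parses(source):
    """Return True if `source` is syntactically valid Python (compiled only, not run)."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(parses(broken), parses(fixed))  # False True

# Two equivalent spellings of the pattern: escaped double quotes, or single-quoted delimiters.
escaped = "<a href=\"/%201D([^/%201D]*)/%201D\">class=(?:1|s)"
single_quoted = '<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)'
print(escaped == single_quoted)  # True

# The pattern itself is a valid regular expression once the string literal is fixed.
re.compile(escaped)
```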

Second, %20 is the URL encoding of a single space, so %201D decodes to " 1D" (a space followed by the characters 1D).
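That decoding only happens in URL handling, not inside the regex. The Python 3 sketch below (the HTML fragments are invented for illustration, not real Google output) checks the decoding with urllib.parse.unquote and shows that the pattern matches the literal text %201D. It also shows that [^/%201D] is a character class excluding each of the single characters /, %, 2, 0, 1 and D, rather than the sequence %201D.

```python
import re
from urllib.parse import unquote

# URL decoding: %20 becomes a space, and the remaining "1D" is literal text.
print(repr(unquote("%201D")))  # ' 1D'

pattern = re.compile('<a href="/%201D([^/%201D]*)/%201D">class=(?:1|s)')

# Hypothetical HTML fragments, made up for illustration.
literal = '<a href="/%201Dabc/%201D">class=s'
decoded = '<a href="/ 1Dabc/ 1D">class=s'
print(pattern.findall(literal))  # ['abc']
print(pattern.findall(decoded))  # [] because the regex engine does not treat %20 as a space

# The class rejects single characters: "2" is in [^/%201D], so "x2z" cannot be captured.
print(pattern.findall('<a href="/%201Dx2z/%201D">class=1'))  # []
```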