
I'm using Requests to scrape a website. The HTML content is saved successfully in the variable r, but I get a syntax error at the if statement:

[...]
for line in r:  
    link = re.findall(r ("""onclick="window.location.href='([^'])'""",line)
    if link: 
        print ('something')
        cmd = ('some commands to get info page') 
        call(cmd,shell=True)

        download = re.sub(something)
        cmd = ('some commands to download the file') 
        call(cmd,shell=True)
r.close()

I looked it up in the documentation, and the syntax of the if statement appears to be correct, so I suspect the error is in the line before it. There I search for the line containing the phrase onclick="window.location.href=' and want the link that follows it to be processed by the code afterwards. The part enclosed in () should be what is returned, right?

Does anybody see an error?

Comment: Why the bracket after 'r'? "re.findall(r (" (commented Jun 15, 2012 at 13:41)
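
On the question's point about the brackets inside the pattern: when a pattern contains exactly one capture group, re.findall returns only the text matched by that group. One thing worth checking, though: [^'] matches a single character, so the group as written would only capture a one-character URL. Adding + lets it capture everything up to the next single quote. A small sketch, where the HTML line is a made-up sample rather than content from the site being scraped:

```python
import re

# Hypothetical sample line; the real page content is not shown in the question.
line = """<a onclick="window.location.href='/files/report.pdf'">Download</a>"""

# [^'] matches exactly one character, so this finds nothing in the sample.
one_char = re.findall(r"""onclick="window.location.href='([^'])'""", line)

# [^']+ matches one or more non-quote characters, i.e. the whole URL.
whole_url = re.findall(r"""onclick="window.location.href='([^']+)'""", line)

print(one_char)   # []
print(whole_url)  # ['/files/report.pdf']
```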

3 Answers

Perhaps the brackets?

#                1  2                                                 2                   
link = re.findall(r ("""onclick="window.location.href='([^'])'""",line)

It looks like you forgot to close the bracket for findall.
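
For reference, a corrected version of the question's line might look like the sketch below. Besides closing the findall bracket, it removes the space and bracket between r and the string, so r acts as a raw-string prefix instead of a name being called. It also uses [^']+ so the group captures the whole URL rather than one character. The sample line is hypothetical:

```python
import re

# Hypothetical sample input standing in for one line of the scraped page.
line = """<button onclick="window.location.href='http://example.com/get?id=7'">Get</button>"""

# The r prefix touches the opening quotes, and findall's bracket is closed.
link = re.findall(r"""onclick="window.location.href='([^']+)'""", line)
if link:
    print(link[0])  # http://example.com/get?id=7
```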

4 Comments

omg, you are right. After the second """ a bracket was missing! I'm so sorry; it took me almost an hour to research the syntax for this line, and I just overlooked it. By the way: the brackets around the strings are now necessary in the newest Python version.
@Jasi Using Python 3.2, I have never seen a need to use brackets like that, why are they needed?
@Jasi OK, good to know! In older versions of Python, Don would be correct. In fact, if you try to use r(""" """) for raw strings, you'd get a NameError.
@Lattyware I just started with Python and didn't question the brackets. When I worked through an older Python tutorial, I got an error for not using brackets. I googled it and added them because they were said to be necessary now.
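
On the brackets around strings: no Python version requires them. The r in r"..." is a raw-string prefix only when it sits directly against the opening quote; r (...) is an ordinary call of whatever the name r refers to. If r is undefined, that call raises NameError; in the question's script r holds the Requests response, and calling a non-callable object raises TypeError instead. Python 3 did make brackets mandatory for print(), because print became a function, but string literals were not affected. A minimal sketch, where FakeResponse is a hypothetical stand-in for the response object:

```python
class FakeResponse:
    """Hypothetical stand-in for a Requests response object; it is not callable."""

# Correct: the r prefix touches the quotes, so this is a raw string literal.
pattern = r"""onclick="window.location.href='([^']+)'"""

r = FakeResponse()

# Incorrect: with a space and a bracket, r is looked up as a name and called.
error_message = None
try:
    r ("""onclick="window.location.href='([^']+)'""")
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```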

If you move the pattern onto its own line, it becomes clear that the problem is really just one of quoting. Try separating it like this:

import re

# [^']+ captures the whole URL; \. matches a literal dot
pattern = r"onclick=\"window\.location\.href='([^']+)'"
for line in r:
    link = re.findall(pattern, line)
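
Going one step further, the pattern can be compiled once outside the loop. The sketch below also assumes r is a Requests Response: iterating over a Response object directly yields chunks of bytes (128 at a time in current versions of Requests) rather than lines of text, so passing r.text.splitlines() is probably closer to what the loop intends. The extract_links helper and the sample lines are hypothetical:

```python
import re

# Compiled once; the dots are escaped so they only match a literal '.'.
LINK_PATTERN = re.compile(r"onclick=\"window\.location\.href='([^']+)'")


def extract_links(lines):
    """Hypothetical helper: collect every captured URL from an iterable of text lines."""
    links = []
    for line in lines:
        links.extend(LINK_PATTERN.findall(line))
    return links


# With Requests one might call extract_links(r.text.splitlines()), assuming r is a Response.
sample_lines = [
    "<p>No link here</p>",
    "<a onclick=\"window.location.href='/dl/one.zip'\">One</a>",
    "<a onclick=\"window.location.href='/dl/two.zip'\">Two</a>",
]
print(extract_links(sample_lines))  # ['/dl/one.zip', '/dl/two.zip']
```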

2 Comments

I should really try to do it this way. Also, it's way more readable. Thank you.
@Jasi you can go even further and use re.VERBOSE: add whitespace and comments between the elements of your regular expression, and they get stripped out when you compile it.
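
A minimal sketch of that suggestion, assuming the same pattern as in the answer, with [^']+ so the group captures the whole URL and the dots escaped. In re.VERBOSE mode, unescaped whitespace and everything after an unescaped # on a line are ignored, except inside a character class. LINK_PATTERN and the sample line are hypothetical:

```python
import re

# The same pattern written with re.VERBOSE: whitespace and # comments are ignored.
LINK_PATTERN = re.compile(
    r"""
    onclick="                 # the attribute name and its opening double quote
    window\.location\.href=   # the JavaScript assignment, dots escaped
    '([^']+)'                 # capture everything between the single quotes
    """,
    re.VERBOSE,
)

line = """<a onclick="window.location.href='/files/data.csv'">Data</a>"""
print(LINK_PATTERN.findall(line))  # ['/files/data.csv']
```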

You appear to have both mismatched parentheses and mismatched quotation marks. Below, I've lined them up. Does this work as expected?

#                1  2                                  3    3           21
#                    123        4                              4321
link = re.findall(r ("""onclick="window.location.href='([^'])'\"""",line))

1 Comment

The escape \ shouldn't be necessary because of the triple ". The " just before the word window is part of the string. Correct me if I'm wrong.
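
The comment's reasoning can be checked directly: inside a triple-quoted string, a lone " needs no escape, so the quote before window is already part of the string. The trailing \" in the answer's version is not a no-op, though: it appends one literal " to the end of the pattern, so a match would then also require a double quote right after the URL's closing single quote. A small sketch comparing the two literals (the variable names are hypothetical):

```python
# Both literals are ordinary (non-raw) triple-quoted strings, as in the answer's line.
without_escape = """onclick="window.location.href='([^'])'"""
with_escape = """onclick="window.location.href='([^'])'\""""

# The " before window is part of the string in both cases; it needs no escape.
print('onclick="window' in without_escape)  # True

# The trailing \" only adds one literal double quote at the end of the pattern.
print(with_escape == without_escape + '"')  # True
```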
