Can anyone please help me understand this code snippet from Level 2 of http://garethrees.org/2007/05/07/python-challenge/ ?

>>> import urllib
>>> def get_challenge(s):
...     return urllib.urlopen('http://www.pythonchallenge.com/pc/' + s).read()
...
>>> src = get_challenge('def/ocr.html')
>>> import re
>>> text = re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1]
>>> counts = {}
>>> for c in text: counts[c] = counts.get(c, 0) + 1
>>> counts

In re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1], why is there a [-1] at the end? What is its purpose? Is it converting the result to a list?

2 Answers

1

The result is already a list: re.findall() returns a list of all the matches, so [-1] does not convert anything. Have a look at the documentation:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

Indexing the result with [-1] accesses the last element of the list (the first element counting from the end).

For example:

>>> a = [1,2,3,4,5]
>>> a[-1]
5

And also:

>>> re.compile('.*?-').findall('-foo-bar-')[-1]
'bar-'
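
To connect this back to the pattern in the question, here is a minimal sketch in Python 3. The string sample_src is a made-up stand-in for the page source (an assumption, not the real page), chosen so that it contains exactly two HTML comments. With one capturing group, findall() returns a list of the captured comment bodies, and [-1] picks the last one. When the list has exactly two elements, [1] and [-1] refer to the same element.

```python
import re

# Hypothetical stand-in for the downloaded page: two HTML comments.
sample_src = "<html><!-- first note --><body></body><!--\n%%$@_$^\n--></html>"

# Same pattern as in the question. It contains no '.', so re.S
# (which only changes what '.' matches) has no effect on it.
comment_re = re.compile(r'<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S)

# One capturing group, so findall() returns a list of strings:
# the text between '<!--' and '-->' for each comment.
matches = comment_re.findall(sample_src)

first_comment = matches[0]   # body of the first comment
last_comment = matches[-1]   # body of the last comment

print(len(matches))
print(repr(first_comment))
print(repr(last_comment))
```

Taking the last comment makes sense if the data you want is in the final comment of the page, which appears to be the intent of the original snippet.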
5 Comments

I understand that findall is for matching; I'm fine with that. But why would someone put [-1] at the end of re.compile('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S).findall(src)[-1]? I couldn't understand the intention.
It's simply list indexing; don't be fooled by the negative index! :)
But putting [1] there works fine too! :/ Could you also please explain what's going on in this line? for c in text: counts[c] = counts.get(c, 0) + 1
That line builds a dictionary of the frequency of each character in text. The zero in get(c, 0) is the default returned if c is not yet in the dictionary; otherwise the current count is returned. Either way, the count goes up by 1 for each occurrence.
I tried this pattern ('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S) in a regex interpreter to check for matches in the challenge's page source text ('!@#$%^&*@#$%#$^%#$^$#%^#'). re.S is the dot-all flag, right? It gives a match only when I remove the <!-- and -->; otherwise it gives null. Why is that?
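
The counting loop discussed in the comments above can be tried on its own. This is a small Python 3 sketch; the string sample_text is invented for illustration. It also compares the result with collections.Counter from the standard library, which does the same job in one call.

```python
from collections import Counter

# Hypothetical input, standing in for the comment text.
sample_text = "%%$@_$^a"

counts = {}
for c in sample_text:
    # counts.get(c, 0) returns the current count for c,
    # or 0 if c has not been seen yet; then add 1.
    counts[c] = counts.get(c, 0) + 1

# Counter builds the same character -> frequency mapping.
counter_counts = Counter(sample_text)

print(counts)
print(counts == counter_counts)
```

Characters with a low count are the "rare" ones, which is why a frequency table is a natural first step here.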
0

It's already a list. And if you have a list myList, myList[-1] returns the last element in that list.

Read this: https://docs.python.org/2/tutorial/introduction.html#lists.

5 Comments

Thanks for the quick reply. So does putting [] after an expression convert it into a list? Because if I put [1] it works the same way as [-1], but [2] does not.
It was already a list. [] just retrieves an element or slice from the list. Changing the number in there is just going to change which element it accesses. Check out the page I linked to for more examples about list indexing and slices.
I got confused because if I try print(text) it comes out as one big chunk, but if I keep [1] or [-1] at the end, it prints line by line.
Yeah! I got it now. Very eloquent description :)
I tried this pattern ('<!--((?:[^-]+|-[^-]|--[^>])*)-->', re.S) in a regex interpreter to check for matches in the challenge's page source text ('!@#$%^&*@#$%#$^%#$^$#%^#'). re.S is the dot-all flag, right? It gives a match only when I remove the <!-- and -->; otherwise it gives null. Why is that?
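
The indexing behaviour raised in these comments can be reproduced with a small Python 3 sketch. The list matches below is hypothetical, standing in for a findall() result with exactly two elements. In that case [1] and [-1] are the same element, and [2] is out of range. The last two lines show one possible reason for the "big chunk" output, assuming the whole list was printed rather than a single element: printing a list shows each string in its quoted form with newlines written as \n, while printing one string shows real line breaks.

```python
# Hypothetical two-element result, as findall() might return.
matches = ["first comment", "line one\nline two"]

# With exactly two elements, index 1 and index -1 are the same item.
same_element = matches[1] is matches[-1]

# Index 2 does not exist in a two-element list.
try:
    matches[2]
    out_of_range = False
except IndexError:
    out_of_range = True

# print(matches) shows the list's quoted form, newline as backslash-n.
list_repr = str(matches)
# print(matches[-1]) shows the string itself, on two lines.
element_text = matches[-1]

print(same_element, out_of_range)
print(list_repr)
print(element_text)
```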
