On my administration page I have a list of accounts with various values that I want to capture, like id, name, type, etc. On Regex101 the pattern captures all the values perfectly with the "g" and "s" modifiers active. This is what I am trying to do:

import re

def extract_accounts(src):
    list_accounts = []
    try:
        pattern = re.compile(r'''id=(?P<id>.*?)&serverzone=.\">(?P<name>[a-zA-Z].*?)<\/a>.*?75px;\">(?P<level>.*?)<\/td>.*?75px;.*?75px;\">(?P<type>.*?)<\/td>.*?Open!''', re.X)
        print type(pattern)
        match = pattern.match(src)
        print match, "type=", type(match)
        name = match.group("name")
        print "name", name
        ids = match.group("id")
        level = match.group("level")
        # don't name this "type": assigning to it would shadow the builtin type() used above
        acc_type = match.group("type")
        #list_accounts.append((name, ids, level, acc_type))
        #print ("id=", ids, ", name=", name, " level=", level, " type=", acc_type)
    except Exception as e:
        print (e)

But somehow I get this:

<type '_sre.SRE_Pattern'>
None type= <type 'NoneType'>
'NoneType' object has no attribute 'group'

I have no idea what I'm doing wrong. Basically, I want to collect the values I grab from each row into a list like [(name1, id1, level1, type1), (name2, id2, level2, type2), ...]. Thanks in advance for any help.
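The None in the output comes from pattern.match(): match() only tries the pattern at the very beginning of the string, while search() scans forward for the first position where the pattern matches. A minimal Python 3 illustration, using a made-up one-line snippet in place of the real page:

```python
import re

# Hypothetical stand-in for the admin page: "id=" is NOT at position 0.
html = '<tr><td><a href="adminzone.php?id=2478&serverid=1">Mike</a></td></tr>'

pattern = re.compile(r'id=(?P<id>\d+)')

# re.match() / pattern.match() only attempts a match at the start of the string.
anchored = pattern.match(html)

# re.search() / pattern.search() scans until it finds the first matching position.
scanned = pattern.search(html)

print(anchored)             # None
print(scanned.group('id'))  # 2478
```

Calling .group() on the None returned by match() is what produces the "'NoneType' object has no attribute 'group'" error.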

  • Can you post a sample string to test with? Commented Sep 1, 2015 at 14:35
  • Sure, here is the link: regex101.com/r/vQ8jB0/1 Commented Sep 1, 2015 at 14:43
  • I can't find an error in the regex; however, I get an error when running re.findall() because of a special character in the string, in your case the • next to Evolution. Python can't handle that character. Commented Sep 1, 2015 at 14:48
  • @LawrenceBenson Should I decode the string as "windows-1252", which supports that character, so that Python can handle it? Commented Sep 1, 2015 at 14:52
  • @MikeThunder - I have updated my answer. Let me know if it is sufficient. Commented Sep 1, 2015 at 15:05
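On the encoding question in these comments: in Python 2 a source file containing non-ASCII characters needs an encoding declaration (such as the "# -*- coding: utf-8 -*-" line in the answer). In Python 3, str is Unicode and source files default to UTF-8, so no windows-1252 decoding step should be needed for text that is already a str. A small sketch using one cell from the sample data:

```python
import re

# Python 3: str literals are Unicode, so "•" and "†" need no special decoding
# before being passed to the re module.
row = '<td>•Evolu†ion•</td>\n<td style="text-align: center;width: 75px;">User</td>'

# Capture the text inside every <td ...>...</td> cell.
cells = re.findall(r'<td[^>]*>(.*?)</td>', row)
print(cells)  # ['•Evolu†ion•', 'User']
```

If the page is fetched as bytes, it still has to be decoded once (with the page's real encoding) before matching with str patterns.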

1 Answer

One way to do this is to capture the groups by their group number. I have rewritten the regular expression completely and implemented it like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 
import re

def main():
    sample_data = '''
    <tr style="background-color: #343222;">
        <td style="width: 20px;"><img src="/images/Star.png" style="border: 0px;" /></td>
        <td><a target="_top" href="adminzone.php?id=2478&serverid=1">Mike</a></td>
        <td style="text-align: center;width: 75px;">74</td>
        <td>•Evolu†ion•</td>
        <td style="text-align: center;width: 100px;">1635</td>
        <td style="text-align: center;width: 75px;">40,826</td>
        <td style="text-align: center;width: 75px;">User</td>
        <td style="width: 100px;"><a target="_top" href="href="adminzone.php"><strong>Open!</strong></a></td>
    </tr>
    <tr style="background-color: #3423323;">
        <td style="width: 20px;"><img src="/images/Star.png" style="border: 0px;" /></td>
        <td><a target="_top" href="adminzone.php?suid=24800565&serverid=1">John</a></td>
        <td style="text-align: center;width: 75px;">70</td>
        <td>•Evolu†ion•</td>
        <td style="text-align: center;width: 100px;">9167</td>
        <td style="text-align: center;width: 75px;">36,223</td>
        <td style="text-align: center;width: 75px;">Admin</td>
        <td style="width: 100px;"><a style="color: #00DD19;" target="_top" href="adminzone.php?id=248005&serverid=1"><strong>Open!</strong></a></td>

'''

    matchObj = re.search(r'id=(.*)&serverid=.">(.*)</a></td>\n.*?75px;">(.+)</td>\n.*\n.*\n.*75px;">(.+)</td>\n.*75px;">(.+)</td>', sample_data, re.X)

    if matchObj:
        user_id = matchObj.group(1)
        name = matchObj.group(2)
        level = matchObj.group(3)
        # group(4) captures the 40,826 column; the account type (User) is group(5)
        user_type = matchObj.group(5)
        print user_id, name, level, user_type


if __name__ == '__main__':
    main()

Output: 2478 Mike 74 User

The above should give you a basic idea. In case you are wondering, group(0) is the entire match.
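To illustrate group(0): it is the whole matched text, not a capture group. Named groups, like the ones in the question's pattern, are also numbered left to right, so match.group("name") and match.group(2) return the same value. A minimal sketch on a made-up one-line snippet:

```python
import re

m = re.search(r'id=(?P<id>\d+)&serverid=\d+">(?P<name>[^<]+)</a>',
              '<a href="adminzone.php?id=2478&serverid=1">Mike</a>')

# group(0) is the entire matched text.
print(m.group(0))                   # id=2478&serverid=1">Mike</a>
# Named groups can be fetched by number or by name.
print(m.group(1), m.group('id'))    # 2478 2478
print(m.group(2), m.group('name'))  # Mike Mike
# groupdict() returns all named groups at once.
print(m.groupdict())                # {'id': '2478', 'name': 'Mike'}
```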


2 Comments

@MikeThunder Has this solved your issue? If you have any further difficulties, let me know so that I can help further.
I found the problem. I forgot to add re.S so that . matches across multiple lines :(
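For the list-of-tuples goal in the question, here is a minimal Python 3 sketch that combines the fixes from this thread: re.S (Regex101's "s" flag) so that .*? can cross line breaks, and finditer(), which searches anywhere in the string and yields every row (roughly what Regex101's "g" flag does). It assumes the links use serverid= as in the answer's sample data rather than serverzone= from the question, and it narrows the captures to \d+ and [^<]* so a value cannot run past its own cell. ROW_RE, SAMPLE, and this version of extract_accounts are illustrative names only.

```python
import re

# Illustrative row pattern; re.X allows the inline comments, re.S lets "."
# match newlines so one match can span all the <td> lines of a row.
ROW_RE = re.compile(r'''
    id=(?P<id>\d+)&serverid=\d+">      # account id (also matches inside "suid=")
    (?P<name>[^<]+)</a>                # account name (link text)
    .*?75px;">(?P<level>[^<]*)</td>    # 1st 75px cell: level
    .*?75px;">[^<]*</td>               # 2nd 75px cell: not captured
    .*?75px;">(?P<type>[^<]*)</td>     # 3rd 75px cell: type
    .*?Open!                           # end of the row
''', re.S | re.X)

# Trimmed copy of the answer's sample rows.
SAMPLE = '''
<tr>
    <td><a target="_top" href="adminzone.php?id=2478&serverid=1">Mike</a></td>
    <td style="text-align: center;width: 75px;">74</td>
    <td>•Evolu†ion•</td>
    <td style="text-align: center;width: 100px;">1635</td>
    <td style="text-align: center;width: 75px;">40,826</td>
    <td style="text-align: center;width: 75px;">User</td>
    <td style="width: 100px;"><a target="_top" href="adminzone.php"><strong>Open!</strong></a></td>
</tr>
<tr>
    <td><a target="_top" href="adminzone.php?suid=24800565&serverid=1">John</a></td>
    <td style="text-align: center;width: 75px;">70</td>
    <td>•Evolu†ion•</td>
    <td style="text-align: center;width: 100px;">9167</td>
    <td style="text-align: center;width: 75px;">36,223</td>
    <td style="text-align: center;width: 75px;">Admin</td>
    <td style="width: 100px;"><a target="_top" href="adminzone.php?id=248005&serverid=1"><strong>Open!</strong></a></td>
</tr>
'''

def extract_accounts(src):
    """Return a list of (name, id, level, type) tuples, one per table row."""
    return [(m.group('name'), m.group('id'), m.group('level'), m.group('type'))
            for m in ROW_RE.finditer(src)]

accounts = extract_accounts(SAMPLE)
print(accounts)
# [('Mike', '2478', '74', 'User'), ('John', '24800565', '70', 'Admin')]

# Without re.S, "." stops at newlines, so no row can match across its <td> lines.
print(re.compile(ROW_RE.pattern, re.X).search(SAMPLE))  # None
```

For anything beyond a quick scrape, an HTML parser is usually more robust than a regex, since small markup changes (attribute order, extra cells) break patterns like this one.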
