1

I'm still a newbie with regex and Python. I've been searching for some time, but I don't know how to phrase what I'm looking for.

I need to get data from a formatted string into a list of lists, or dictionary.

-------------------------------------------------------------------
Frank         114      0         0         0          0         114       
Joe           49       1         0         0          0         50        
Bob           37       0         0         0          0         37        
Sally         34       2         0         0          0         36     

This is the output of a script. Currently I have:

import re

match_list = []
# re.search stops at the first match, so only one row is captured
match = re.search(r'\n(\w+)\s+(\d*)\s+(\d*)', output)
if match:
    match_list.append([match.group(1),
                       match.group(2),
                       match.group(3)])
>>> print(match_list)
[['Frank', '114', '0']]

This is perfect, except that I need to have match_list return:

[['Frank', '114', '0'],
 ['Joe', '49', '1'],
 ['Bob', '37', '0'],
 ['Sally', '34', '2']]

My initial thought was to use a for loop and check whether match.group(1) was already listed, moving on to the next match if so, but then I realized I didn't know how to do that. I am having a hard time figuring this out. Any help would be fantastic! :)

One more thing: the list size changes. Sometimes there may be only one user, other times there may be 20, so I can't just set up a giant static regex (that I know of...).

2
  • Is there a reason that you have to use regex (like an assignment requirement) or can you use anything which works? Commented Sep 11, 2012 at 20:23
  • No it's not an assignment. I'm just data tracking. I was hoping to keep it in regex, as I've been told it's very useful, and would like to be more familiar with it. If there's an incredibly simpler way though, I'd be fine with that. Commented Sep 11, 2012 at 20:27

2 Answers

4

You can use re.findall:

match_list = []
match = re.findall(r'\n(\w+)\s+(\d*)\s+(\d*)', output)
for k in match:
    # k will be a tuple like this: ('Frank', '114', '0')
    match_list.append(list(k))

Or the same solution as a one-liner:

match_list = list(map(list, re.findall(r'\n(\w+)\s+(\d*)\s+(\d*)', output)))
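
One caveat with the pattern above: the leading \n means a row is only found when a newline precedes it, and \s can itself match a newline. A minimal sketch of a stricter variant follows, assuming the script output looks exactly like the sample in the question. The names `output` and `ROW` are placeholders for illustration.

```python
import re

# Sample text assumed to match the script output shown in the question.
output = """\
-------------------------------------------------------------------
Frank         114      0         0         0          0         114
Joe           49       1         0         0          0         50
Bob           37       0         0         0          0         37
Sally         34       2         0         0          0         36
"""

# With re.MULTILINE, ^ anchors at the start of every line, so no
# preceding newline is required.
# [ \t]+ keeps a match from running onto the next line, and \d+
# requires at least one digit per captured column.
ROW = re.compile(r'^(\w+)[ \t]+(\d+)[ \t]+(\d+)', re.MULTILINE)

# findall returns one tuple of groups per matching line.
match_list = [list(groups) for groups in ROW.findall(output)]
print(match_list)
```

The dashed separator line is skipped automatically because "-" is not matched by \w, and the number of rows can vary freely.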

1 Comment

This is perfect. I needed to loop through the matches anyway, so this will cut out a step for me. It also lets me add to my regex and pull from the other columns in the future without getting bloated lists.
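
If more columns get pulled in later, named groups with re.finditer may be easier to maintain than positional tuples. This is only a sketch: the group names `col1` and `col2` are hypothetical, since the question does not say what the columns mean, and `output` stands in for the script output.

```python
import re

# Sample text assumed to match the script output shown in the question.
output = """\
-------------------------------------------------------------------
Frank         114      0         0         0          0         114
Joe           49       1         0         0          0         50
Bob           37       0         0         0          0         37
Sally         34       2         0         0          0         36
"""

# Group names are made up for illustration; rename them to whatever
# the columns actually represent.
ROW = re.compile(
    r'^(?P<name>\w+)[ \t]+(?P<col1>\d+)[ \t]+(?P<col2>\d+)',
    re.MULTILINE,
)

# finditer yields match objects; groupdict() maps group name -> text.
rows = [m.groupdict() for m in ROW.finditer(output)]
for row in rows:
    print(row['name'], row['col1'], row['col2'])
```

Adding another column then means adding one more named group, and existing lookups by name keep working.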
3

You don't need a regex:

table="""\
-------------------------------------------------------------------
Frank         114      0         0         0          0         114       
Joe           49       1         0         0          0         50        
Bob           37       0         0         0          0         37        
Sally         34       2         0         0          0         36"""

print([line.split() for line in table.splitlines()[1:]])

Or, if you want a regex:

print([list(t) for t in re.findall(r'^(\w+)' + r'\s+(\d+)' * 6, table, re.MULTILINE)])

Either way, this prints:

[['Frank', '114', '0', '0', '0', '0', '114'], 
 ['Joe', '49', '1', '0', '0', '0', '50'], 
 ['Bob', '37', '0', '0', '0', '0', '37'], 
 ['Sally', '34', '2', '0', '0', '0', '36']]
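
Since the question also mentions a dictionary, the split-based approach can be adapted as follows. This is a sketch under the assumption that every data row is a single-word name followed only by whole numbers; `totals` is a placeholder name.

```python
table = """\
-------------------------------------------------------------------
Frank         114      0         0         0          0         114
Joe           49       1         0         0          0         50
Bob           37       0         0         0          0         37
Sally         34       2         0         0          0         36"""

totals = {}
for line in table.splitlines():
    fields = line.split()
    # Skip the separator line, blank lines, and anything that is not
    # "name followed by numeric columns".
    if len(fields) < 2 or not all(f.isdigit() for f in fields[1:]):
        continue
    # Key on the name; store the remaining columns as integers.
    totals[fields[0]] = [int(f) for f in fields[1:]]

print(totals['Frank'])
```

Filtering on the row shape instead of slicing with [1:] means the code does not depend on the separator being exactly the first line.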

1 Comment

This is similar to what I would have done, except that I'd've used .splitlines(). This makes assumptions about how the data looks that the regex doesn't, but I'd still start from this.
