I have the following data frame:

import pandas as pd
df = pd.DataFrame({'id':['a','b','c','d','e'],
                   'XX_111_S5_R12_001_Mobile_05':[-14,-90,-90,-96,-91],
                   'YY_222_S00_R12_001_1-999_13':[-103,0,-110,-114,-114],
                   'ZZ_111_S00_R12_001_1-999_13':[1,2.3,3,5,6],
})

df.set_index('id',inplace=True)
df

Which looks like this:

Out[6]:
    XX_111_S5_R12_001_Mobile_05  YY_222_S00_R12_001_1-999_13  ZZ_111_S00_R12_001_1-999_13
id
a                           -14                         -103                          1.0
b                           -90                            0                          2.3
c                           -90                         -110                          3.0
d                           -96                         -114                          5.0
e                           -91                         -114                          6.0

I want to group the columns by the substring captured by the following regex:

\w+_\w+_\w+_\d+_([\w\d-]+)_\d+

The columns should end up grouped by Mobile and 1-999.

How can I do this? I tried the following, but it fails to group them:

import re
grouped = df.groupby(lambda x: re.search(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+", x).group(), axis=1)
for name, group in grouped:
    print(name)
    print(group)

Which prints:

XX_111_S5_R12_001_Mobile_05
YY_222_S00_R12_001_1-999_13
ZZ_111_S00_R12_001_1-999_13

What I want is for name to print:

Mobile
1-999
1-999

and for group to print the corresponding data frame.

Could you give some additional details about what you are trying to achieve? It looks like you are trying to output 3 groups in your groupby, when the original dataframe only has 3 columns anyway. Furthermore, by definition of a groupby, the group names/labels (which you've called name) are unique, so the desired output you described is just not possible; the closest thing would be to create a row of labels (i.e. Mobile and 1-999) and use those in your groups instead, but I'm not sure if this is relevant to what you're trying to do. Commented Mar 27, 2017 at 6:30

3 Answers

You can use .str.extract on the columns in order to extract substrings for your groupby:

# Performing the groupby, keyed on the substring captured by the regex.
pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'
grouped = df.groupby(df.columns.str.extract(pat, expand=False), axis=1)

# Showing group information.
for name, group in grouped:
    print(name)
    print(group, '\n')

Which returns the expected groups:

1-999
    YY_222_S00_R12_001_1-999_13  ZZ_111_S00_R12_001_1-999_13
id                                                          
a                          -103                          1.0
b                             0                          2.3
c                          -110                          3.0
d                          -114                          5.0
e                          -114                          6.0 

Mobile
    XX_111_S5_R12_001_Mobile_05
id                             
a                           -14
b                           -90
c                           -90
d                           -96
e                           -91 
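
A note for newer pandas releases: as far as I know, `axis=1` in `DataFrame.groupby` is deprecated (since around pandas 2.1), so the call above may warn or fail there. Below is a minimal sketch of the same grouping done on the transposed frame instead, assuming the question's data frame and regex. The names `labels` and `parts` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
    'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
    'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
}).set_index('id')

pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'

# One label per column: the substring captured by the regex.
labels = df.columns.str.extract(pat, expand=False)

# Group the rows of the transposed frame, then transpose each piece back.
# Note: transposing upcasts the integer columns to float here.
parts = {name: group.T for name, group in df.T.groupby(labels)}

for name, part in parts.items():
    print(name)
    print(part, '\n')
```

If keeping the original dtypes matters, selecting the columns per label with plain indexing (for example `df.loc[:, labels == 'Mobile']`) avoids the transpose.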

After grouping, set the column labels of the new dataframe to [re.findall(r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+', col)[0] for col in df.columns] (which is ['Mobile', '1-999', '1-999']).
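
One possible reading of this suggestion, sketched under the assumption that the list is meant to replace the column labels (it has three entries, one per column). The name `renamed` is made up for illustration, and the copy is only there to leave the original `df` untouched.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
    'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
    'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
}).set_index('id')

pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'

# Relabel every column with the substring captured by the regex.
renamed = df.copy()
renamed.columns = [re.findall(pat, col)[0] for col in df.columns]
print(list(renamed.columns))

# A duplicated label selects all columns that carry it.
print(renamed['1-999'])
```

Because '1-999' now labels two columns, `renamed['1-999']` comes back as a two-column data frame rather than a single series.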

1 Comment

Looks like I overlooked your question, based on the wrong description. The problem that you have is not related to grouping. It is related to indexing.

Your regex has an issue: \w matches word characters, which include the underscore, and that doesn't seem to be what you want. If you only want to match letters, digits and hyphens, a character class such as [A-Za-z0-9-] would be better:

df.groupby(df.columns.str.extract(r"([A-Za-z0-9-]+)_\d+$", expand=False), axis=1).sum()
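
To make the point about \w concrete, here is a small sketch using only the standard `re` module on the question's column names. The variable names `columns`, `anchored` and `labels` are illustrative, not part of the original answer.

```python
import re

columns = [
    'XX_111_S5_R12_001_Mobile_05',
    'YY_222_S00_R12_001_1-999_13',
    'ZZ_111_S00_R12_001_1-999_13',
]

# \w also matches the underscore, so a single \w+ can span several fields.
print(re.fullmatch(r'\w+', 'S5_R12') is not None)  # True

# The explicit character class stops at underscores, and the trailing
# _\d+$ anchors the capture to the field just before the final number.
anchored = re.compile(r'([A-Za-z0-9-]+)_\d+$')
labels = [anchored.search(col).group(1) for col in columns]
print(labels)  # ['Mobile', '1-999', '1-999']
```

Anchoring at the end of the name means the pattern no longer depends on how many underscore-separated fields come before the label.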

