I have the following data frame:
import pandas as pd
df = pd.DataFrame({'id':['a','b','c','d','e'],
'XX_111_S5_R12_001_Mobile_05':[-14,-90,-90,-96,-91],
'YY_222_S00_R12_001_1-999_13':[-103,0,-110,-114,-114],
'ZZ_111_S00_R12_001_1-999_13':[1,2.3,3,5,6],
})
df.set_index('id',inplace=True)
df
Which looks like this:
Out[6]:
    XX_111_S5_R12_001_Mobile_05  YY_222_S00_R12_001_1-999_13  ZZ_111_S00_R12_001_1-999_13
id
a                           -14                         -103                          1.0
b                           -90                            0                          2.3
c                           -90                         -110                          3.0
d                           -96                         -114                          5.0
e                           -91                         -114                          6.0
What I want to do is group the columns based on the following regex:
\w+_\w+_\w+_\d+_([\w\d-]+)_\d+
so that in the end the columns are grouped by Mobile and 1-999.
How can I do this? I tried the following, but it fails to group them:
import re
grouped = df.groupby(lambda x: re.search(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+", x).group(), axis=1)
for name, group in grouped:
    print(name)
    print(group)
Which prints:
XX_111_S5_R12_001_Mobile_05
YY_222_S00_R12_001_1-999_13
ZZ_111_S00_R12_001_1-999_13
What I want is for name to print:
Mobile
1-999
1-999
And group prints the corresponding data frame.

name) are unique, so the desired output you described is just not possible; the closest thing would be to create a row of labels (i.e. Mobile and 1-999) and use those in your groups instead, but I'm not sure if this is relevant to what you're trying to do.
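
A minimal sketch of the label idea, assuming the goal is one sub-frame per label. It uses `group(1)` to get only the captured part of each column name (plain `group()` returns the whole match, which is why the attempt in the question printed the full column names). The names `label_of` and `groups` are made up here for illustration, and the columns are collected with a boolean mask rather than `groupby(..., axis=1)`, which, as far as I know, newer pandas releases deprecate.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
    'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
    'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
}).set_index('id')

# Same regex as in the question, compiled once as a raw string.
pattern = re.compile(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+")

def label_of(column_name):
    # Hypothetical helper: return only the captured group, not the whole match.
    return pattern.search(column_name).group(1)

# One label per column, in column order.
labels = [label_of(col) for col in df.columns]
print(labels)  # ['Mobile', '1-999', '1-999']

# Collect the columns that share a label into one sub-frame per label.
groups = {}
for label in dict.fromkeys(labels):  # unique labels, first-seen order
    mask = [current == label for current in labels]
    groups[label] = df.loc[:, mask]

for name, group in groups.items():
    print(name)
    print(group)
```

This yields two groups, not three: the `1-999` label appears once as a group name and covers both the `YY_...` and `ZZ_...` columns.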