3

I have a DataFrame with column names in the shape of x.y, where I would like to sum up all columns with the same value on x without having to explicitly name them. That is, the value of column_name.split(".")[0] should determine their group. Here's an example:

import pandas as pd
df = pd.DataFrame({'x.1': [1,2,3,4], 'x.2': [5,4,3,2], 'y.8': [19,2,1,3], 'y.92': [10,9,2,4]})

df
Out[3]: 
   x.1  x.2  y.8  y.92
0    1    5   19    10
1    2    4    2     9
2    3    3    1     2
3    4    2    3     4

The result should be the same as this operation, only I shouldn't have to explicitly list the column names and how they should group.

pd.DataFrame({'x': df[['x.1', 'x.2']].sum(axis=1), 'y': df[['y.8', 'y.92']].sum(axis=1)})

   x   y
0  6  29
1  6  11
2  6   3
3  6   7

2 Answers 2

4

Another option, you can extract the prefix from the column names and use it as a group variable:

df.groupby(by = df.columns.str.split('.').str[0], axis = 1).sum()

#   x   y
#0  6   29
#1  6   11
#2  6   3
#3  6   7
Sign up to request clarification or add additional context in comments.

Comments

3

You can first create Multiindex by split and then groupby by first level and aggregate sum:

df.columns = df.columns.str.split('.', expand=True)
print (df)
   x      y    
   1  2   8  92
0  1  5  19  10
1  2  4   2   9
2  3  3   1   2
3  4  2   3   4

df = df.groupby(axis=1, level=0).sum()
print (df)
   x   y
0  6  29
1  6  11
2  6   3
3  6   7

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.