Combining columns in Pandas based on column header

Question

I need to merge columns in a dataframe.

The headers will have a similar name with a different suffix, e.g.

A1 | A2 | A3 | B1 | B2 | B3

I want to end up with all of them merged:

A | B

I have this line that successfully merges a defined set of columns into a single column:

df['A'] = df[['A1','A2','A3']].apply(' '.join, axis=1)

The problem is that the headers are inconsistent: any combination of '1', '2', or '3' might be present, e.g.

A1 | A2 | A3 | B2 | C1 | C2

From the solutions I've looked at, pandas doesn't like references to columns that don't exist, so I can't use the apply statement as a blanket command.

I'm having trouble visualizing a solution beyond a list of nested try/except steps. If anyone has an idea, I would appreciate it!
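
One way to avoid try/except here might be to filter each candidate list down to the columns that actually exist before calling apply. The sketch below is only an illustration under assumptions: the sample values and the `wanted` mapping are made up, and `Index.intersection` is used to drop the missing names.

```python
import pandas as pd

# Hypothetical sample frame: B1, B3 and C3 are deliberately absent.
df = pd.DataFrame({'A1': ['a'], 'A2': ['b'], 'A3': ['c'],
                   'B2': ['d'], 'C1': ['e'], 'C2': ['f']})

# Hypothetical mapping of merged name -> every column it might be built from.
wanted = {
    'A': ['A1', 'A2', 'A3'],
    'B': ['B1', 'B2', 'B3'],
    'C': ['C1', 'C2', 'C3'],
}

for new_col, candidates in wanted.items():
    # Keep only the candidate names that are really in the frame.
    present = df.columns.intersection(candidates)
    if len(present):
        df[new_col] = df[present].apply(' '.join, axis=1)

print(df[['A', 'B', 'C']])
```

Because missing names are dropped before indexing, no KeyError is raised for B1, B3 or C3.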

Update
Thanks for the solutions!!! If anyone is interested, here's what worked for me:

Solution 1

for h in headers:
    cols = [col for col in df.columns if col.split('[')[0] == h]
    if cols == []:
        cols = [col for col in df.columns if col == h and col.split('[')[0] not in headers]

Solution 2

df.groupby(df.columns.str.split('[').str[0],axis=1).agg(lambda x :' '.join(x.values.tolist()))

Elizabeth · Accepted Answer · 2018-06-15 17:29:05Z

1

You can use the df.columns attribute to find the relevant columns:

a_cols = [col for col in df.columns if col[0] == 'A']

Then use that list as the input for your apply function:

df['A'] = df[a_cols].apply(' '.join, axis=1)
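
To cover every prefix rather than just 'A', the same idea could be wrapped in a loop. This is a minimal sketch, assuming the prefix is the first character of each header; the sample data and the names `groups` and `merged` are illustrative only.

```python
import pandas as pd

# Hypothetical sample frame with an inconsistent set of suffixes.
df = pd.DataFrame({'A1': ['a'], 'A2': ['b'], 'A3': ['c'],
                   'B2': ['d'], 'C1': ['e'], 'C2': ['f']})

# Collect the existing column names under their first character.
groups = {}
for col in df.columns:
    groups.setdefault(col[0], []).append(col)

# Build one merged column per prefix from whatever columns exist.
merged = pd.DataFrame({
    prefix: df[cols].apply(' '.join, axis=1)
    for prefix, cols in groups.items()
})

print(merged)
```

Since the groups are built from df.columns itself, a column that does not exist is never referenced.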


1 Comment

JRM Over a year ago

This worked well in a for loop with just a little tweaking to (1) split on a specific character and (2) account for values that were already added. Thank you!

BENY · Accepted Answer · 2018-06-15 17:35:32Z

0

For example, suppose you have the following dataframe:

import pandas as pd
df=pd.DataFrame({'A1':['a'],'A2':['b'],'B2':['b'],'B3':['c']})

We can use groupby on the columns:

df.groupby(df.columns.str[0],axis=1).agg(lambda x :','.join(x.values.tolist()))
Out[282]: 
     A    B
0  a,b  b,c
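
Note that newer pandas releases deprecate the axis=1 argument to groupby. A possible equivalent, sketched here rather than taken from the answer, is to transpose, group the rows by the first character of their label, and transpose back:

```python
import pandas as pd

df = pd.DataFrame({'A1': ['a'], 'A2': ['b'], 'B2': ['b'], 'B3': ['c']})

# After transposing, the original column names are the row index.
# The function passed to groupby is called on each index label.
out = df.T.groupby(lambda name: name[0]).agg(','.join).T

print(out)
```

The result should have the same A and B columns as the axis=1 version above.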


2 Comments

JRM Over a year ago

I like this one, but (unclear in my example) the headers may be different lengths, e.g. [ 'apple' , 'apple[1]' , 'lime[1]' , 'watermelon' ]. Is there a way to split on a character (e.g. left square bracket) rather than an index or slice?

BENY Over a year ago

@JRM change to df.columns.str.split('[').str[0]
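
As an illustration of what that expression produces, here is a small sketch using the headers from the comment above. The cell values are made up, and the merge step uses a boolean column mask instead of groupby, which is a substitute technique rather than the one in the answer.

```python
import pandas as pd

# Headers taken from the comment; the values are hypothetical.
df = pd.DataFrame([['x', 'y', 'z', 'w']],
                  columns=['apple', 'apple[1]', 'lime[1]', 'watermelon'])

# Text before the first '[' (the whole name when there is no bracket).
keys = df.columns.str.split('[').str[0]
print(list(keys))

# For each distinct key, select the matching columns and join their values.
merged = pd.DataFrame({
    key: df.loc[:, keys == key].apply(' '.join, axis=1)
    for key in keys.unique()
})

print(merged)
```

Headers of any length are handled because the key comes from the split, not from a fixed slice.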

theletz · Accepted Answer · 2018-06-15 17:41:54Z

0

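import string
import pandas as pd

# One illustrative row of made-up values so the join has something to combine
df = pd.DataFrame([list('abcdef')], columns=['A1', 'A2', 'A3', 'B1', 'B2', 'C1'])

# Map each letter to the columns that start with it, skipping letters with no match
new_cols = {}
for new_col in string.ascii_uppercase:
    matches = [col for col in df.columns if col.startswith(new_col)]
    if matches:
        new_cols[new_col] = matches

# Join the values of each group of columns into a single new column
for new_col, matches in new_cols.items():
    df[new_col] = df[matches].apply(' '.join, axis=1)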
