I need to merge columns in a dataframe.

The headers will have a similar name with a different suffix, e.g.

A1 | A2 | A3 | B1 | B2 | B3

I want to end up with all of them merged:

A | B

I have this line that successfully merges a defined set of columns into a single column:

df['A'] = df[['A1','A2','A3']].apply(' '.join, axis=1)

The problem is that the headers are inconsistent: any combination of the suffixes '1', '2', or '3' may be present, e.g.

A1 | A2 | A3 | B2 | C1 | C2 

From the solutions I've looked at, pandas doesn't like references to columns that don't exist, so I can't use the apply statement as a blanket command.
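
To illustrate the point about missing columns, here is a minimal sketch. The sample frame and the `wanted` list are made up for illustration. Selecting a list of labels raises a KeyError when any label is absent, so one option is to filter the candidate list against df.columns before selecting:

```python
import pandas as pd

# Hypothetical sample data: only B2 exists out of B1, B2, B3
df = pd.DataFrame({'A1': ['x'], 'A2': ['y'], 'B2': ['z']})

wanted = ['B1', 'B2', 'B3']  # candidate labels, some of which are missing

# Selecting labels that do not exist raises KeyError
try:
    df[wanted]
    raised = False
except KeyError:
    raised = True

# Keep only the labels that are actually present, then join them row by row
present = [c for c in wanted if c in df.columns]
df['B'] = df[present].apply(' '.join, axis=1)
```

This avoids nested try/except blocks, although the candidate suffixes still have to be listed by hand.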

I'm having trouble visualizing a solution beyond a list of nested try/except steps. If anyone has an idea, I would appreciate it!

Update
Thanks for the solutions!!! If anyone is interested, here's what worked for me:

Solution 1

for h in headers:
    cols = [col for col in df.columns if col.split('[')[0] == h]
    if cols == []:
        cols = [col for col in df.columns if col == h and col.split('[')[0] not in headers]
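
The loop above only collects the matching columns. A self-contained sketch of the full idea might look like the following. The sample frame, the `headers` list, and the separate `merged` frame are assumptions for illustration, not necessarily what the original code did:

```python
import pandas as pd

# Hypothetical sample data with bracketed suffixes
df = pd.DataFrame({
    'apple': ['a'], 'apple[1]': ['b'],
    'lime[1]': ['c'], 'watermelon': ['d'],
})
headers = ['apple', 'lime', 'watermelon']  # assumed list of base names

# Write results to a new frame so merged columns are never picked up again
merged = pd.DataFrame(index=df.index)
for h in headers:
    # Match every column whose name before '[' equals the base name
    cols = [col for col in df.columns if col.split('[')[0] == h]
    if cols:  # skip base names with no matching columns
        merged[h] = df[cols].apply(' '.join, axis=1)
```

Writing into a separate frame is one way to sidestep the problem of re-merging values that were already added.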

Solution 2

df.groupby(df.columns.str.split('[').str[0],axis=1).agg(lambda x :' '.join(x.values.tolist()))
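
Note that the axis keyword of DataFrame.groupby is deprecated as of pandas 2.1. A possible equivalent, sketched here with made-up sample data, is to transpose, group the rows, and transpose back:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'apple': ['a', 'd'],
    'apple[1]': ['b', 'e'],
    'lime[1]': ['c', 'f'],
})

# Base name of each column: everything before the first '['
keys = df.columns.str.split('[').str[0]

# Group the transposed rows (the original columns) by base name,
# join the strings in each group, then transpose back
merged = df.T.groupby(keys).agg(' '.join).T
```

This assumes every cell is a string. Non-string values would need converting first, for example with astype(str).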

3 Answers

You can use the df.columns attribute to find the relevant columns

a_cols = [col for col in df.columns if col[0] == 'A']

then use that list as the input for your apply function

df['A'] = df[a_cols].apply(' '.join, axis=1)

1 Comment

This worked well in a for loop with just a little tweaking to (1) split on a specific character and (2) account for values that were already added. Thank you!

For example, suppose you have the following dataframe

import pandas as pd
df=pd.DataFrame({'A1':['a'],'A2':['b'],'B2':['b'],'B3':['c']})

We can use groupby on the columns

df.groupby(df.columns.str[0],axis=1).agg(lambda x :','.join(x.values.tolist()))
Out[282]: 
     A    B
0  a,b  b,c

2 Comments

I like this one, but (unclear in my example) the headers may be different lengths, e.g. [ 'apple' , 'apple[1]' , 'lime[1]' , 'watermelon' ]. Is there a way to split on a character (e.g. left square bracket) rather than an index or slice?
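@JRM change to df.columns.str.split('[').str[0]

import string

import pandas as pd

df = pd.DataFrame(columns=['A1', 'A2', 'A3', 'B1', 'B2', 'C1'])

# Map each uppercase letter to the existing columns that contain it
new_cols = {}
for new_col in string.ascii_uppercase:
    matches = [col for col in df.columns if new_col in col]
    if matches:  # skip letters with no matching columns
        new_cols[new_col] = matches

# Join each group of columns into a single new column
for new_col, cols in new_cols.items():
    df[new_col] = df[cols].apply(' '.join, axis=1)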
