Combine multiple columns as a string in python

Question

I am trying to combine multiple columns from a dataframe into a new column in the same dataframe. Those columns could either have a string value or are Na or NaN. Whenever one column is Na/NaN, I would like these columns to not be included in the final string.

E.g.

       a        b        c        d       RESULT
0      AA       BB       CC       DD      AA;BB;CC;DD
1      ab       Na       cd       da      ab;cd;da
2      Na       xx       Na       Na      xx
3      Na       Na       Na       Na      Na

I have tested multiple functions already.

df['RESULT'] = df['a'] + ";" + df['b'] + ... does not work as it will still nest the Na's.
df['RESULT'] = ";".join(df['a'],df['b],...)does not work as join just takes one argument (and I have 4)
df['RESULTS'] = [f"{a};{b};{c}" for a,b,c in zip(df['a'],df['b'], df['b'])]does not work as it adds the Na's as string to the output
Pandas str.cat(): df['fill_name']= df['RESULTS'].str.cat(df['a'],sep=";").str.cat(df['b'],sep=";")... is the closest to what I am looking for, but as soon as there is one Na in one column, the whole output is Na.

In the end I am looking into something like the "TEXTJOIN" function in Excel.

Timeless · Accepted Answer · 2023-01-30 13:27:41Z

1

A combo with pandas.DataFrame.stack and GroupBy.agg :

cols = ["a", "b", "c", "d"]

df["RESULT"] = df[cols].stack().groupby(level=0).agg(";".join)

Output :

print(df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN

answered Jan 30, 2023 at 13:27

Timeless

38.3k6 gold badges33 silver badges54 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

jezrael · Accepted Answer · 2023-01-30 13:26:52Z

0

Use DataFrame.stack for remove missing values and aggregate join:

columns = ['a','b','c','d']
df['RESULT'] = df[columns].stack().groupby(level=0).agg(';'.join)
print (df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN

Or remove missing values in custom function with replace empty strings:

df['RESULT'] = df[columns].agg(lambda x: ";".join(x.dropna()), axis=1).replace('',np.nan)
print (df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN

edited Jan 30, 2023 at 13:26

answered Jan 30, 2023 at 13:21

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Collectives™ on Stack Overflow

Combine multiple columns as a string in python

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related