1

I am trying to combine multiple columns from a dataframe into a new column in the same dataframe. Those columns could either have a string value or are Na or NaN. Whenever one column is Na/NaN, I would like these columns to not be included in the final string.

E.g.

       a        b        c        d       RESULT
0      AA       BB       CC       DD      AA;BB;CC;DD
1      ab       Na       cd       da      ab;cd;da
2      Na       xx       Na       Na      xx
3      Na       Na       Na       Na      Na

I have tested multiple functions already.

  1. df['RESULT'] = df['a'] + ";" + df['b'] + ... does not work as it will still nest the Na's.
  2. df['RESULT'] = ";".join(df['a'],df['b],...)does not work as join just takes one argument (and I have 4)
  3. df['RESULTS'] = [f"{a};{b};{c}" for a,b,c in zip(df['a'],df['b'], df['b'])]does not work as it adds the Na's as string to the output
  4. Pandas str.cat(): df['fill_name']= df['RESULTS'].str.cat(df['a'],sep=";").str.cat(df['b'],sep=";")... is the closest to what I am looking for, but as soon as there is one Na in one column, the whole output is Na.

In the end I am looking into something like the "TEXTJOIN" function in Excel.

2 Answers 2

1

A combo with pandas.DataFrame.stack and GroupBy.agg :

cols = ["a", "b", "c", "d"]

df["RESULT"] = df[cols].stack().groupby(level=0).agg(";".join)

Output :

print(df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN
Sign up to request clarification or add additional context in comments.

Comments

0

Use DataFrame.stack for remove missing values and aggregate join:

columns = ['a','b','c','d']
df['RESULT'] = df[columns].stack().groupby(level=0).agg(';'.join)
print (df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN

Or remove missing values in custom function with replace empty strings:

df['RESULT'] = df[columns].agg(lambda x: ";".join(x.dropna()), axis=1).replace('',np.nan)
print (df)
     a    b    c    d       RESULT
0   AA   BB   CC   DD  AA;BB;CC;DD
1   ab  NaN   cd   da     ab;cd;da
2  NaN   xx  NaN  NaN           xx
3  NaN  NaN  NaN  NaN          NaN

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.