There's no need to create a lambda for this.
Let's suppose we have the following dataframe:
my_df = pd.DataFrame({
    'Apple': ['1', '4', '7'],
    'Pear': ['2', '5', '8'],
    'Cherry': ['3', np.nan, '9']})
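Which is (older pandas versions sort the dict keys alphabetically, hence the column order; newer ones keep the insertion order):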
Apple Cherry Pear
    1      3    2
    4    NaN    5
    7      9    8
An easier way to achieve what you want without the apply() function is:
- use iterrows() to parse each row one by one.
- use Series() and str.cat() to do the merge.
You'll get this:
l = []
for _, row in my_df.iterrows():
    l.append(pd.Series(row).str.cat(sep='::'))
empty_df = pd.DataFrame(l, columns=['Result'])
Because str.cat() skips missing values by default, the NaN is dropped automatically, which gives the desired result:
Result
1::3::2
4::5
7::9::8
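To make the NaN handling concrete, here is a minimal sketch on a single row. It assumes the default `na_rep=None` of `str.cat()`; the `'?'` placeholder is only an illustrative choice, not something from the question.

```python
import numpy as np
import pandas as pd

# One row of the example frame, with a missing value in the middle.
row = pd.Series(['4', np.nan, '5'], index=['Apple', 'Cherry', 'Pear'])

# Default na_rep=None: the missing value is skipped entirely.
skipped = row.str.cat(sep='::')

# With na_rep set, the missing value is replaced by a placeholder instead.
filled = row.str.cat(sep='::', na_rep='?')

print(skipped)  # 4::5
print(filled)   # 4::?::5
```

So if you ever want to keep a marker for the missing cell rather than drop it, passing `na_rep` is the knob to turn.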
The entire program may look like:
import pandas as pd
import numpy as np


def merge_columns(my_df):
    l = []
    for _, row in my_df.iterrows():
        l.append(pd.Series(row).str.cat(sep='::'))
    empty_df = pd.DataFrame(l, columns=['Result'])
    return empty_df.to_string(index=False)


if __name__ == '__main__':
    my_df = pd.DataFrame({
        'Apple': ['1', '4', '7'],
        'Pear': ['2', '5', '8'],
        'Cherry': ['3', np.nan, '9']})
    print(merge_columns(my_df))
There are two other things that I added in my answer:
- the if __name__ == '__main__' guard
- the logic moved into its own function so that you can reuse it later
As @MathiasEttinger suggested, you can also modify the above function to use a list comprehension for slightly better performance:
def merge_columns_1(my_df):
    l = [pd.Series(row).str.cat(sep='::') for _, row in my_df.iterrows()]
    return pd.DataFrame(l, columns=['Result']).to_string(index=False)
I'll leave the order of the columns as an exercise for OP.
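As a possible starting point for that exercise, one option is to select the columns in an explicit order before iterating, so the result does not depend on how pandas ordered them. This is only a sketch; `column_order` is a hypothetical name and the order shown is an arbitrary choice.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({
    'Apple': ['1', '4', '7'],
    'Pear': ['2', '5', '8'],
    'Cherry': ['3', np.nan, '9']})

# Hypothetical order; pick whichever one the output should follow.
column_order = ['Apple', 'Pear', 'Cherry']

# Indexing with a list of labels returns the columns in that order.
merged = [row.str.cat(sep='::')
          for _, row in my_df[column_order].iterrows()]

print(merged)  # ['1::2::3', '4::5', '7::8::9']
```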