I need to convert columns B, C, and D of a dataframe to strings and concatenate them with underscores, substituting 'END' where D is empty, as in the Target column below:

A   B   C   D   Target
    1   2   321 1_2_321
    2   35  123 2_35_123
    3   55  123 3_55_123
    4   33      4_33_END
    5   11  123 5_11_123

I am able to achieve it using:

df['add'] = df['B'].astype('str') + '_' + df['C'].astype('str') + '_' + df['D'].astype('str')

but I don't know how to substitute the 'END' string when the last column (D) is empty. The current output is:

A   B   C   D   add
    1   2   321 1_2_321
    2   35  123 2_35_123
    3   55  123 3_55_123
    4   33      4_33_
    5   11  123 5_11_123

Is there anything I missed? Anything to learn?

  • df.iloc[-1,-1] += 'END'? Commented Feb 12, 2021 at 4:19
  • df.tail(1)['Target'] += 'END' ? Commented Feb 12, 2021 at 4:20
  • Sorry, edited the question now. Commented Feb 12, 2021 at 4:21
  • df['D'] = df['D'].fillna('END') ? Commented Feb 12, 2021 at 4:23

2 Answers

2

You can use df.apply(''.join,axis=1) to join the entire row. However, you want to join only B through D (iloc[:,1:]), and the values must be strings first, hence astype(str). While joining, you also want to check whether D is empty. You can use a lambda function to check column D and build the joined string accordingly.

import pandas as pd
c = ['A','B','C','D']  
d = [['',1,2,321],
    ['',2,   35,  123],
    ['',3,   55,  123],
    ['',4,   33,  ''],
    ['',5,   11,  123]]

df = pd.DataFrame(d,columns=c)
# When D is '', '_'.join(x) already ends with '_', so only 'END' is appended
df['Target'] = df.iloc[:,1:].astype(str).apply(lambda x: '_'.join(x) if x.D != '' else '_'.join(x) + 'END', axis=1)
print(df)

Results will be:

  A  B   C    D    Target
0    1   2  321   1_2_321
1    2  35  123  2_35_123
2    3  55  123  3_55_123
3    4  33       4_33_END
4    5  11  123  5_11_123

Alternatively, you can do this:

Temporarily set column 'D' to 'END' where its value is '', build the joined string, and then set those cells back to ''. This lets you use the join directly without any condition.

df.loc[df['D'] == '','D'] = 'END'
df['Target'] = df.iloc[:,1:].astype(str).apply('_'.join,axis=1)
df.loc[df['D'] == 'END','D'] = ''
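
A variation on the same idea, as a minimal sketch: do the 'END' substitution on a temporary string copy of B through D, so df['D'] never has to be changed and restored. The name parts is only illustrative, and the sketch assumes the blank cells are empty strings ('') rather than NaN.

```python
import pandas as pd

df = pd.DataFrame(
    [['', 1, 2, 321],
     ['', 2, 35, 123],
     ['', 3, 55, 123],
     ['', 4, 33, ''],
     ['', 5, 11, 123]],
    columns=['A', 'B', 'C', 'D'],
)

# Work on a string copy of B..D; the original columns are left untouched.
parts = df[['B', 'C', 'D']].astype(str)

# Replace blank cells in D with the 'END' marker (assumes blanks are '', not NaN).
parts['D'] = parts['D'].replace('', 'END')

# Join the three string columns row by row.
df['Target'] = parts.apply('_'.join, axis=1)
print(df)
```

Because the substitution happens on the copy, a value in D that is already the literal string 'END' is not affected by any restore step.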

2

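Try either of the two below. Both assume the empty cell in D is a missing value (None/NaN), not an empty string.

df[['B','C','D']].apply(lambda x: '_'.join(['END' if pd.isna(i) else str(i) for i in x]), axis=1)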

OR

df['B'].astype('str') + '_' + df['C'].astype('str') + '_' + df['D'].fillna('END').astype('str')

The second one will be more efficient, because it works on whole columns at once instead of calling a Python function for every row.
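
If it is not known whether the blanks in D are missing values or empty strings, the column-wise version can be extended to cover both. The following is a sketch on toy data, not the asker's actual frame; the names is_blank and d_text are illustrative, and D is assumed to be an object column (a float column containing NaN would render 321 as '321.0').

```python
import pandas as pd

# Toy data: D holds one missing value (None) and one empty string.
df = pd.DataFrame({
    'B': [1, 2, 3, 4],
    'C': [2, 35, 55, 33],
    'D': pd.Series([321, 123, None, ''], dtype=object),
})

# Flag both kinds of blank: missing values and empty strings.
is_blank = df['D'].isna() | (df['D'] == '')

# Keep D where it has a value, use 'END' elsewhere, then convert to text.
d_text = df['D'].where(~is_blank, 'END').astype(str)

df['Target'] = df['B'].astype(str) + '_' + df['C'].astype(str) + '_' + d_text
print(df['Target'].tolist())
```

The mask is computed once and applied with Series.where, so the whole operation stays column-wise.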
