How to combine different columns in a dataframe using comprehension-python

Question

Suppose a dataframe contains:

attacker_1  attacker_2  attacker_3  attacker_4
Lannister   nan         nan         nan
nan         Stark       greyjoy     nan

I want to create another column called AttackerCombo that aggregates the 4 columns into 1 column. How would I write such code in Python? I have been practicing Python and I reckon a list comprehension makes sense here. [list(x) for x in attackers], where attackers is a numpy array of the 4 columns, does aggregate all 4 columns into 1 column, but I would also like to remove all the NaNs. So instead of each row looking like

starknannanlannister

it should look like

stark/lannister
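A minimal sketch of the list comprehension the question has in mind, assuming the sample frame is rebuilt from the example above with np.nan for the missing values. Here attackers is the numpy array of the 4 columns, as in the question, and cols is a hypothetical helper list holding the column names. The only change from [list(x) for x in attackers] is that each row is filtered with pd.notna before its names are joined:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (np.nan marks a missing attacker)
df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# NumPy array of the 4 columns, one row per battle
attackers = df[cols].to_numpy()

# Same shape as [list(x) for x in attackers], but skip missing values
# and join the remaining names with '/'
df['AttackerCombo'] = ['/'.join(x for x in row if pd.notna(x))
                       for row in attackers]
print(df['AttackerCombo'].tolist())  # ['Lannister', 'Stark/greyjoy']
```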

jezrael · Accepted Answer · 2017-01-04 07:58:10Z

2

I think you need apply with join, removing the NaNs with dropna:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                    .apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
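A self-contained sketch of the dropna approach. It rebuilds the question's two rows (assuming np.nan for the missing values) and adds a hypothetical third row with no attackers at all, to show that such a row ends up as an empty string rather than NaN:

```python
import numpy as np
import pandas as pd

# Question's two rows plus a hypothetical third row with no attackers
df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan, np.nan],
    'attacker_2': [np.nan, 'Stark', np.nan],
    'attacker_3': [np.nan, 'greyjoy', np.nan],
    'attacker_4': [np.nan, np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# With axis=1 the lambda receives one row (a Series) at a time;
# dropna() drops the missing names before they are joined
df['attackers'] = df[cols].apply(lambda x: '/'.join(x.dropna()), axis=1)
print(df['attackers'].tolist())  # ['Lannister', 'Stark/greyjoy', '']
```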

If you need an empty string as the separator, use DataFrame.fillna:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('') \
                    .apply(''.join, axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4     attackers
0  Lannister        NaN        NaN         NaN     Lannister
1        NaN      Stark    greyjoy         NaN  Starkgreyjoy
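The fillna('') trick only fits an empty separator. A short sketch on the same sample frame (rebuilt here, assuming np.nan for the missing values) shows what happens if '/' is used instead: the empty strings still take part in the join and leave stray separators, which is why the first solution filters the values with dropna before joining:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# fillna('') turns NaN into empty strings, but those empty strings
# are still joined, so a non-empty separator leaves extra slashes
with_slash = df[cols].fillna('').apply('/'.join, axis=1)
print(with_slash.tolist())  # ['Lannister///', '/Stark/greyjoy/']
```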

Two other solutions with a list comprehension: the first checks each value with pd.notnull, the second checks whether the value is a string:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                    .apply(lambda x: '/'.join([e for e in x if pd.notnull(e)]), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy


# Python 3: isinstance(e, str); Python 2: isinstance(e, basestring)
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                    .apply(lambda x: '/'.join([e for e in x if isinstance(e, str)]), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
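Both comprehensions above still go through apply(axis=1), which builds a Series for every row. A sketch of a variant that skips apply and iterates over plain tuples from DataFrame.itertuples(index=False, name=None) instead; it is usually cheaper per row, though that has not been measured here. The frame is rebuilt from the question's example, assuming np.nan for the missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# itertuples(index=False, name=None) yields plain tuples of row values,
# so no Series has to be built for each row
df['attackers'] = ['/'.join(e for e in row if isinstance(e, str))
                   for row in df[cols].itertuples(index=False, name=None)]
print(df['attackers'].tolist())  # ['Lannister', 'Stark/greyjoy']
```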

edited Jan 4, 2017 at 7:58

answered Jan 4, 2017 at 5:59

jezrael


5 Comments

MrKickass Over a year ago

Perfect solution! Thanks. Can you expand on 'axis'? According to the documentation, with axis=0 the function is applied to each column, and with axis=1 it is applied to each row. Can you explain how that works here?

jezrael Over a year ago

Exactly as you say in your comment. You can test it with df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(print) and df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(print, axis=1).
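Instead of print, a function that returns the name of the Series it receives makes the difference visible in the result. A sketch on the sample frame (rebuilt from the question, assuming np.nan for the missing values): with the default axis=0 each call gets a column, whose name is the column label; with axis=1 each call gets a row, whose name is its index label. That is why x in lambda x: '/'.join(x.dropna()) is one row at a time:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# axis=0 (default): each call receives a whole column, named by its label
by_column = df[cols].apply(lambda s: s.name).tolist()
print(by_column)  # ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# axis=1: each call receives a single row, named by its index label
by_row = df[cols].apply(lambda s: s.name, axis=1).tolist()
print(by_row)  # [0, 1]
```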

MrKickass Over a year ago

Let me try to explain what I understand so far: when I take a subset of, say, 4 columns and call apply on it, 'lambda x' iterates over all the rows in the subset, and the function being applied joins the values left after dropna with the string "/", with axis=1 specifying that it is applied to each row. Is that correct, or am I missing something?

MrKickass Over a year ago

Can you suggest some other ways of doing the same operation, for example a list comprehension? What I had in mind is to look at each element in a row, check whether it is NaN, and add the non-NaN values to a list. That might solve the problem, perhaps at some cost in performance.

jezrael Over a year ago

I added 2 other solutions; I hope they are faster.

nlassaux · Accepted Answer · 2017-01-04 04:39:34Z

1

You can create a new column in the dataframe and fill it using a lambda function:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}{}{}{}'.format(x.iloc[0],x.iloc[1],x.iloc[2],x.iloc[3]), axis=1)

You don't specify how you want to aggregate them, so, for instance, if you want them separated by a dash:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}-{}-{}-{}'.format(x.iloc[0],x.iloc[1],x.iloc[2],x.iloc[3]), axis=1)
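Note that these format-based lambdas do not remove the missing values: str.format just converts each value to text, and NaN becomes 'nan'. A quick check on the sample frame (rebuilt from the question, assuming np.nan for the missing values); here the row is unpacked with *x, which passes the four values in column order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# str.format only converts each value to text, so NaN shows up as 'nan'
dashed = df[cols].apply(lambda x: '{}-{}-{}-{}'.format(*x), axis=1)
print(dashed.tolist())  # ['Lannister-nan-nan-nan', 'nan-Stark-greyjoy-nan']
```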

answered Jan 4, 2017 at 4:39

nlassaux


3 Comments

MrKickass Over a year ago

Is there a way to do a similar operation using numpy, supposing the dataframe is converted to a numpy array? And what other comprehension-style functions could be used? Thank you
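One possible numpy-flavoured sketch, assuming the 4 columns are converted to an object array with DataFrame.to_numpy. np.isnan does not accept object arrays, so the missing-value mask here is built with pd.notna, and boolean indexing then keeps only the present names in each row before joining. The frame is rebuilt from the question's example, with np.nan assumed for the missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# Plain numpy object array of the 4 columns
arr = df[cols].to_numpy(dtype=object)

# Boolean mask of the same shape: True where a name is present
mask = pd.notna(arr)

# For each row, boolean indexing keeps only the present names
combos = ['/'.join(row[keep]) for row, keep in zip(arr, mask)]
print(combos)  # ['Lannister', 'Stark/greyjoy']
```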

MrKickass Over a year ago

I tried modifying the command @nlassaux provided: battledf[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('').apply(lambda x : '{}{}{}{}'.format(x[0],x[1],x[2],x[3]), axis=1).unique(). This does generate a relevant result, but I'm not sure it is optimal.

nlassaux Over a year ago

Format is fast because str.format is implemented in C. Also, .apply() works, but it is not as fast as pandas's built-in vectorized methods.
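Performance differences between apply(axis=1) and a plain comprehension depend on the data size and the pandas version, so it may be more reliable to measure them directly. A rough timing sketch, assuming a hypothetical larger frame made by repeating the question's two rows 1,000 times; the function names with_apply and with_comprehension are illustrative, and no timings are shown because they vary by machine:

```python
import functools
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: the question's two rows repeated 1,000 times
base = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
df = pd.concat([base] * 1000, ignore_index=True)
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']


def with_apply(frame):
    """dropna approach: apply(axis=1) builds one Series per row."""
    return frame[cols].apply(lambda x: '/'.join(x.dropna()), axis=1).tolist()


def with_comprehension(frame):
    """List comprehension over plain tuples from itertuples."""
    return ['/'.join(e for e in row if isinstance(e, str))
            for row in frame[cols].itertuples(index=False, name=None)]


# Both approaches should produce the same combined strings
assert with_apply(df) == with_comprehension(df)

for func in (with_apply, with_comprehension):
    seconds = timeit.timeit(functools.partial(func, df), number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
```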
