
Suppose a dataframe contains

attacker_1  attacker_2  attacker_3  attacker_4
Lannister   nan         nan         nan
nan         Stark       greyjoy     nan

I want to create another column called AttackerCombo that combines the 4 columns into 1. How would I do this in Python? I have been practicing Python and thought a list comprehension would make sense, but [list(x) for x in attackers], where attackers is a NumPy array of the 4 columns, gives all 4 columns combined into 1 with the NaNs still included. I would like to remove all the NaNs as well, so that each row, instead of looking like

starknannanlannister
would look like
stark/lannister

2 Answers


I think you need apply with join, removing the NaN values with dropna:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                    .apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
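For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt from the sample in the question, and the third all-NaN row is an added assumption (not in the original data) to show what happens when a row has no attackers at all:

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical all-NaN row.
df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan, np.nan],
    'attacker_2': [np.nan, 'Stark', np.nan],
    'attacker_3': [np.nan, 'greyjoy', np.nan],
    'attacker_4': [np.nan, np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# axis=1 passes each row to the lambda as a Series;
# dropna() removes the NaN entries before joining.
df['attackers'] = df[cols].apply(lambda row: '/'.join(row.dropna()), axis=1)

print(df['attackers'].tolist())
```

A row with no attackers ends up as an empty string rather than NaN, because joining an empty sequence returns ''.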

If you need an empty string as the separator, use DataFrame.fillna:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('') \
                    .apply(''.join, axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4     attackers
0  Lannister        NaN        NaN         NaN     Lannister
1        NaN      Stark    greyjoy         NaN  Starkgreyjoy

Two more solutions use a list comprehension: the first filters with pd.notnull, and the second checks whether each element is a string:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                    .apply(lambda x: '/'.join([e for e in x if pd.notnull(e)]), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy


# Python 3: isinstance(e, str); Python 2: isinstance(e, basestring)
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                    .apply(lambda x: '/'.join([e for e in x if isinstance(e, str)]), axis=1)
print (df)
  attacker_1 attacker_2 attacker_3  attacker_4      attackers
0  Lannister        NaN        NaN         NaN      Lannister
1        NaN      Stark    greyjoy         NaN  Stark/greyjoy

5 Comments

Perfect solution, thanks! Can you expand on 'axis'? According to the documentation, with axis=0 the function is applied to each column, and with axis=1 it is applied to each row. Can you explain how that works here?
Exactly as you say in your comment. You can test it with df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(print) and df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(print, axis=1)
Let me try to explain what I understand so far. When I take a subset of, say, 4 columns and call apply on it, 'lambda x' receives the rows of the subset one at a time, and the function being applied is 'drop the NaNs, then join with "/"', run on each row because axis=1 is specified. Is that correct, or am I missing something?
Can you suggest some other ways to do the same operation, for example with a list comprehension? What I had in mind is to look at each element in a row, check whether it is NaN, and add the non-NaN values to a list. That might solve the problem, possibly at some performance cost.
I added 2 more solutions; I hope they are faster.
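To make the axis discussion in these comments concrete, here is a small sketch that applies the same function along both axes. The frame is rebuilt from the question's sample, and counting non-null values is just an illustrative choice of function:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# axis=0 (the default): the function receives each column as a Series,
# so the result has one entry per column.
per_column = df[cols].apply(lambda s: s.notna().sum(), axis=0)

# axis=1: the function receives each row as a Series,
# so the result has one entry per row.
per_row = df[cols].apply(lambda s: s.notna().sum(), axis=1)

print(per_column.to_dict())
print(per_row.to_dict())
```

The result of axis=0 is indexed by column name and the result of axis=1 by row label, which is why axis=1 is the one that yields a value per row suitable for a new column.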

You can add a new column to the dataframe and fill it using a lambda function:

# format(*x) unpacks the row values in order; x[0], x[1], ... relies on positional lookup, which newer pandas versions deprecate for label-indexed rows
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}{}{}{}'.format(*x), axis=1)

You don't specify how you want to aggregate them; for instance, if you want them separated by a dash:

df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x : '{}-{}-{}-{}'.format(*x), axis=1)

3 Comments

Is there a way to do a similar operation using NumPy, supposing the dataframe is converted to a NumPy array? And what other comprehension-based approaches could be used? Thank you.
I tried to modify the command @nlassaux provided: battledf[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('').apply(lambda x : '{}{}{}{}'.format(x[0],x[1],x[2],x[3]), axis=1).unique(). This does produce a relevant result, but I'm not sure it is an optimal one.
str.format is implemented in C in CPython, so the formatting itself is fast. Row-wise .apply(axis=1), however, calls a Python function once per row, so it is generally slower than pandas's built-in vectorized methods.
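On the NumPy question in the comments above, one possible sketch is to convert the columns to an object array and run a plain list comprehension over its rows, skipping apply entirely. The frame is rebuilt from the question's sample; whether this is actually faster depends on the data and has not been measured here:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'attacker_1': ['Lannister', np.nan],
    'attacker_2': [np.nan, 'Stark'],
    'attacker_3': [np.nan, 'greyjoy'],
    'attacker_4': [np.nan, np.nan],
})
cols = ['attacker_1', 'attacker_2', 'attacker_3', 'attacker_4']

# Object array of shape (n_rows, 4); NaN entries are floats, names are str.
arr = df[cols].to_numpy(dtype=object)

# Keep only the string entries of each row, then join them.
combined = ['/'.join(v for v in row if isinstance(v, str)) for row in arr]
df['attackers'] = combined

print(combined)
```

The isinstance check assumes every real value is a string; if the columns could hold other types, filtering with pd.notnull and converting with str would be the safer variant.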
