Pandas: Add Argument to Apply with Multiple Inputs

Question

I would like to use apply with two columns and pass additional arguments. My use case is to search one column with a regex and write the search string to another column, without overwriting the values that column already holds. Maybe iterrows is a better option :).

import re
import pandas as pd
import numpy as np

# Create the dataframe
df = pd.DataFrame({
    'a': np.random.choice(['the_panda', 'it_python', 'my_shark'], 6),
})
df["b"] = ""

Yields:

    a   b
0   the_panda   
1   my_shark    
2   my_shark    
3   the_panda   
4   it_python   
5   the_panda

Each time I apply my function, if the search string appears in column "a", I want to write it to column "b". So if I searched for "panda" and then "shark", the result would look like this:

a   b
0   the_panda   panda
1   my_shark    shark
2   my_shark    shark
3   the_panda   panda
4   it_python   
5   the_panda   panda

I created a simple function:

def search_log(b,a,search_sting):
    so = re.search(search_string,a)
    if so:
        return search_string
    else:
        return b

However, I'm not sure whether there is a way to pass additional arguments through apply in this case. Here is what I'm trying:

search_string = 'panda'
df['b'] = df.apply(lambda x: search_log(x['b'],x['a']),args=(search_string,),axis=1)

Which yields:

TypeError: ('<lambda>() takes 1 positional argument but 2 were given', 'occurred at index 0')

...or

df['b'] = df.apply(lambda x: search_log(x['b'],x['a'],args=(search_string,),axis=1))

which yields:

KeyError: ('b', 'occurred at index a')

It looks that way because of the way the dataframes appear when I copy them onto SO. I've updated my example to hopefully make that more clear.
– sparrow, Commented Apr 16, 2018 at 18:18
I'm sorry if I misunderstand your problem, but don't you simply want to do: df['b'] = df.apply(lambda x: search_log(x['b'],x['a'],search_string),axis=1)?
– Ben.T, Commented Apr 16, 2018 at 18:20
Wow, Ben, that's it exactly. I got so caught up with using the "args" parameter. Thanks!
– sparrow, Commented Apr 16, 2018 at 18:24
Actually, it's a bit tricky, because if you set search_string = 'python' and then run df['b'] = df.apply(lambda x: search_log(x['b'],x['a'],'shark'),axis=1), the answer is interesting! So I see why you were looking for "args".
– Ben.T, Commented Apr 16, 2018 at 18:54
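
For reference, here is a minimal sketch of how the `args` parameter the question was reaching for can be used. `DataFrame.apply` forwards everything in `args` as extra positional arguments to the function, after the row, so the function has to accept them; `axis=1` is an argument of `apply` itself, not of `search_log`. That would explain both errors above: the first lambda accepts only the row, and in the second attempt `apply` runs with its default column-wise axis, so `x['b']` is looked up inside a column. The fixed sample data and the `search_row` wrapper are assumptions made for illustration and are not part of the original post.

```python
import re

import pandas as pd

# Fixed sample data (the question draws random values) so the result is reproducible.
df = pd.DataFrame({"a": ["the_panda", "my_shark", "my_shark",
                         "the_panda", "it_python", "the_panda"]})
df["b"] = ""


def search_log(b, a, search_string):
    """Return search_string if it matches a, otherwise keep the existing value b."""
    if re.search(search_string, a):
        return search_string
    return b


def search_row(row, search_string):
    """Hypothetical row-wise wrapper: apply passes the row first, then the extra args."""
    return search_log(row["b"], row["a"], search_string)


# Extra positional arguments travel through args=; axis=1 is given to apply itself.
for term in ["panda", "shark"]:
    df["b"] = df.apply(search_row, axis=1, args=(term,))

print(df)
```

As posted, the question's `search_log` spells its third parameter `search_sting` while its body reads `search_string`, so the function silently picks up the global variable instead of the value passed in. That is probably the "interesting" result Ben.T mentions; the sketch spells the parameter consistently.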

FadeoN · Accepted Answer · 2018-04-16 18:56:17Z

# Keep a search term only when exactly one of the terms occurs in the value; otherwise use "".
string = ["panda","shark","python"]
df["b"] = df["a"].apply(lambda y:[x for x in string if x in y][0] if len([x for x in string if x in y])==1 else "")

Output (before and after):

           a b
0  it_python  
1   my_shark  
2   my_shark  
3  the_panda  
4   my_shark  
5   my_shark  

           a       b
0  it_python  python
1   my_shark   shark
2   my_shark   shark
3  the_panda   panda
4   my_shark   shark
5   my_shark   shark
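
The list comprehension above rebuilds column "b" from scratch on every run. If the requirement from the question matters (leave existing values in "b" alone unless the new term matches), one possible alternative is a boolean mask built with `Series.str.contains` and assigned through `.loc`. This is only a sketch: `tag_matches` is a hypothetical helper name, and the sample rows are copied from the output shown above.

```python
import pandas as pd

# Sample rows copied from the output above.
df = pd.DataFrame({"a": ["it_python", "my_shark", "my_shark",
                         "the_panda", "my_shark", "my_shark"]})
df["b"] = ""


def tag_matches(frame, search_string):
    """Write search_string into 'b' only on rows where 'a' matches it.

    Rows that do not match keep whatever 'b' already contains.
    The frame is modified in place.
    """
    mask = frame["a"].str.contains(search_string, regex=True)
    frame.loc[mask, "b"] = search_string


# Apply the terms one after another; earlier results are not overwritten
# unless a later term also matches the same row.
for term in ["panda", "shark", "python"]:
    tag_matches(df, term)

print(df)
```

Because each call touches only the matching rows, the searches can be run one at a time, which is how the question describes its use case.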

