I would like to use apply with two columns and pass additional arguments. My use case is to run a regex search on one column and write the search string to another column whenever it matches, without overwriting values already present in that other column. Maybe iterrows is a better option :).

import random
import re
import pandas as pd
import numpy as np

# create the dataframe
df = pd.DataFrame({
    'a': np.random.choice(['the_panda', 'it_python', 'my_shark'], 6),
})
df["b"] = ""

Yields:

    a   b
0   the_panda   
1   my_shark    
2   my_shark    
3   the_panda   
4   it_python   
5   the_panda   

Each time I apply my function, if the search string is found in column "a", I want to write that search string to column "b". So after searching for "panda" and then "shark", the result would look like this:

    a   b
0   the_panda   panda
1   my_shark    shark
2   my_shark    shark
3   the_panda   panda
4   it_python   
5   the_panda   panda

I created a simple function:

def search_log(b,a,search_sting):
    so = re.search(search_string,a)
    if so:
        return search_string
    else:
        return b

However, I'm not sure whether there is a way to pass additional arguments through apply in this case. Here is what I'm trying:

search_string = 'panda'
df['b'] = df.apply(lambda x: search_log(x['b'],x['a']),args=(search_string,),axis=1)

Which yields:

TypeError: ('<lambda>() takes 1 positional argument but 2 were given', 'occurred at index 0')

...or

df['b'] = df.apply(lambda x: search_log(x['b'],x['a'],args=(search_string,),axis=1))

which yields:

KeyError: ('b', 'occurred at index a')
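
Both errors appear to come from where args ends up. In the first attempt, apply forwards args to the lambda itself, which accepts only the row. In the second, args and axis sit inside the call to search_log rather than apply, so apply runs with its default axis=0 and x['b'] is looked up on column "a". A minimal sketch of two forms that should work, assuming the function is rewritten to take the row as its first parameter (search_row is a hypothetical name, and the fixed sample data replaces the random choice for reproducibility):

```python
import re
import pandas as pd

df = pd.DataFrame({"a": ["the_panda", "my_shark", "it_python"]})
df["b"] = ""

def search_row(row, search_string):
    """Return search_string if it matches row['a'], otherwise keep row['b']."""
    return search_string if re.search(search_string, row["a"]) else row["b"]

# Form 1: apply passes the row first, then the values in args.
df["b"] = df.apply(search_row, axis=1, args=("panda",))

# Form 2: the lambda supplies the extra value itself, so args is not needed.
df["b"] = df.apply(lambda row: search_row(row, "shark"), axis=1)

print(df)
```

Either form leaves earlier matches in place, because the function falls back to the current value of "b" when the search fails.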
  • yes, that is because of random.choice Commented Apr 16, 2018 at 18:08
  • It looks that way because of the way the dataframes appear when I copy them onto SO. I've updated my example to hopefully make that more clear. Commented Apr 16, 2018 at 18:18
  • I'm sorry if I misunderstand your problem, but don't you simply want to do: df['b'] = df.apply(lambda x: search_log(x['b'],x['a'],search_string),axis=1)? Commented Apr 16, 2018 at 18:20
  • Wow, Ben. that's it exactly. I got so caught up with using the "args" parameter. Thanks! Commented Apr 16, 2018 at 18:24
  • Actually, it's a bit tricky, because if you do search_string ='python' and df['b'] = df.apply(lambda x: search_log(x['b'],x['a'],'shark'),axis=1), the answer is interesting! So I see why you were looking for "args". Commented Apr 16, 2018 at 18:54
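
The behaviour described in the last comment seems to follow from the function signature in the question: the third parameter is spelled search_sting, so search_string inside the body resolves to the module-level variable rather than to the argument. A small sketch that reproduces this without pandas (search_log_fixed is a hypothetical name for a version with the parameter spelled correctly):

```python
import re

search_string = "python"  # module-level variable, as in the comment above

def search_log(b, a, search_sting):  # parameter misspelled, as in the question
    so = re.search(search_string, a)  # reads the global, not the argument
    return search_string if so else b

def search_log_fixed(b, a, search_string):  # parameter now shadows the global
    return search_string if re.search(search_string, a) else b

broken_shark = search_log("", "my_shark", "shark")        # 'shark' is ignored
broken_python = search_log("", "it_python", "shark")      # matches the global
fixed_shark = search_log_fixed("", "my_shark", "shark")   # uses the argument
print(repr(broken_shark), repr(broken_python), repr(fixed_shark))
```

With the misspelled parameter, the value passed in has no effect on the search, which would explain why passing 'shark' still tags the "python" rows.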

1 Answer

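string = ["panda", "shark", "python"]
# take the matching search string when exactly one of them occurs in the value, else leave "b" empty
df["b"] = df["a"].apply(lambda y: [x for x in string if x in y][0] if len([x for x in string if x in y]) == 1 else "")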

Output:

           a b
0  it_python  
1   my_shark  
2   my_shark  
3  the_panda  
4   my_shark  
5   my_shark  

       a       b
0  it_python  python
1   my_shark   shark
2   my_shark   shark
3  the_panda   panda
4   my_shark   shark
5   my_shark   shark
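
A possible alternative that keeps the question's goal of not overwriting existing values is a boolean mask built with str.contains plus a .loc assignment, run once per search term. This is only a sketch and not the approach used in this answer; tag_matches is a hypothetical helper name, and the sample data is fixed rather than random:

```python
import pandas as pd

df = pd.DataFrame({"a": ["the_panda", "my_shark", "it_python", "the_panda"]})
df["b"] = ""

def tag_matches(frame, term):
    """Write term into column 'b' for rows whose 'a' matches term (regex)."""
    mask = frame["a"].str.contains(term, regex=True)
    frame.loc[mask, "b"] = term  # rows outside the mask keep their value
    return frame

for term in ["panda", "shark"]:
    tag_matches(df, term)

print(df)
```

Rows that match none of the terms keep whatever "b" held before, so repeated searches accumulate instead of resetting the column.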