
I have to execute a script inside a function that works with two DataFrames.

The script works fine on its own, but I don't understand how to write a function that has to deal with two DataFrames.

Any suggestions?

import pandas as pd
from fuzzywuzzy import fuzz

df1 = pd.read_excel('input.xlsx', sheet_name='sheet1')
df2 = pd.read_excel('input.xlsx', sheet_name='sheet2')

# Pair every id_number with every identity_no (cross join, pandas >= 1.2)
cross = df1[['id_number']].merge(df2[['identity_no']], how='cross')
# Similarity score (0-100) for each pair
cross['match'] = cross.apply(lambda x: fuzz.ratio(x.id_number, x.identity_no), axis=1)
# Keep the best score per id_number and map it back onto df1
df1['match_acc'] = df1.id_number.map(cross.groupby('id_number').match.max())
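To try the cross-join logic without the Excel file, here is a minimal, self-contained sketch. It assumes small hypothetical sample data (the real sheets are not shown), and it swaps fuzz.ratio for a hypothetical ratio() helper built on the standard library's difflib.SequenceMatcher, which gives a comparable 0-100 similarity but is not the fuzzywuzzy implementation.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a: str, b: str) -> int:
    """Hypothetical stand-in for fuzz.ratio: similarity on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Hypothetical sample data; the real Excel sheets are not shown in the question.
df1 = pd.DataFrame({"id_number": ["IN2231D", "UK654IN"]})
df2 = pd.DataFrame({"identity_no": ["IN2231D", "UK654IM", "XX0000X"]})

# Pair every id_number with every identity_no (how="cross" needs pandas >= 1.2).
cross = df1[["id_number"]].merge(df2[["identity_no"]], how="cross")
cross["match"] = cross.apply(lambda r: ratio(r.id_number, r.identity_no), axis=1)

# Best score per id_number, mapped back onto df1.
df1["match_acc"] = df1["id_number"].map(cross.groupby("id_number")["match"].max())
print(df1)
```

With this sample data, IN2231D gets 100 (exact match) and UK654IN gets 86 from its best pair, UK654IM (6 of 7 characters in common).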

I need to execute the above script within a function.

I tried the code below, but I don't see how a function can work with two DataFrames.

def word(x, y):
    try:
        cross = x[['id_number']].merge(y[['identity_no']], how='cross')
        # 'row' instead of 'x' so the lambda does not shadow the DataFrame argument
        cross['match'] = cross.apply(lambda row: fuzz.ratio(row.id_number, row.identity_no), axis=1)
        x['match_acc'] = x.id_number.map(cross.groupby('id_number').match.max())
    except ValueError as e:
        x['status'] = str(e)

    return x

df = df.apply(word, axis=1)
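The attempt above calls df.apply(word, axis=1), which passes one row at a time as the only argument, so y is never supplied. Since word already takes two whole DataFrames, it can simply be called once with both. The sketch below assumes a hypothetical helper add_match_acc, hypothetical sample data, and a difflib-based ratio() standing in for fuzz.ratio; it also returns a copy rather than modifying the caller's DataFrame.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a: str, b: str) -> int:
    # Hypothetical stand-in for fuzz.ratio so the sketch runs without fuzzywuzzy.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def add_match_acc(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `left` with a match_acc column holding
    the best score of each id_number against every identity_no in `right`."""
    out = left.copy()  # don't modify the caller's DataFrame in place
    try:
        cross = out[["id_number"]].merge(right[["identity_no"]], how="cross")
        cross["match"] = cross.apply(lambda r: ratio(r.id_number, r.identity_no), axis=1)
        out["match_acc"] = out["id_number"].map(cross.groupby("id_number")["match"].max())
    except (KeyError, ValueError) as exc:
        # e.g. a missing column: record the error instead of raising it
        out["status"] = repr(exc)
    return out


# Both DataFrames are ordinary arguments; the function is called once, not via apply.
df1 = pd.DataFrame({"id_number": ["IN2231D", "UK654IN"]})
df2 = pd.DataFrame({"identity_no": ["IN2231D", "UK654IM"]})
df1 = add_match_acc(df1, df2)
print(df1)
```

Because the function works on whole DataFrames, the cross join runs once for all rows instead of once per row.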

Please suggest.

  • Add a couple of rows from both of the dataframes and add the expected output to the question. Commented May 17, 2021 at 4:38
  • @ThePyGuy - I have edited the question. Please suggest how two DataFrames can be used within a function. Commented May 17, 2021 at 4:56
  • @Corralien - The "AXN pvt Ltd" company is "IN2231D"; I am trying to find the accuracy by matching against DataFrame 2. The first script works fine; my concern is how I can use both DataFrames in a function. Please suggest. Commented May 17, 2021 at 6:22
  • Try / return does not exist: replace by Try / except Commented May 17, 2021 at 6:58
  • @Corralien - My mistake. Yes, it should be except instead of return. Thanks for pointing that out. Please suggest how I can use the script inside a function. Commented May 17, 2021 at 7:13

1 Answer


To answer your question, I used the process module from the fuzzywuzzy package.

from fuzzywuzzy import process

df1["match_acc"] = df1["id_number"].apply(
    lambda x: process.extractOne(x, df2["identity_no"])[1])
>>> df1
  id_number          company_name  match_acc
0   IN2231D           AXN pvt Ltd         92
1   UK654IN        Aviva Intl Ltd        100
2   SL1432H   Ship Incorporations         92
3   LK0678G  Oppo Mobiles pvt ltd        100
4   NG5678J             Nokia Inc         43
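process.extractOne(query, choices) scores the query against every choice and keeps the best one, which is why [1] picks out the score. Its default scorer is fuzz.WRatio rather than the fuzz.ratio used in the question, so the numbers can differ from the cross-join script; passing scorer=fuzz.ratio should bring them in line. The sketch below shows the idea in plain Python, assuming a hypothetical extract_one helper, hypothetical sample data, and a difflib-based ratio() in place of fuzzywuzzy's scorer.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple

import pandas as pd


def ratio(a: str, b: str) -> int:
    # Hypothetical difflib-based stand-in for fuzz.ratio, not fuzzywuzzy's code.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract_one(query: str, choices: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: the (choice, score) pair with the highest score,
    or None when there are no choices."""
    return max(((c, ratio(query, c)) for c in choices), key=lambda pair: pair[1], default=None)


df1 = pd.DataFrame({"id_number": ["IN2231D", "UK654IN"]})
df2 = pd.DataFrame({"identity_no": ["IN2231D", "UK654IM"]})

# Same shape as the answer: index [1] picks the score out of the best pair.
df1["match_acc"] = df1["id_number"].apply(lambda q: extract_one(q, df2["identity_no"])[1])
print(df1)
```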

Edit: "create the logic within a function"

def word(x, y):
    x["match_acc"] = process.extractOne(x["id_number"], y["identity_no"])[1]
    return x

out = df1.apply(word, y=df2, axis=1)
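The edited version works because DataFrame.apply forwards keyword arguments it does not recognise (here y=df2) to the function; extra positional arguments can go through args= instead. A variant that returns one score per row and assigns it to a column, rather than returning the modified row, is sketched below. It assumes a hypothetical best_score function, hypothetical sample data, and a difflib-based ratio() standing in for fuzz.ratio.

```python
from difflib import SequenceMatcher

import pandas as pd


def ratio(a: str, b: str) -> int:
    # Hypothetical stand-in for fuzz.ratio so the sketch runs without fuzzywuzzy.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_score(row: pd.Series, y: pd.DataFrame) -> int:
    # `row` is one row of the calling DataFrame; `y` is the whole second DataFrame.
    return max(ratio(row["id_number"], other) for other in y["identity_no"])


df1 = pd.DataFrame({"id_number": ["IN2231D", "UK654IN"]})
df2 = pd.DataFrame({"identity_no": ["IN2231D", "UK654IM"]})

# Keyword arguments that apply() does not recognise are forwarded to best_score...
df1["match_acc"] = df1.apply(best_score, axis=1, y=df2)
# ...and positional extras can be passed through args= instead.
same = df1.apply(best_score, axis=1, args=(df2,))
print(df1)
```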
>>> out
  id_number          company_name  match_acc
0   IN2231D           AXN pvt Ltd         92
1   UK654IN        Aviva Intl Ltd        100
2   SL1432H   Ship Incorporations         92
3   LK0678G  Oppo Mobiles pvt ltd        100
4   NG5678J             Nokia Inc         43

2 Comments

Logic is working fine. Actually, I want to create the logic within a function, for example inside "def test(x, y):", and return the output value, which I can apply later to the DataFrame. Please see the reference in the question above.
I updated my answer. I hope this is what you need!
