0

I have been scratching my head since morning on how to tackle this problem. I need to pass multiple columns from multiple dataframes to a function as arguments.

Example : Df1

A B C
1 11 111,333
2 22 222
3 33 nan

Df2

D E
a 111
b 333

Now, I want all the rows from Df2(col E) which are not present in the Df1 (Col C) along with DF1(A,B) concatenated.

The output I want is below:

newcol C
2_22 222

I have written below code snippet to do the same but I'm stuck at how to apply it to the dataframes.

def get_wrong_ls(a,b,c,d):
    wrong_l_list = []
    wrong_l_dict = {}
    if c is not None:
        c_list = str(c).split(",")
        for ele in c_list:
            if ele not in d:
                wrong_l_dict[str(a)+"_"+str(b)] = c
            wrong_l_list.append(wrong_l_dict)
    return pd.DataFrame(wrong_l_list)

it returns correct output when the strings are passed. But I'm stuck at how to pass the dataframes directly. The below code behaves weirdly and also the "nan" in the col C doesn't get filtered out.

fd = get_wrong_ls(df1['A'],df1['B'],df1['C'],df2['E'])

Please help.

4
  • Could you update your question to show the desired output? Commented Sep 23, 2022 at 13:08
  • Would you please specify the inputs and desired outputs? Commented Sep 23, 2022 at 13:09
  • It looks like you're missing a lot of information and that the provided example doesn't reflect the real use case (why would you split a number on commas?) Commented Sep 23, 2022 at 13:12
  • I have edited the question to get the desired output. Please check. @Chris Commented Sep 23, 2022 at 13:18

1 Answer 1

1
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 22, 33], 'C': ['111,333', '222', np.nan]})
df2 = pd.DataFrame({'D': ['a', 'b'], 'E': [111, 333]})

def get_wrong_ls(df1, df2):
    df1.dropna(inplace=True)
    df1['C'] = df1['C'].str.split(',')
    df1 = df1.explode('C').astype(int)
    out = df1[~df1['C'].isin(df2['E'])].copy()
    out['newcol'] = out['A'].astype(str) + '_' + out['B'].astype(str)

    return out[['newcol','C']]

get_wrong_ls(df1,df2)

Output

   newcol   C
1   2_22    222
Sign up to request clarification or add additional context in comments.

1 Comment

You are amazing!! I'm quite new to this and seem to get stuck at what looks like simple stuff but it's so hard to figure things out. Hopefully, It will get easier with time and practice.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.