2

I have two dataframes. For each string in dataframe 1, I need to find the matching string in dataframe 2 and replace it with that string's index in dataframe 2.

So I want a third dataframe that holds, for each cell of dataframe 1, the index of the matching string in dataframe 2.

import numpy as np
import pandas as pd

X = pd.DataFrame(np.array(['A','B','C','D','AA','AB','AC','AD','BA','BB','BC','AD']).reshape(4,3),columns=['a','b','c'])

    a   b   c
0   A   B   C
1   D   AA  AB
2   AC  AD  BA
3   BB  BC  AD


Y = pd.DataFrame(np.array(['A','AA','AC','D','B','AB','C','AD','BC','BB']).reshape(10,1),columns=['X'])


    X
0   A
1   AA
2   AC
3   D
4   B
5   AB
6   C
7   AD
8   BC
9   BB

Desired resulting dataframe:

    a   b   c
0   0   4   6
1   3   1   5
2   2   7   NA
3   9   8   7

Someone suggested the following code, but it does not work for me:

t = pd.merge(df1.stack().reset_index(), df2.reset_index(), left_on = 0, right_on = "0")
res = t.set_index(["level_0", "level_1"]).drop([0, "0"], axis=1).unstack()
print(res)
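
A merge along these lines can be made to work. The sketch below is one possible repair, not the original suggestion: it assumes the frames are the X and Y defined above, and the column names `value` and `pos` are made up for illustration. The snippet above most likely fails because `right_on = "0"` names a column that does not exist in Y (its only column is `X`), and the default inner merge would also drop cells with no match, such as `BA`.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    np.array(['A', 'B', 'C', 'D', 'AA', 'AB', 'AC', 'AD', 'BA', 'BB', 'BC', 'AD']).reshape(4, 3),
    columns=['a', 'b', 'c'],
)
Y = pd.DataFrame({'X': ['A', 'AA', 'AC', 'D', 'B', 'AB', 'C', 'AD', 'BC', 'BB']})

# Long format: one row per cell, with columns level_0 (row), level_1 (column), value
cells = X.stack().rename('value').reset_index()

# Y's index as an ordinary column; 'pos' is a hypothetical name
lookup = Y.reset_index().rename(columns={'index': 'pos'})

# A left merge keeps cells with no match in Y (their 'pos' becomes NaN)
merged = cells.merge(lookup, how='left', left_on='value', right_on='X')

# Back to the original 4x3 layout
res = merged.pivot(index='level_0', columns='level_1', values='pos')
print(res)
```

The result holds floats, because the unmatched cell is stored as NaN.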

5 Answers

5

Use apply with map:

Y = Y.reset_index().set_index('X')['index']
X = X.apply(lambda x: x.map(Y))
print(X)
   a  b    c
0  0  4  6.0
1  3  1  5.0
2  2  7  NaN
3  9  8  7.0
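
Note that the snippet above overwrites both Y and X, and column c comes back as float because NaN cannot be stored in a plain integer column. The following sketch is a variation on the same idea under a couple of assumptions of mine: the names `lookup`, `result` and `result_int` are illustrative, and the final cast to pandas' nullable `Int64` dtype is optional.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    np.array(['A', 'B', 'C', 'D', 'AA', 'AB', 'AC', 'AD', 'BA', 'BB', 'BC', 'AD']).reshape(4, 3),
    columns=['a', 'b', 'c'],
)
Y = pd.DataFrame({'X': ['A', 'AA', 'AC', 'D', 'B', 'AB', 'C', 'AD', 'BC', 'BB']})

# Series whose index holds the strings of Y and whose values hold their positions
lookup = pd.Series(Y.index, index=Y['X'])

# Map every column of X through the lookup; strings missing from Y become NaN
result = X.apply(lambda col: col.map(lookup))

# Optional: nullable integers keep whole numbers and mark misses as <NA>
result_int = result.astype('Int64')
print(result_int)
```

This relies on the strings in Y being unique, since `map` needs an unambiguous lookup.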

4

Step 1: Create a mapping from Y:

mapping = {value: key for key, value in Y.T.to_dict("records")[0].items()}
mapping

{'A': 0,
 'AA': 1,
 'AC': 2,
 'D': 3,
 'B': 4,
 'AB': 5,
 'C': 6,
 'AD': 7,
 'BC': 8,
 'BB': 9}
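
The same dictionary can also be built without transposing Y. A minimal sketch, assuming Y is the single-column frame from the question:

```python
import pandas as pd

Y = pd.DataFrame({'X': ['A', 'AA', 'AC', 'D', 'B', 'AB', 'C', 'AD', 'BC', 'BB']})

# Pair each string in Y['X'] with its index label
mapping = dict(zip(Y['X'], Y.index))
print(mapping)
```

If Y contained duplicate strings, the last occurrence would win in either construction.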

Step 2: Stack X into a single Series, map its values through the mapping, and unstack to get back to the original shape:

X.stack().map(mapping).unstack()


     a    b    c
0  0.0  4.0  6.0
1  3.0  1.0  5.0
2  2.0  7.0  NaN
3  9.0  8.0  7.0

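Alternatively, you can avoid the stack/unstack step and use replace. Strings with no entry in the mapping (here 'BA') are left untouched by replace, so pd.to_numeric with errors="coerce" is needed to turn them into NaN: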

X.replace(mapping).apply(pd.to_numeric, errors="coerce")

I have not benchmarked this; it is just my gut feeling that the map-based approach should be faster than replace.

2

Short solution based on applymap:

X.applymap(lambda x: Y[Y.X==x].index.max())

result:

   a  b    c
0  0  4  6.0
1  3  1  5.0
2  2  7  NaN
3  9  8  7.0
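
Two caveats, as far as I know: `DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`, and this approach scans Y once per cell of X, so it may be slow on large frames. A sketch that picks whichever method is available (the name `elementwise` is mine):

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    np.array(['A', 'B', 'C', 'D', 'AA', 'AB', 'AC', 'AD', 'BA', 'BB', 'BC', 'AD']).reshape(4, 3),
    columns=['a', 'b', 'c'],
)
Y = pd.DataFrame({'X': ['A', 'AA', 'AC', 'D', 'B', 'AB', 'C', 'AD', 'BC', 'BB']})

# DataFrame.map exists from pandas 2.1 on; fall back to applymap on older versions
elementwise = X.map if hasattr(X, 'map') else X.applymap

# For each cell, take the largest index in Y whose string equals the cell;
# when nothing matches, the max of the empty index is NaN
res = elementwise(lambda s: Y[Y['X'] == s].index.max())
print(res)
```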

1

Y = pd.Series(Y.index, index=Y.X).sort_index()

This gives you a more easily searchable object. Then something like:

flat = X.to_numpy().flatten()
Y = Y.reindex(np.unique(flat))  # every item must be in the index to use loc[list]; missing ones become NaN
res = pd.DataFrame(Y.loc[flat].to_numpy().reshape(X.shape), index=X.index, columns=X.columns)

0

Let us do

X = X.where(X.isin(Y.X.tolist())).replace(dict(zip(Y.X,Y.index)))
Out[15]: 
   a  b    c
0  0  4  6.0
1  3  1  5.0
2  2  7  NaN
3  9  8  7.0
