Get Index of matching string from Two dataframe

Question

I have two data frames. I need to look up each string of DataFrame 1 in DataFrame 2 and replace the string with the index of its match.

So I want a third data frame that holds, for each string in DataFrame 1, the index of the matching string in DataFrame 2.

  X = pd.DataFrame(np.array(['A','B','C','D','AA','AB','AC','AD','BA','BB','BC','AD']).reshape(4,3),columns=['a','b','c'])

    a   b   c
0   A   B   C
1   D   AA  AB
2   AC  AD  BA
3   BB  BC  AD


Y = pd.DataFrame(np.array(['A','AA','AC','D','B','AB','C','AD','BC','BB']).reshape(10,1),columns=['X'])


    X
0   A
1   AA
2   AC
3   D
4   B
5   AB
6   C
7   AD
8   BC
9   BB

Resulting DataFrame

    a   b   c
0   0   4   6
1   3   1   5
2   2   7   NA
3   9   8   7

Someone suggested the following code, but it does not work for me.

t = pd.merge(df1.stack().reset_index(), df2.reset_index(), left_on = 0, right_on = "0")
res = t.set_index(["level_0", "level_1"]).drop([0, "0"], axis=1).unstack()
print(res)

Space Impact · Accepted Answer · 2020-07-27 22:46:50Z

5

Use apply with map:

Y = Y.reset_index().set_index('X')['index']
X = X.apply(lambda x: x.map(Y))
print(X)
   a  b    c
0  0  4  6.0
1  3  1  5.0
2  2  7  NaN
3  9  8  7.0
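
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same apply/map idea. The names `lookup`, `result` and `result_int` are mine, not from the answer, and the final cast to the nullable `Int64` dtype is an optional extra, assuming you would rather keep matched indices as integers than as floats.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    np.array(['A', 'B', 'C', 'D', 'AA', 'AB',
              'AC', 'AD', 'BA', 'BB', 'BC', 'AD']).reshape(4, 3),
    columns=['a', 'b', 'c'],
)
Y = pd.DataFrame({'X': ['A', 'AA', 'AC', 'D', 'B', 'AB', 'C', 'AD', 'BC', 'BB']})

# Series that maps each string in Y to its row label (hypothetical name).
lookup = pd.Series(Y.index, index=Y['X'])

# Map every column of X through the lookup; strings missing from Y become NaN.
result = X.apply(lambda col: col.map(lookup))

# Optional: nullable integers keep matches as ints and show <NA> for misses.
result_int = result.astype('Int64')
print(result_int)
```

Unlike the snippet above, this version leaves X and Y untouched, which makes it easier to rerun while experimenting. It relies on the strings in Y being unique.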

answered Jul 27, 2020 at 22:46


sammywemmy · Accepted Answer · 2020-07-27 23:06:49Z

4

Step 1: Create a mapping from Y:

mapping = {value: key for key, value in Y.T.to_dict("records")[0].items()}
mapping

{'A': 0,
 'AA': 1,
 'AC': 2,
 'D': 3,
 'B': 4,
 'AB': 5,
 'C': 6,
 'AD': 7,
 'BC': 8,
 'BB': 9}

Step 2: Stack X into a single column, map the stacked values through the mapping, and unstack to get back to the original shape:

X.stack().map(mapping).unstack()


     a    b    c
0  0.0  4.0  6.0
1  3.0  1.0  5.0
2  2.0  7.0  NaN
3  9.0  8.0  7.0

Alternatively, you can skip the stack/unstack step and use replace, followed by pd.to_numeric to turn any unmatched strings into NaN:

X.replace(mapping).apply(pd.to_numeric, errors="coerce")

I have not benchmarked this; it is just my gut feeling that mapping should be faster.
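
Since the speed claim is untested, a rough way to check it on your own machine might look like the sketch below. The frame `big`, its size, the helper names `via_map` and `via_replace`, and the explicit `dtype=object` are all my assumptions for illustration; timings will vary with pandas version and data, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Same string-to-index mapping as in Step 1.
mapping = {s: i for i, s in enumerate(
    ['A', 'AA', 'AC', 'D', 'B', 'AB', 'C', 'AD', 'BC', 'BB'])}

# Hypothetical larger input: random known strings plus the unmatched 'BA'.
rng = np.random.default_rng(0)
values = np.array(list(mapping) + ['BA'], dtype=object)
big = pd.DataFrame(rng.choice(values, size=(2000, 3)),
                   columns=['a', 'b', 'c'], dtype=object)

def via_map(df):
    # Step 2 above: stack, map, unstack.
    return df.stack().map(mapping).unstack()

def via_replace(df):
    # The replace alternative, with leftovers coerced to NaN.
    return df.replace(mapping).apply(pd.to_numeric, errors="coerce")

for func in (via_map, via_replace):
    seconds = timeit.timeit(lambda: func(big), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Both helpers should produce the same numbers, so whichever is faster on your data can be swapped in without changing the result.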

edited Jul 27, 2020 at 23:06

answered Jul 27, 2020 at 22:52


ipj · Accepted Answer · 2020-07-27 23:53:11Z

2

Short solution based on applymap (called DataFrame.map from pandas 2.1 on, where applymap is deprecated):

X.applymap(lambda x: Y[Y.X==x].index.max())

result:

   a  b    c
0  0  4  6.0
1  3  1  5.0
2  2  7  NaN
3  9  8  7.0

answered Jul 27, 2020 at 23:53


RichieV · Accepted Answer · 2020-07-27 23:04:41Z

1

Y = pd.Series(Y.index, index=Y.X).sort_index()

will give you a more easily searchable object. Then something like:

flat = X.to_numpy().flatten()
Y = Y.reindex(np.unique(flat))  # every item must be in the index before using loc[list]
# Y.loc[flat] is a Series, so convert to a NumPy array before reshaping
res = pd.DataFrame(Y.loc[flat].to_numpy().reshape(X.shape), columns=X.columns)
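
The same flatten, look up, reshape idea can also be sketched with `pd.Index.get_indexer`. To be clear, this is a different technique from the reindex/loc route in this answer, not the author's method. It returns positions rather than index labels, so it only matches the desired output under the assumption that Y has the default 0..n-1 index and that the strings in Y are unique. The name `codes` is mine.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    np.array(['A', 'B', 'C', 'D', 'AA', 'AB',
              'AC', 'AD', 'BA', 'BB', 'BC', 'AD']).reshape(4, 3),
    columns=['a', 'b', 'c'],
)
Y = pd.DataFrame({'X': ['A', 'AA', 'AC', 'D', 'B', 'AB', 'C', 'AD', 'BC', 'BB']})

# Position of each flattened X value within Y['X']; -1 marks "not found".
codes = pd.Index(Y['X']).get_indexer(X.to_numpy().ravel())

# Reshape back to X's layout and turn the -1 sentinel into NaN.
res = pd.DataFrame(codes.reshape(X.shape), index=X.index, columns=X.columns)
res = res.where(res >= 0)
print(res)
```

If Y had a non-default index, the positions in `codes` would need one more step to be translated into labels, for example by indexing `Y.index` with them after handling the -1 entries.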

answered Jul 27, 2020 at 23:04


BENY · Accepted Answer · 2020-07-27 23:06:42Z

0

Let us do

X = X.where(X.isin(Y.X.tolist())).replace(dict(zip(Y.X,Y.index)))
Out[15]: 
   a  b    c
0  0  4  6.0
1  3  1  5.0
2  2  7  NaN
3  9  8  7.0

answered Jul 27, 2020 at 23:06
