Replacing values in a dataframe from another dataframe

Question

So i am working with a dataset with two data frames. The Data Frames look like this:

df1:

Item_ID  Item_Name
0        A
1        B
2        C

df2:

Item_slot_1   Item_slot_2  Item_Slot_3
2             2            1
1             2            0
0             1            1

The values in df2 represent the Item_ID from df1. How can i replace the values in df2 from the item_id to the actual item name so that df2 can look like:

Item_slot_1   Item_slot_2  Item_Slot_3
C             C            B
B             C            A
A             B            B

The data set in reality is much larger and has way more id's and names than just a,b and c

What happen if value in df2 is 100 and this value is not in df1['Item_ID'] ? Is it possible? — jezrael
– jezrael, Commented Dec 8, 2018 at 6:49

jezrael · Accepted Answer · 2018-12-08 07:10:21Z

1

Create dictionary by zip and pass it to applymap, or replace or apply with map:

s = dict(zip(df1['Item_ID'], df1['Item_Name']))
#if value not exist in df1['Item_ID'] get None in df2
df2 = df2.applymap(s.get)

Or:

#if value not exist in df1['Item_ID'] get original value in df2
df2 = df2.replace(s)

Or:

#if value not exist in df1['Item_ID'] get NaN in df2
df2 = df2.apply(lambda x: x.map(s))

print (df2)
  Item_slot_1 Item_slot_2 Item_Slot_3
0           C           C           B
1           B           C           A
2           A           B           B

EDIT:

You can specified columns by names for process:

cols = ['Item_slot_1','Item_slot_2','Item_Slot_3']

df2[cols] = df2[cols].applymap(s.get)
df2[cols] = df2[cols].replace(s)
df2[cols] = df2[cols].apply(lambda x: x.map(s))

edited Dec 8, 2018 at 7:10

answered Dec 8, 2018 at 6:47

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Shah Abbas Over a year ago

I tried this out however this is replacing all the numbers in my df2 with the item_name. There are other columns in df2 aswell that i have not shown in the example above that have numeric values and are not related to item-id/item_name. Is there any way to specify the map function to specific columns? I tried this df2.apply(lambda x:x['item_slot_1'].map(s)). However that gave me KeyError: ('item_1', 'occurred at index match_id')

user3483203 · Accepted Answer · 2018-12-08 07:21:36Z

You can improve the speed of dictionary mapping with numpy. If your items are numbered 0-N this is trivial, if they are not, it gets a bit more tricky, but is still easily doable.

If the items in df1 are numbered 0-N, use basic indexing:

a = df1['Item_Name'].values
b = df2.values

pd.DataFrame(a[b], columns=df2.columns)

  Item_slot_1 Item_slot_2 Item_Slot_3
0           C           C           B
1           B           C           A
2           A           B           B

If they are not numbered 0-N, here is a more general approach:

x = df1['Item_ID'].values
y = df1['Item_Name'].values
z = df2.values

m = np.arange(x.max() + 1, dtype=object)
m[x] = y

pd.DataFrame(m[z], columns=df2.columns)

  Item_slot_1 Item_slot_2 Item_Slot_3
0           C           C           B
1           B           C           A
2           A           B           B

To only replace a subset of columns from df2 is also simple, let's demonstrate only replacing the first two columns of df2:

x = df1['Item_ID'].values
y = df1['Item_Name'].values
cols = ['Item_slot_1', 'Item_slot_2']
z = df2[cols].values

m = np.arange(x.max() + 1, dtype=object)
m[x] = y

df2[cols] = m[z]

  Item_slot_1 Item_slot_2  Item_Slot_3
0           C           C            1
1           B           C            0
2           A           B            1

This type of indexing nets a hefty performance gain over apply and replace:

import string

df1 = pd.DataFrame({'Item_ID': np.arange(26), 'Item_Name': list(string.ascii_uppercase)})
df2 = pd.DataFrame(np.random.randint(1, 26, (10000, 100)))

%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.applymap(s.get)

158 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.replace(s)

750 ms ± 34.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.apply(lambda x: x.map(s))

93.1 ms ± 4.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
x = df1['Item_ID'].values
y = df1['Item_Name'].values
z = df2.values
m = np.arange(x.max() + 1, dtype=object)
m[x] = y
pd.DataFrame(m[z], columns=df2.columns)

30.4 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Collectives™ on Stack Overflow

Replacing values in a dataframe from another dataframe

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related