1

So i am working with a dataset with two data frames. The Data Frames look like this:

df1:

Item_ID  Item_Name
0        A
1        B
2        C

df2:

Item_slot_1   Item_slot_2  Item_Slot_3
2             2            1
1             2            0
0             1            1

The values in df2 represent the Item_ID from df1. How can i replace the values in df2 from the item_id to the actual item name so that df2 can look like:

Item_slot_1   Item_slot_2  Item_Slot_3
C             C            B
B             C            A
A             B            B

The data set in reality is much larger and has way more id's and names than just a,b and c

1
  • What happen if value in df2 is 100 and this value is not in df1['Item_ID'] ? Is it possible? Commented Dec 8, 2018 at 6:49

2 Answers 2

1

Create dictionary by zip and pass it to applymap, or replace or apply with map:

s = dict(zip(df1['Item_ID'], df1['Item_Name']))
#if value not exist in df1['Item_ID'] get None in df2
df2 = df2.applymap(s.get)

Or:

#if value not exist in df1['Item_ID'] get original value in df2
df2 = df2.replace(s)

Or:

#if value not exist in df1['Item_ID'] get NaN in df2
df2 = df2.apply(lambda x: x.map(s))

print (df2)
  Item_slot_1 Item_slot_2 Item_Slot_3
0           C           C           B
1           B           C           A
2           A           B           B

EDIT:

You can specified columns by names for process:

cols = ['Item_slot_1','Item_slot_2','Item_Slot_3']

df2[cols] = df2[cols].applymap(s.get)
df2[cols] = df2[cols].replace(s)
df2[cols] = df2[cols].apply(lambda x: x.map(s))
Sign up to request clarification or add additional context in comments.

1 Comment

I tried this out however this is replacing all the numbers in my df2 with the item_name. There are other columns in df2 aswell that i have not shown in the example above that have numeric values and are not related to item-id/item_name. Is there any way to specify the map function to specific columns? I tried this df2.apply(lambda x:x['item_slot_1'].map(s)). However that gave me KeyError: ('item_1', 'occurred at index match_id')
0

You can improve the speed of dictionary mapping with numpy. If your items are numbered 0-N this is trivial, if they are not, it gets a bit more tricky, but is still easily doable.


If the items in df1 are numbered 0-N, use basic indexing:

a = df1['Item_Name'].values
b = df2.values

pd.DataFrame(a[b], columns=df2.columns)

  Item_slot_1 Item_slot_2 Item_Slot_3
0           C           C           B
1           B           C           A
2           A           B           B

If they are not numbered 0-N, here is a more general approach:

x = df1['Item_ID'].values
y = df1['Item_Name'].values
z = df2.values

m = np.arange(x.max() + 1, dtype=object)
m[x] = y

pd.DataFrame(m[z], columns=df2.columns)

  Item_slot_1 Item_slot_2 Item_Slot_3
0           C           C           B
1           B           C           A
2           A           B           B

To only replace a subset of columns from df2 is also simple, let's demonstrate only replacing the first two columns of df2:

x = df1['Item_ID'].values
y = df1['Item_Name'].values
cols = ['Item_slot_1', 'Item_slot_2']
z = df2[cols].values

m = np.arange(x.max() + 1, dtype=object)
m[x] = y

df2[cols] = m[z]

  Item_slot_1 Item_slot_2  Item_Slot_3
0           C           C            1
1           B           C            0
2           A           B            1

This type of indexing nets a hefty performance gain over apply and replace:

import string

df1 = pd.DataFrame({'Item_ID': np.arange(26), 'Item_Name': list(string.ascii_uppercase)})
df2 = pd.DataFrame(np.random.randint(1, 26, (10000, 100)))

%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.applymap(s.get)

158 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.replace(s)

750 ms ± 34.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
s = dict(zip(df1['Item_ID'], df1['Item_Name']))
df2.apply(lambda x: x.map(s))

93.1 ms ± 4.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
x = df1['Item_ID'].values
y = df1['Item_Name'].values
z = df2.values
m = np.arange(x.max() + 1, dtype=object)
m[x] = y
pd.DataFrame(m[z], columns=df2.columns)

30.4 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.