
Given a pandas data frame and a dictionary mapping its values, the values in the data frame can be replaced using the dictionary like so:

In: df
Out:
  Col1 Col2
0    a    c
1    b    c
2    b    c

In: key
Out: {'a': 1, 'b': 2, 'c': 3}

In: df.replace(key)
Out: 
   Col1  Col2
0     1     3
1     2     3
2     2     3

How can a similar transformation be accomplished when the mapping dictionary has lists as values? For example:


In: key
Out: {'a': [1, 0, 0], 'b': [0, 1, 0], 'c': [0, 0, 1]}

In: df.replace(key)
ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 1 output values where the mask is true

In this example, the end goal would be to have a new data frame that has 3 rows and 6 columns:


1 0 0 0 0 1
0 1 0 0 0 1
0 1 0 0 0 1
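For reference, the setup above can be reproduced with the following sketch (the exact error message may vary across pandas versions):

```python
import pandas as pd

# the example frame and scalar mapping from the question
df = pd.DataFrame({"Col1": ["a", "b", "b"], "Col2": ["c", "c", "c"]})
key = {"a": 1, "b": 2, "c": 3}

# scalar replacement works as shown
print(df.replace(key))

# a mapping to lists does not
list_key = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}
try:
    df.replace(list_key)
except ValueError as e:
    print("replace failed:", e)
```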

4 Answers


IIUC, you can use applymap + explode + reshape:

df2 = df.applymap(key.get).explode(list(df.columns))
df2 = (df2
       .set_index(df2.groupby(level=0).cumcount(), append=True)
       .unstack(level=1)
)

output:

  Col1       Col2      
     0  1  2    0  1  2
0    1  0  0    0  0  1
1    0  1  0    0  0  1
2    0  1  0    0  0  1

NB. to reset the columns: df2.columns = range(df2.shape[1])

   0  1  2  3  4  5
0  1  0  0  0  0  1
1  0  1  0  0  0  1
2  0  1  0  0  0  1
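Note: on pandas ≥ 2.1, applymap is deprecated in favor of DataFrame.map. A version-tolerant sketch of the same approach, against the question's example data:

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["a", "b", "b"], "Col2": ["c", "c", "c"]})
key = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}

# DataFrame.map exists on pandas >= 2.1 (renamed from applymap);
# fall back to applymap on older versions
cell_map = getattr(df, "map", df.applymap)

# one list per cell -> one row per list element -> pivot elements to columns
df2 = cell_map(key.get).explode(list(df.columns))
df2 = (df2
       .set_index(df2.groupby(level=0).cumcount(), append=True)
       .unstack(level=1))
df2.columns = range(df2.shape[1])
print(df2)
```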

3 Comments

df.applymap(key.get) - cool!
@richardec not efficient though (like a loop), but better than literal_eval I guess :p I haven't had time to look at your answer yet
I came up with something else: converting key to a Series and indexing it with the values of df ;)

You can use a combination of DataFrame.apply and Series.map to perform this substitution. From there, you can use DataFrame.sum to concatenate the lists and then cast your data back into a new DataFrame:

out = pd.DataFrame(
    df.apply(lambda s: s.map(key)).sum(axis=1).tolist()
)

print(out)
   0  1  2  3  4  5
0  1  0  0  0  0  1
1  0  1  0  0  0  1
2  0  1  0  0  0  1

Semi-related testing of .sum vs .chain:

In [22]: %timeit tmp_df.sum(axis=1)
77.6 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [23]: %timeit tmp_df.apply(lambda row: list(chain.from_iterable(row)), axis=1)
197 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [24]: tmp_df
Out[24]: 
        Col1       Col2
0  [1, 0, 0]  [0, 0, 1]
1  [0, 1, 0]  [0, 0, 1]
2  [0, 1, 0]  [0, 0, 1]

While I won't say that .sum is the best method for concatenating lists in a Series, .apply with chain.from_iterable doesn't seem to fare much better, at least on a very small sample like this.

5 Comments

Beware of sum on Series of lists. This looks fancy but the efficiency is quite bad. It's more efficient to apply a chain ;)
Curious if my implementation differs from what you had in mind?
you need to test on more columns. Try 2k columns, I got ~90 ms for sum vs ~2-3 ms for chain. The inefficiency of sum comes from the fact that it generates a new list at each step ;)
For 20k rows the sum is not even computing vs 20 ms for chain.
Love it, thanks for the explanation and clarification!
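Putting the comment thread's suggestion into a complete sketch (using the question's example data): chain.from_iterable concatenates each row's lists in a single pass, instead of allocating a fresh intermediate list per column the way sum does.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({"Col1": ["a", "b", "b"], "Col2": ["c", "c", "c"]})
key = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}

tmp = df.apply(lambda s: s.map(key))  # every cell is now a list

# flatten each row of lists into one list, then rebuild a DataFrame
out = pd.DataFrame(
    tmp.apply(lambda row: list(chain.from_iterable(row)), axis=1).tolist()
)
print(out)
```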

Hmm, this is tricky.

One solution I came up with is to convert the lists to their string representation before replacing with them, because pandas treats lists specially. Then you can use itertools.chain.from_iterable on each row to combine all the lists into one big list, and create a dataframe out of that:

import ast
from itertools import chain

n = df.replace({k: str(v) for k, v in key.items()}).applymap(ast.literal_eval)
df = pd.DataFrame(n.apply(lambda x: list(chain.from_iterable(x)), axis=1).tolist())

Output:

>>> df
   0  1  2  3  4  5
0  1  0  0  0  0  1
1  0  1  0  0  0  1
2  0  1  0  0  0  1



Here's a method of replacing the items with lists without looping or stringifying:

df[:] = pd.Series(key)[df.to_numpy().flatten()].to_numpy().reshape(df.shape)

Output:

>>> df
        Col1       Col2
0  [1, 0, 0]  [0, 0, 1]
1  [0, 1, 0]  [0, 0, 1]
2  [0, 1, 0]  [0, 0, 1]

Or, you can use explode and reshape to convert the data directly to a numpy array:

# 6 = length of one value of `key` * number of columns in df
arr = pd.Series(key)[df.to_numpy().flatten()].explode().to_numpy().reshape(-1, 6)

Output:

>>> arr
array([[1, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 1],
       [0, 1, 0, 0, 0, 1]], dtype=object)
       
>>> pd.DataFrame(arr)
   0  1  2  3  4  5
0  1  0  0  0  0  1
1  0  1  0  0  0  1
2  0  1  0  0  0  1

