I have a column col_a that can contain the same value multiple times.

For each distinct value in col_a, I would like to generate a corresponding random value in col_b, like so:

col_a  col_b
A     0.25
A     0.25
B     0.12
B     0.12

How can I generate col_b?

3 Answers

You can call random.random() for each group:

import random
df.groupby('col_a')['col_a'].transform(lambda x: random.random())
Out: 
0    0.394776
1    0.394776
2    0.928343
3    0.928343
Name: col_a, dtype: float64

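Assign the result back to the DataFrame (the values differ from the output above because the numbers are drawn again):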

df['col_b'] = df.groupby('col_a')['col_a'].transform(lambda x: random.random())

df
Out: 
  col_a     col_b
0     A  0.012639
1     A  0.012639
2     B  0.839752
3     B  0.839752
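
To check the approach above end to end, here is a minimal, self-contained sketch. The sample frame is hypothetical (the question does not show how df is built), and the final check simply confirms that every distinct col_a value ended up with a single col_b value.

```python
import random

import pandas as pd

# Hypothetical sample frame matching the layout shown in the question.
df = pd.DataFrame({"col_a": ["A", "A", "B", "B"]})

# transform calls the lambda once per group and broadcasts the scalar
# it returns to every row of that group.
df["col_b"] = df.groupby("col_a")["col_a"].transform(lambda _: random.random())

# Each distinct col_a value should map to exactly one col_b value.
values_per_group = df.groupby("col_a")["col_b"].nunique()
print(values_per_group.eq(1).all())
```

Because the lambda runs once per group, the cost grows with the number of distinct values, not with the number of rows.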

I'd do it this way:

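import pandas as pd
import random

# Placeholder column so there is something to transform; its values are overwritten.
df['col_b'] = 1
# One random draw per group, broadcast to every row of that group.
df['col_b'] = df.groupby('col_a')['col_b'].transform(lambda _: random.random())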

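Generate one random number per unique value in col_a, then index that array with the factorized (integer-coded) version of col_a:

import numpy as np
# u: sorted unique values; f: for each row, the position of its value in u
u, f = np.unique(df.col_a.values, return_inverse=True)
df.assign(col_b=np.random.rand(u.size)[f])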

  col_a     col_b
0     A  0.470264
1     A  0.470264
2     B  0.836461
3     B  0.836461

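For large data, pd.factorize is usually quicker because it avoids the sort that np.unique performs:

import numpy as np
import pandas as pd
f, u = pd.factorize(df.col_a.values)  # f: integer codes per row, u: unique values
df.assign(col_b=np.random.rand(u.size)[f])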

  col_a     col_b
0     A  0.476353
1     A  0.476353
2     B  0.639068
3     B  0.639068
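
If the random values need to be repeatable, the same factorize idea can be sketched with a seeded NumPy Generator. This is an assumption-laden variant and not part of the answer above: the sample frame, the seed value 0, and the names rng, codes, uniques, per_value and result are all chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame matching the layout shown in the question.
df = pd.DataFrame({"col_a": ["A", "A", "B", "B"]})

# The seed is arbitrary; fixing it makes the draw repeatable across runs.
rng = np.random.default_rng(0)

# codes[i] is the integer label of row i; uniques holds the distinct values.
codes, uniques = pd.factorize(df["col_a"])

# Draw one number per distinct value, then fan it out with integer indexing.
per_value = rng.random(len(uniques))
result = df.assign(col_b=per_value[codes])
print(result)
```

One caveat: by default pd.factorize labels missing values with -1, and indexing with -1 silently picks the last element of per_value, so rows with NaN in col_a would need separate handling.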
