
I have a column of integer values (n rows). I want to replace the values that meet a certain condition with random numbers drawn from a normal distribution. I tried the code below, but it is too slow.

df_members['bd'] = df_members.bd.apply(lambda x: np.random.normal(bd_mean, bd_sd) if float(x)==-99999 else x )

I also tried the code below, but it assigns the same single random value to every matching row.

bd_mean = 29.2223808862
bd_sd = 10.4168850957
df_members[df_members['bd'] == -99999] = np.random.normal(bd_mean, bd_sd)
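The behaviour of the second attempt can be reproduced on a small float Series (a hypothetical stand-in for the bd column, using the sample values below). Called without size, np.random.normal returns one scalar, and pandas broadcasts that scalar to every selected row; with size set to the number of selected rows, each row gets its own draw:

```python
import numpy as np
import pandas as pd

bd_mean = 29.2223808862
bd_sd = 10.4168850957

# Hypothetical stand-in for df_members['bd'], stored as float to avoid dtype upcasting
bd = pd.Series([-99999.0, 26.0, -99999.0, 17.0, -99999.0])
mask = bd == -99999

# No size argument: a single scalar is drawn and broadcast to all selected rows
scalar_fill = bd.copy()
scalar_fill[mask] = np.random.normal(bd_mean, bd_sd)

# size=mask.sum(): one independent draw per selected row
array_fill = bd.copy()
array_fill[mask] = np.random.normal(bd_mean, bd_sd, size=mask.sum())

print(scalar_fill[mask].nunique())  # 1
print(array_fill[mask].nunique())
```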

Example Data:

                                           msno  city     bd  gender  registered_via
0  URiXrfYPzHAlk+7+n7BOMl9G+T7g8JmrSnT/BU8GmEo=     1 -99999     NaN               9
1  U1q0qCqK/lDMTD2kN8G9OXMtfuvLCey20OAIPOvXXGQ=     1     26     NaN               4
2  W6M2H2kAoN9ahfDYKo3J6tmsJRAeuFc9wl1cau5VL1Q=     1 -99999     NaN               4
3  1qE5+cN7CUyC+KFH6gBZzMWmM1QpIVW6A43BEm98I/w=     5     17  female               4
4  SeAnaZPI+tFdAt+r3lZt/B8PgTp7bcG/1os39u4pLxs=     1 -99999     NaN               4

EDIT

I assume that generating 3,425,689 random numbers (one per row) will take a long time, so I will stick with the first approach for now.
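Drawing millions of normal samples is a single vectorized NumPy call, which is usually far faster than calling np.random.normal once per row through apply. A minimal sketch, assuming NumPy's Generator API (np.random.default_rng) and a hypothetical helper fill_sentinel_normal, run on synthetic data of the same size as the question's column; it uses np.where rather than the .loc assignment from the answer:

```python
import numpy as np
import pandas as pd

SENTINEL = -99999  # placeholder value that marks a missing "bd"


def fill_sentinel_normal(bd, mean, sd, seed=None):
    """Hypothetical helper: replace SENTINEL entries with N(mean, sd) draws."""
    rng = np.random.default_rng(seed)
    values = bd.to_numpy()
    # One vectorized call draws a value for every row...
    draws = rng.normal(mean, sd, size=values.shape[0])
    # ...and np.where keeps a draw only where the sentinel appears
    filled = np.where(values == SENTINEL, draws, values)
    return pd.Series(filled, index=bd.index, name=bd.name)


# Synthetic column with 3,425,689 rows: even positions hold the sentinel
n = 3_425_689
bd = pd.Series(np.where(np.arange(n) % 2 == 0, SENTINEL, 26), name="bd")
filled = fill_sentinel_normal(bd, 29.2223808862, 10.4168850957, seed=0)
```

This draws a value for every row, including rows that keep their original value; that wastes some draws but avoids a separate boolean-indexed assignment. The result is a float column because the draws are floats.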

  • Can you add a small sample of df_members? Commented Oct 25, 2017 at 17:09
  • @Bharathshetty here you go! Commented Oct 25, 2017 at 17:11

1 Answer

You're missing the "size" argument of np.random.normal, which sets how many values (the output shape) to generate. Without it, a single scalar is returned and broadcast to every selected row.

mask = df_members['bd'] == -99999
df_members.loc[mask, 'bd'] = np.random.normal(bd_mean, bd_sd, mask.sum())

This draws one independent value for each matching row.
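A self-contained sketch of the size-argument fix on the sample data from the question. It assumes the column is cast to float first (recent pandas versions warn when float values are assigned into an int64 column) and uses .loc so that only the bd column is written:

```python
import numpy as np
import pandas as pd

bd_mean = 29.2223808862
bd_sd = 10.4168850957

# Sample rows from the question
df_members = pd.DataFrame({
    "msno": [
        "URiXrfYPzHAlk+7+n7BOMl9G+T7g8JmrSnT/BU8GmEo=",
        "U1q0qCqK/lDMTD2kN8G9OXMtfuvLCey20OAIPOvXXGQ=",
        "W6M2H2kAoN9ahfDYKo3J6tmsJRAeuFc9wl1cau5VL1Q=",
        "1qE5+cN7CUyC+KFH6gBZzMWmM1QpIVW6A43BEm98I/w=",
        "SeAnaZPI+tFdAt+r3lZt/B8PgTp7bcG/1os39u4pLxs=",
    ],
    "city": [1, 1, 1, 5, 1],
    "bd": [-99999, 26, -99999, 17, -99999],
    "gender": [np.nan, np.nan, np.nan, "female", np.nan],
    "registered_via": [9, 4, 4, 4, 4],
})

mask = df_members["bd"] == -99999
# Assumption: a float column is acceptable, since the normal draws are floats
df_members["bd"] = df_members["bd"].astype(float)
# One draw per matching row, written only into the "bd" column
df_members.loc[mask, "bd"] = np.random.normal(bd_mean, bd_sd, size=mask.sum())
```

Selecting with .loc[mask, "bd"] writes only to the bd column; indexing with the row mask alone would overwrite every column of the matching rows.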
