Python : get random data from dataframe pandas

Question

I have a DataFrame df with the following values:

name     algo      accuracy
tom       1         88
tommy     2         87
mark      1         88
stuart    3         100
alex      2         99
lincoln   1         88

How can I randomly pick 4 records from df, with the condition that at least one record is picked for each unique value in the algo column? Here, the algo column has only 3 unique values (1, 2, 3).

Sample output 1:

name     algo      accuracy
tom       1         88
tommy     2         87
stuart    3         100
lincoln   1         88

Sample output 2:

name     algo      accuracy
mark      1         88
stuart    3         100
alex      2         99
lincoln   1         88
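
For anyone who wants to reproduce this, here is a minimal sketch that builds the table above and checks whether a candidate result meets the requirement. It assumes the default integer index 0..5; `is_valid_sample` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# The example table from the question, with the default index 0..5.
df = pd.DataFrame({
    "name": ["tom", "tommy", "mark", "stuart", "alex", "lincoln"],
    "algo": [1, 2, 1, 3, 2, 1],
    "accuracy": [88, 87, 88, 100, 99, 88],
})

def is_valid_sample(sample, source, n=4, key="algo"):
    """Hypothetical helper: True if `sample` holds n distinct rows of
    `source` and covers every unique value of `key`."""
    return bool(
        len(sample) == n
        and sample.index.is_unique
        and sample.index.isin(source.index).all()
        and set(sample[key]) == set(source[key])
    )

# Rows 0, 1, 3, 5 are tom, tommy, stuart, lincoln (sample output 1).
print(is_valid_sample(df.loc[[0, 1, 3, 5]], df))

# Rows 0, 1, 2, 5 contain no record with algo 3, so the check fails.
print(is_valid_sample(df.loc[[0, 1, 2, 5]], df))
```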

Quang Hoang · Accepted Answer · 2020-10-29 20:08:08Z


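One way is to sample one row per algo first, then fill the remaining slots from the rows that were not selected:

import pandas as pd

num_sample, num_algo = 4, 3

# sample one row for each algo (groupby(...).sample needs pandas >= 1.1)
out = df.groupby('algo').sample(n=num_sample//num_algo)

# add one more sample from the rows that didn't get selected
out = pd.concat([out, df.drop(out.index).sample(n=num_sample-num_algo)])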
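
A self-contained sketch of this first approach might look like the following. It is a variation rather than the exact snippet above: it always draws exactly one row per group (instead of `num_sample//num_algo`), so that a group with a single row cannot make the per-group draw fail, and it uses `pd.concat` to combine the pieces. The function name `sample_with_coverage` and its parameters are made up for illustration, and a unique index is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["tom", "tommy", "mark", "stuart", "alex", "lincoln"],
    "algo": [1, 2, 1, 3, 2, 1],
    "accuracy": [88, 87, 88, 100, 99, 88],
})

def sample_with_coverage(df, n, key="algo", random_state=None):
    """Hypothetical helper: n random rows, at least one per value of `key`."""
    num_groups = df[key].nunique()
    if not num_groups <= n <= len(df):
        raise ValueError("n must be between the number of groups and len(df)")

    # One guaranteed row per group (groupby(...).sample needs pandas >= 1.1).
    base = df.groupby(key).sample(n=1, random_state=random_state)

    # Fill the remaining slots from the rows that were not picked yet.
    extra = df.drop(base.index).sample(n=n - num_groups,
                                       random_state=random_state)
    return pd.concat([base, extra])

out = sample_with_coverage(df, 4, random_state=0)
print(out)
```

Passing `random_state` only makes a run repeatable; leave it as `None` for a fresh draw each time.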

Another way is to shuffle the whole data, enumerate the rows within each algo, sort by that enumeration and take the required number of samples. This is slightly more code than the first approach, but is cheaper and produces more balanced algo counts:

# shuffle data
df_random = df['algo'].sample(frac=1)

# enumerations of rows with the same algo
enums = df_random.groupby(df_random).cumcount()

# sort by the enumeration, so the first shuffled row of each algo comes first
enums = enums.sort_values()

# pick the first num_sample indices
# these will be indices of the samples
# so we can use `loc`
out = df.loc[enums.iloc[:num_sample].index]
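
Put together as one function, the shuffle-and-enumerate idea could be sketched like this. The name `sample_by_shuffle` and its parameters are hypothetical, the index of `df` is assumed to be unique (so that `loc` returns exactly the picked rows), and a stable sort (`kind="mergesort"`) is an added choice here so that rows with the same enumeration stay in their shuffled order.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["tom", "tommy", "mark", "stuart", "alex", "lincoln"],
    "algo": [1, 2, 1, 3, 2, 1],
    "accuracy": [88, 87, 88, 100, 99, 88],
})

def sample_by_shuffle(df, n, key="algo", random_state=None):
    """Hypothetical helper: n random rows, at least one per value of `key`."""
    if n < df[key].nunique():
        raise ValueError("n is smaller than the number of unique groups")

    # Shuffle only the key column; the index still identifies each row.
    shuffled = df[key].sample(frac=1, random_state=random_state)

    # 0 for the first shuffled row of each group, 1 for the second, and so on.
    rank = shuffled.groupby(shuffled).cumcount()

    # All rank-0 rows (one per group) sort to the front, then rank 1, etc.
    picked = rank.sort_values(kind="mergesort").index[:n]
    return df.loc[picked]

out = sample_by_shuffle(df, 4, random_state=0)
print(out)
```

Because every group contributes its rank-0 row before any group contributes a second one, the first `n` rows always cover all groups as long as `n` is at least the number of groups.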

edited Oct 29, 2020 at 20:08

answered Oct 29, 2020 at 19:54

Quang Hoang
