I have the following pandas DataFrame:
import pandas as pd
dict1 = {'file': ['filename2', 'filename2', 'filename3', 'filename4', 'filename4', 'filename3'], 'amount': [3, 4, 5, 1, 2, 1], 'front': [21889611, 36357723, 196312, 11, 42, 1992], 'back': [21973805, 36403870, 277500, 19, 120, 3210]}
df1 = pd.DataFrame(dict1)
print(df1)
        file  amount     front      back
0  filename2       3  21889611  21973805
1  filename2       4  36357723  36403870
2  filename3       5    196312    277500
3  filename4       1        11        19
4  filename4       2        42       120
5  filename3       1      1992      3210
My task is to take N random draws between front and back (inclusive) for each row, where N is the value in amount, and to collect the results in a dictionary.
Doing this on a row-by-row basis is easy for me to understand:
e.g. for the first row
import numpy as np
random_draws = np.random.choice(np.arange(21889611, 21973805+1), 3)
e.g. for the second row
random_draws = np.random.choice(np.arange(36357723, 36403870+1), 4)
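Normally with pandas, one would define this as a function and use something like
def func(front, back, amount):
    # Draw `amount` values (with replacement) from the inclusive range [front, back]
    return np.random.choice(np.arange(front, back+1), amount)
df1["new_column"] = df1.apply(lambda row: func(row["front"], row["back"], row["amount"]), axis=1)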
but the result of my function is an array of varying size.
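For what it's worth, a column can hold arrays of different lengths if it has dtype object. Below is a minimal sketch of that idea, assuming it is acceptable to sample with `rng.integers(..., endpoint=True)` instead of `np.random.choice` over an `np.arange` (both draw with replacement from the inclusive range, but the former does not build the full range in memory). The column name `draws` and the seed are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "file": ["filename2", "filename2", "filename3", "filename4", "filename4", "filename3"],
    "amount": [3, 4, 5, 1, 2, 1],
    "front": [21889611, 36357723, 196312, 11, 42, 1992],
    "back": [21973805, 36403870, 277500, 19, 120, 3210],
})

# Arbitrary seed, only so the sketch is reproducible.
rng = np.random.default_rng(0)

# One array per row; the lengths differ, so the column ends up with dtype object.
df1["draws"] = [
    rng.integers(front, back, size=amount, endpoint=True)
    for front, back, amount in zip(df1["front"], df1["back"], df1["amount"])
]

print(df1[["file", "amount", "draws"]])
```

Each cell of `draws` is then a NumPy array whose length equals `amount` for that row.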
My second problem is that I would like the output to be a dictionary, of the format
{file: [random_draw_results], file: [random_draw_results], file: [random_draw_results], ...}
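A dictionary of that shape could be built by looping over the rows and extending one list per file. This is only a sketch under some assumptions: the helper name `draws_by_file` is hypothetical, the seed is arbitrary, draws are made with replacement (as `np.random.choice` does by default), and `rng.integers(..., endpoint=True)` stands in for `np.random.choice` over an `np.arange`.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "file": ["filename2", "filename2", "filename3", "filename4", "filename4", "filename3"],
    "amount": [3, 4, 5, 1, 2, 1],
    "front": [21889611, 36357723, 196312, 11, 42, 1992],
    "back": [21973805, 36403870, 277500, 19, 120, 3210],
})


def draws_by_file(df, rng):
    """Hypothetical helper: map each file to the pooled draws of all its rows."""
    result = defaultdict(list)
    for file, amount, front, back in zip(df["file"], df["amount"], df["front"], df["back"]):
        # Sample `amount` integers from [front, back], both ends included.
        draws = rng.integers(front, back, size=amount, endpoint=True)
        # Rows sharing a file name are appended to the same list, in row order.
        result[file].extend(draws.tolist())
    return dict(result)


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
final_dict = draws_by_file(df1, rng)
print(final_dict)
```

The lists contain plain Python ints because of `.tolist()`; the number of entries per file is the sum of `amount` over that file's rows.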
For the above example df1, the function should output this dictionary (given the draws):
final_dict = {"filename2": [21927457, 21966814, 21898538, 36392840, 36375560, 36384078, 36366833],
              "filename3": [212143, 239725, 240959, 197359, 276948, 3199],
              "filename4": [100, 83, 15]}