Let's say that I have the following DataFrame:
import numpy as np
import pandas as pd
df_raw = pd.DataFrame({"person_id": [101, 101, 102, 102, 102, 103], "date": [0, 5, 0, 7, 11, 0], "val1": [99, 11, 22, 33, 44, 22], "val2": [77, 88, 22, 66, 55, 33]})
What I want to achieve is to create a 3-dimensional NumPy array such that the result is the following:
# dtype=object is needed because the inner lists have different lengths
np_pros = np.array([[[0, 99, 77], [5, 11, 88]], [[0, 22, 22], [7, 33, 66], [11, 44, 55]], [[0, 22, 33]]], dtype=object)
In other words, the 3D array should have the shape [unique_ids, None, feature_size]. In my case, the number of unique_ids is 3 and the feature size is 3 (all columns except person_id). The second axis has variable length: it is the number of measurements for each person_id.
I am aware that I could create an np.zeros((unique_ids, max_num_features, feature_size)) array (where max_num_features is the largest number of measurements for any person_id), populate it, and then delete the elements I don't need, but I want something faster. The reason is that my actual DataFrame is huge (roughly [50000, 455]), which would result in a NumPy array of roughly [12500, 200, 455].
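For reference, the padding baseline I describe above could look roughly like the sketch below. The names feature_cols, codes, positions, lengths, padded and ragged are only illustrative, and I assume the rows of each person_id are already in the order I want to keep.

```python
import numpy as np
import pandas as pd

df_raw = pd.DataFrame({
    "person_id": [101, 101, 102, 102, 102, 103],
    "date": [0, 5, 0, 7, 11, 0],
    "val1": [99, 11, 22, 33, 44, 22],
    "val2": [77, 88, 22, 66, 55, 33],
})

# Every column except person_id is treated as a feature (illustrative name).
feature_cols = ["date", "val1", "val2"]
values = df_raw[feature_cols].to_numpy()

# Integer code for each row's person_id, numbered by first appearance.
codes, unique_ids = pd.factorize(df_raw["person_id"])
# Position of each row inside its own person_id group (0, 1, 2, ...).
positions = df_raw.groupby("person_id").cumcount().to_numpy()
# Number of measurements per person, aligned with unique_ids.
lengths = np.bincount(codes)

# Zero-padded array of shape (unique_ids, max measurements, feature_size).
padded = np.zeros((len(unique_ids), lengths.max(), len(feature_cols)),
                  dtype=values.dtype)
padded[codes, positions] = values

# Drop the padding: one (n_measurements, feature_size) array per person.
ragged = [padded[i, :n] for i, n in enumerate(lengths)]
```

Here padded is the fixed-size array and ragged is the variable-length result after trimming. This is the allocate-then-trim route that I would like to avoid on the full data.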
Looking forward to your answers!