I'm trying to extract data from DataFrames as individual NumPy arrays to pass to SciPy stats methods.
Example DataFrame:
userId numCol
147 1.3
222 2.6
389 5.7
443 1.2
222 2.4
678 2.1
443 1.8
501 2.1
147 1.2
501 3.2
678 1.3
389 2.4
Of the 6 unique userIds, let's say I only want to extract 4 separate arrays holding the numCol values for userIds 147, 222, 389 and 443.
The output would look like this:
Array name 147: array([1.3, 1.2])
Array name 222: array([2.6, 2.4])
Array name 389: array([5.7, 2.4])
Array name 443: array([1.2, 1.8])
I'm wondering whether the best approach would be to create a list of the userIds I want, then loop through the DataFrame using pandas isin and NumPy values.
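To make the idea concrete, here is a minimal sketch of roughly what I have in mind. It filters once with isin and then splits by userId with groupby instead of looping over the rows by hand. The names df, wanted, subset and arrays are just placeholders I made up for this example, and I'm not sure this is the idiomatic way to do it:

```python
import pandas as pd

# Example DataFrame from above
df = pd.DataFrame({
    "userId": [147, 222, 389, 443, 222, 678, 443, 501, 147, 501, 678, 389],
    "numCol": [1.3, 2.6, 5.7, 1.2, 2.4, 2.1, 1.8, 2.1, 1.2, 3.2, 1.3, 2.4],
})

# The userIds I want arrays for (placeholder name)
wanted = [147, 222, 389, 443]

# Keep only rows for the wanted userIds
subset = df[df["userId"].isin(wanted)]

# One NumPy array of numCol values per userId, keyed by userId
arrays = {
    uid: values.to_numpy()
    for uid, values in subset.groupby("userId")["numCol"]
}

for uid, arr in arrays.items():
    print(uid, arr)
```

With a dict like this, arrays[147] and arrays[222] could then be passed straight to a SciPy stats function, which is the end goal.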
I've looked at this similar question closely and it's not the same.