I have a large pandas DataFrame in the following format:
DATE ID ACTION
01/12/2014 1 A
01/12/2014 1 B
02/12/2014 1 C
01/12/2014 1 D
02/12/2014 2 E
02/12/2014 2 F
02/12/2014 2 E
04/12/2014 2 G
The data can be created as follows:
import pandas as pd
df = pd.DataFrame({
    'DATE': ['01/12/2014', '01/12/2014', '02/12/2014', '01/12/2014',
             '02/12/2014', '02/12/2014', '02/12/2014', '04/12/2014'],
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'ACTION': ['A', 'B', 'C', 'D', 'E', 'F', 'E', 'G'],
})
From this I want to build a list of lists: one inner list of ACTION values for each DATE/ID group. Here is what I'm doing at the moment. It works, but I have millions of rows, so it takes hours to run. Is there a more efficient way to get the same result?
listoflists = [group['ACTION'].str.strip().tolist() for name, group in df.groupby(['DATE', 'ID'])]
Output:
[['A', 'B', 'D'], ['C'], ['E', 'F', 'E'], ['G']]
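One variant that may be worth benchmarking is sketched here. It has not been timed on the full data. The idea is to strip the whitespace once across the whole ACTION column instead of once per group, then let groupby collect each group with agg(list). It assumes a pandas version in which agg(list) returns one list per group, and the name listoflists_alt is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'DATE': ['01/12/2014', '01/12/2014', '02/12/2014', '01/12/2014',
             '02/12/2014', '02/12/2014', '02/12/2014', '04/12/2014'],
    'ID': [1, 1, 1, 1, 2, 2, 2, 2],
    'ACTION': ['A', 'B', 'C', 'D', 'E', 'F', 'E', 'G'],
})

# Strip whitespace once for the whole column rather than once per group.
stripped = df.assign(ACTION=df['ACTION'].str.strip())

# Group on the DATE and ID columns and collect each group's ACTION values
# into a Python list; groups come back sorted by (DATE, ID) by default.
listoflists_alt = stripped.groupby(['DATE', 'ID'])['ACTION'].agg(list).tolist()

print(listoflists_alt)
# [['A', 'B', 'D'], ['C'], ['E', 'F', 'E'], ['G']]
```

Note that DATE is still a string here, so the groups are ordered lexicographically rather than chronologically, the same as in the list comprehension above.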