I have a pandas DataFrame like the one below, produced by the PrefixSpan algorithm, and I want to map its patterns into columns.
id ps_results
...
28 [(301, [1-1]), (63, [1-1, 0-5]), (35, [1-1, 1-...
29 [(265, [0-37]), (31, [1-1, 0-5]), (25, [0-1, 1-1, 0-1...
...
Each cell in the ps_results column holds a very long list of (frequency, [patterns]) tuples, e.g. (63, ['1-1', '0-5']). I want to generate one column per distinct pattern in these lists.
[
(301, ['1-1']),
(63, ['1-1', '0-5']),
(35, ['1-1', '1-5']),
(61, ['1-1', '0-3']),
(21, ['1-1', '0-3', '1-1']),
(125, ['1-1', '1-1']),
(32, ['1-1', '1-1', '0-2']),
(21, ['1-1', '1-1', '0-2', '1-1']),
(46, ['1-1', '1-1', '1-1']),
(20, ['1-1', '1-1', '1-1', '0-1']),
(50, ['1-1', '1-1', '0-1']),
(27, ['1-1', '1-1', '0-1', '1-1']),
(22, ['1-1', '1-1', '0-5']),
(26, ['1-1', '1-1', '0-4']),
(25, ['1-1', '1-1', '1-2'])
...
]
For example, I want to get a DataFrame like the one below, where sum_of_freq is the sum of all frequencies in that row's list:
sum_of_freq = 301 + 63 + 35 + 61 + ... + 25 + ... (for id = 28)
sum_of_freq = 265 + 31 + 25 ... (for id = 29)
id  ps_results    ['1-1']            ['1-1', '0-5']    ['1-1', '1-5']    ['0-37']           ...rest_of_the_columns
...
28  [(301,[...    301/(sum_of_freq)  63/(sum_of_freq)  35/(sum_of_freq)  0                  <--this for other rows
29  [(265, [0...  0                  31/(sum_of_freq)  0                 265/(sum_of_freq)
...
Note that the length of ps_results varies for each id, so there will probably be thousands of columns generated in the end.
Below is a small sample dataset for experimenting.
import pandas as pd

# Each ps_results entry is a list of (frequency, pattern) tuples for one id.
a = {
    'id': ['22', '23'],
    'ps_results': [
        [(301, ['1-1']), (63, ['1-1', '0-5']), (35, ['1-1', '1-5']),
         (61, ['1-1', '0-3']), (21, ['1-1', '0-3', '1-1']),
         (125, ['1-1', '1-1']), (32, ['1-1', '1-1', '0-2']),
         (21, ['1-1', '1-1', '0-2', '1-1'])],
        [(46, ['1-1', '1-1', '1-1']),
         (20, ['1-1', '1-1', '1-1', '0-1']),
         (50, ['1-1', '1-1', '0-1']),
         (27, ['1-1', '1-1', '0-1', '1-1']),
         (22, ['1-1', '1-1', '0-5']), (26, ['1-1', '1-1', '0-4']),
         (67, ['1-1', '0-1', '1-1'])],
    ],
}
df = pd.DataFrame(data=a)
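To make the desired output concrete, here is a rough sketch of one way the mapping could be done. It is not a confirmed solution. It builds a shortened sample from only the entries visible in the truncated display above, so the sums are smaller than the real ones. The names sample, pattern_shares, shares and result are hypothetical. Lists are not hashable, so the sketch assumes the string form of each pattern is an acceptable column label.

```python
import pandas as pd

# Shortened sample: only the entries visible in the truncated display,
# so each row's sum_of_freq here is smaller than in the real data.
sample = pd.DataFrame({
    "id": [28, 29],
    "ps_results": [
        [(301, ["1-1"]), (63, ["1-1", "0-5"]), (35, ["1-1", "1-5"])],
        [(265, ["0-37"]), (31, ["1-1", "0-5"])],
    ],
})


def pattern_shares(ps_results):
    """Map each pattern label to frequency / sum_of_freq for one row."""
    total = sum(freq for freq, _ in ps_results)
    if total == 0:
        # Empty list: no patterns, so no columns contributed by this row.
        return {}
    # str(pattern) gives labels such as "['1-1', '0-5']".
    return {str(pattern): freq / total for freq, pattern in ps_results}


# One dict per row becomes one column per distinct pattern;
# patterns a row does not contain are NaN first, then filled with 0.
shares = pd.DataFrame(
    [pattern_shares(row) for row in sample["ps_results"]],
    index=sample.index,
).fillna(0)

# Put the new columns next to the original id and ps_results columns.
result = pd.concat([sample, shares], axis=1)
print(result.drop(columns="ps_results"))
```

With thousands of distinct patterns most cells will be 0, so the dense frame may use a lot of memory. Converting shares to a sparse dtype, or keeping the per-row dicts instead, might be worth trying in that case.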
