
After running the PrefixSpan algorithm I have a pandas dataframe like the one below, and I want to map the results into columns.

id  ps_results
...
28  [(301, [1-1]), (63, [1-1, 0-5]), (35, [1-1, 1-...
29  [(265, [0-37]), (31, [1-1, 0-5]), (25, [0-1, 1-1, 0-1...
...

The ps_results column holds a very long list of (frequency, [patterns]) tuples, e.g. (63, ['1-1', '0-5']). I want to generate one column per pattern in this list.

[
   (301, ['1-1']),
   (63, ['1-1', '0-5']),
   (35, ['1-1', '1-5']),
   (61, ['1-1', '0-3']),
   (21, ['1-1', '0-3', '1-1']),
   (125, ['1-1', '1-1']),
   (32, ['1-1', '1-1', '0-2']),
   (21, ['1-1', '1-1', '0-2', '1-1']),
   (46, ['1-1', '1-1', '1-1']),
   (20, ['1-1', '1-1', '1-1', '0-1']),
   (50, ['1-1', '1-1', '0-1']),
   (27, ['1-1', '1-1', '0-1', '1-1']),
   (22, ['1-1', '1-1', '0-5']),
   (26, ['1-1', '1-1', '0-4']),
   (25, ['1-1', '1-1', '1-2'])
   ...
]

For example, I want a dataframe like the one below, where sum_of_freq is the sum of all frequencies in that row's list:

sum_of_freq = 301 + 63 + 35 + 61 + ... + 25 + ... (for id = 28)

sum_of_freq = 265 + 31 + 25 ... (for id = 29)

id  ps_results    ['1-1']           ['1-1', '0-5']    ['1-1', '1-5']    ['0-37']  ...rest_of_the_columns
...
28  [(301,[...    301/(sum_of_freq)  63/(sum_of_freq)  35/(sum_of_freq)  0 <-- 0 when the row does not contain the pattern
29  [(265, [0...  0                  31/(sum_of_freq)  0                 265/(sum_of_freq)
...

Note that the length of ps_results varies by id, so the final dataframe will probably have thousands of columns.

Below is a small dataset for experiments.

import pandas as pd

a = {
    'id': ['22', '23'],
    'ps_results': [[(301, ['1-1']), (63, ['1-1', '0-5']), (35, ['1-1', '1-5']),
                    (61, ['1-1', '0-3']), (21, ['1-1', '0-3', '1-1']),
                    (125, ['1-1', '1-1']), (32, ['1-1', '1-1', '0-2']),
                    (21, ['1-1', '1-1', '0-2', '1-1'])],
                   [(46, ['1-1', '1-1', '1-1']),
                    (20, ['1-1', '1-1', '1-1', '0-1']),
                    (50, ['1-1', '1-1', '0-1']),
                    (27, ['1-1', '1-1', '0-1', '1-1']),
                    (22, ['1-1', '1-1', '0-5']), (26, ['1-1', '1-1', '0-4']),
                    (67, ['1-1', '0-1', '1-1'])]]
}
pd.DataFrame(data=a)


1 Answer


Not sure if this is what you want. Let me know if not:

import pandas as pd

a = {
    'id': ['22', '23'],
    'ps_results': [[(301, ['1-1']), (63, ['1-1', '0-5']), (35, ['1-1', '1-5']),
                    (61, ['1-1', '0-3']), (21, ['1-1', '0-3', '1-1']),
                    (125, ['1-1', '1-1']), (32, ['1-1', '1-1', '0-2']),
                    (21, ['1-1', '1-1', '0-2', '1-1'])],
                   [(46, ['1-1', '1-1', '1-1']),
                    (20, ['1-1', '1-1', '1-1', '0-1']),
                    (50, ['1-1', '1-1', '0-1']),
                    (27, ['1-1', '1-1', '0-1', '1-1']),
                    (22, ['1-1', '1-1', '0-5']), (26, ['1-1', '1-1', '0-4']),
                    (67, ['1-1', '0-1', '1-1'])]]
}
data = pd.DataFrame(data=a)

# Total frequency per id: sum the first element of every (freq, pattern) tuple.
data['sum_of_freq'] = data['ps_results'].map(
    lambda tuples: sum(t[0] for t in tuples)
)

# One row per (freq, pattern) tuple; id and sum_of_freq are repeated.
exploded = data.explode(column='ps_results')
exploded['freq'] = exploded['ps_results'].map(lambda v: v[0])
exploded['pattern'] = exploded['ps_results'].map(lambda v: v[1])
exploded['freq_ratio'] = exploded['freq'] / exploded['sum_of_freq']

# Lists are unhashable, so convert each pattern to a tuple before pivoting.
exploded['pattern'] = exploded['pattern'].map(lambda l: tuple(l))

# One column per pattern; patterns missing for an id become 0.
pivot = exploded.pivot(
    index='id', columns='pattern', values='freq_ratio'
).fillna(0).reset_index()
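For comparison, here is a minimal sketch of a variant that skips explode and pivot, builds one dict of ratios per row, and keeps id and ps_results next to the pattern columns as in the table from the question. It is only an illustration under a few assumptions: the data is a shortened toy version of the id 28 and id 29 rows (so the sums differ from the real ones), ratios_by_pattern is a hypothetical helper name, and the column labels are str(pattern) strings rather than lists or tuples.

```python
import pandas as pd

# Shortened toy data based on the id 28 / id 29 rows in the question;
# the real lists are much longer, so these sums are illustrative only.
df = pd.DataFrame({
    "id": ["28", "29"],
    "ps_results": [
        [(301, ["1-1"]), (63, ["1-1", "0-5"]), (35, ["1-1", "1-5"])],
        [(265, ["0-37"]), (31, ["1-1", "0-5"])],
    ],
})


def ratios_by_pattern(pairs):
    """Hypothetical helper: map str(pattern) -> freq / sum of freqs in this row."""
    total = sum(freq for freq, _ in pairs)
    # If a pattern appeared twice in one row, the later entry would overwrite
    # the earlier one; this sketch assumes patterns are unique per row.
    return {str(pattern): freq / total for freq, pattern in pairs}


# One dict per row; the DataFrame constructor aligns the keys into columns
# and leaves NaN where a row lacks a pattern, which fillna turns into 0.
ratio_rows = [ratios_by_pattern(pairs) for pairs in df["ps_results"]]
ratios = pd.DataFrame(ratio_rows, index=df.index).fillna(0)

# Put id and ps_results side by side with the pattern columns.
wide = pd.concat([df, ratios], axis=1)
print(wide.drop(columns="ps_results"))
```

With thousands of distinct patterns the result will be mostly zeros, so whether this or the explode/pivot version is faster on the real data would need to be measured.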

9 Comments

Not what I want. I want frequency / sum(t[0] for t in tuples), with each pattern such as ['1-1', '1-1', '1-2'] mapped to its own column.
I added a table. Ask me if anything is unclear.
Oh, I didn't look at your table properly. I don't think you can have lists as pandas column labels, so I turned them into tuples for this.
Alternatively, you could replace lambda l: tuple(l) with lambda l: str(l) if you care about the column label looking right, but it will be a string rather than a list.
How do I reset the index so that I only have the id column and the (1-1,), ... columns?
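The thread does not show a reply to that last question. One possible reading is that the leftover "pattern" label on the column axis is what needs to go. A small sketch of that, assuming string pattern labels as suggested above and made-up freq_ratio values purely for illustration:

```python
import pandas as pd

# Hypothetical long-format frame standing in for `exploded`;
# the freq_ratio values are invented for this illustration.
long_df = pd.DataFrame({
    "id": ["28", "28", "29"],
    "pattern": ["['1-1']", "['1-1', '0-5']", "['1-1', '0-5']"],
    "freq_ratio": [0.75, 0.25, 1.0],
})

pivot = (
    long_df.pivot(index="id", columns="pattern", values="freq_ratio")
    .fillna(0)
    .reset_index()              # turn the id index back into a regular column
    .rename_axis(None, axis=1)  # drop the "pattern" name from the column axis
)
print(pivot)
```

After this, pivot has a plain id column followed by one column per pattern, with no extra header label above them.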