Pandas: mapping list from column values to columns?

Question

I have a pandas DataFrame like the one below, produced by the PrefixSpan algorithm, and I want to map its contents into columns.

id  ps_results
...
28  [(301, [1-1]), (63, [1-1, 0-5]), (35, [1-1, 1-...
29  [(265, [0-37]), (31, [1-1, 0-5]), (25, [0-1, 1-1, 0-1...
...

In the ps_results column, each cell holds a very long list of (frequency, [patterns]) tuples, e.g. (63, ['1-1', '0-5']), and I want to generate columns from this list:

[
   (301, ['1-1']),
   (63, ['1-1', '0-5']),
   (35, ['1-1', '1-5']),
   (61, ['1-1', '0-3']),
   (21, ['1-1', '0-3', '1-1']),
   (125, ['1-1', '1-1']),
   (32, ['1-1', '1-1', '0-2']),
   (21, ['1-1', '1-1', '0-2', '1-1']),
   (46, ['1-1', '1-1', '1-1']),
   (20, ['1-1', '1-1', '1-1', '0-1']),
   (50, ['1-1', '1-1', '0-1']),
   (27, ['1-1', '1-1', '0-1', '1-1']),
   (22, ['1-1', '1-1', '0-5']),
   (26, ['1-1', '1-1', '0-4']),
   (25, ['1-1', '1-1', '1-2'])
   ...
]

For example, I want to get a DataFrame like the one below:

sum_of_freq = 301 + 63 + 35 + 61 + ... + 25 + ... (for id = 28)

sum_of_freq = 265 + 31 + 25 ... (for id = 29)

id  ps_results    ['1-1']           ['1-1', '0-5']    ['1-1', '1-5']    ['0-37']  ...rest_of_the_columns
...
28  [(301,[...    301/(sum_of_freq)  63/(sum_of_freq)  35/(sum_of_freq)  0  <-- 0 when the pattern only occurs in other rows
29  [(265, [0...  0                  31/(sum_of_freq)  0                 265/(sum_of_freq)
...
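To make the normalisation in the table concrete, here is a minimal sketch for a single row. It assumes the first element of each tuple is the frequency, and it uses only the first three tuples of the id 28 row as a truncated, hypothetical input; `row`, `sum_of_freq` and `ratios` are illustrative names.

```python
# Hypothetical truncated row: the first three tuples of the id 28 list
row = [(301, ['1-1']), (63, ['1-1', '0-5']), (35, ['1-1', '1-5'])]

# sum_of_freq is the sum of the first element (the frequency) of every tuple
sum_of_freq = sum(freq for freq, _ in row)

# Each pattern's share of the total; patterns become tuples so they can be dict keys
ratios = {tuple(pattern): freq / sum_of_freq for freq, pattern in row}

print(sum_of_freq)  # 399
```

With the full list, the ratios for one id add up to 1, which is a quick sanity check on the final columns.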

Note that the length of ps_results varies by id, so there will probably be thousands of columns in the end.
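One way to get this wide layout without exploding rows is to turn each row's list into a dict and let the DataFrame constructor align the keys into columns. This is a sketch under assumptions, not the accepted answer's method: `to_ratios` is a hypothetical helper, and the keys are `str(pattern)` so the headers look like the lists in the table above (tuple keys could be interpreted as hierarchical MultiIndex labels by the constructor).

```python
import pandas as pd

# Sample data from the question
a = {
    'id': ['22', '23'],
    'ps_results': [[(301, ['1-1']), (63, ['1-1', '0-5']), (35, ['1-1', '1-5']),
                    (61, ['1-1', '0-3']), (21, ['1-1', '0-3', '1-1']),
                    (125, ['1-1', '1-1']), (32, ['1-1', '1-1', '0-2']),
                    (21, ['1-1', '1-1', '0-2', '1-1'])],
                   [(46, ['1-1', '1-1', '1-1']),
                    (20, ['1-1', '1-1', '1-1', '0-1']),
                    (50, ['1-1', '1-1', '0-1']),
                    (27, ['1-1', '1-1', '0-1', '1-1']),
                    (22, ['1-1', '1-1', '0-5']), (26, ['1-1', '1-1', '0-4']),
                    (67, ['1-1', '0-1', '1-1'])]]
}
df = pd.DataFrame(data=a)


def to_ratios(tuples):
    """Hypothetical helper: map str(pattern) -> freq / sum_of_freq for one row."""
    total = sum(freq for freq, _ in tuples)
    return {str(pattern): freq / total for freq, pattern in tuples}


# One dict per row; the union of all keys becomes the columns, and rows that
# lack a pattern get NaN, which fillna(0) replaces with 0
wide = pd.DataFrame(df['ps_results'].map(to_ratios).tolist(), index=df.index).fillna(0)

# Keep id and ps_results in front of the pattern columns
result = pd.concat([df[['id', 'ps_results']], wide], axis=1)
print(result.shape)  # (2, 17)
```

The result is a dense float frame, so with thousands of patterns most cells will be 0; that is the layout the question asks for, but it can use a lot of memory on large inputs.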

Below is a small dataset for experimenting:

import pandas as pd

a = {
    'id': ['22', '23'],
    'ps_results': [[(301, ['1-1']), (63, ['1-1', '0-5']), (35, ['1-1', '1-5']),
                    (61, ['1-1', '0-3']), (21, ['1-1', '0-3', '1-1']),
                    (125, ['1-1', '1-1']), (32, ['1-1', '1-1', '0-2']),
                    (21, ['1-1', '1-1', '0-2', '1-1'])],
                   [(46, ['1-1', '1-1', '1-1']),
                    (20, ['1-1', '1-1', '1-1', '0-1']),
                    (50, ['1-1', '1-1', '0-1']),
                    (27, ['1-1', '1-1', '0-1', '1-1']),
                    (22, ['1-1', '1-1', '0-5']), (26, ['1-1', '1-1', '0-4']),
                    (67, ['1-1', '0-1', '1-1'])]]
}
pd.DataFrame(data=a)

Supply table

vahndi · Accepted Answer · 2022-11-08 01:14:42Z


Not sure if this is what you want. Let me know if not:

import pandas as pd

a = {
    'id': ['22', '23'],
    'ps_results': [[(301, ['1-1']), (63, ['1-1', '0-5']), (35, ['1-1', '1-5']),
                    (61, ['1-1', '0-3']), (21, ['1-1', '0-3', '1-1']),
                    (125, ['1-1', '1-1']), (32, ['1-1', '1-1', '0-2']),
                    (21, ['1-1', '1-1', '0-2', '1-1'])],
                   [(46, ['1-1', '1-1', '1-1']),
                    (20, ['1-1', '1-1', '1-1', '0-1']),
                    (50, ['1-1', '1-1', '0-1']),
                    (27, ['1-1', '1-1', '0-1', '1-1']),
                    (22, ['1-1', '1-1', '0-5']), (26, ['1-1', '1-1', '0-4']),
                    (67, ['1-1', '0-1', '1-1'])]]
}
data = pd.DataFrame(data=a)
# Total frequency per row, used to normalise each pattern's frequency
data['sum_of_freq'] = data['ps_results'].map(
    lambda tuples: sum(t[0] for t in tuples)
)
# One row per (frequency, pattern) tuple; sum_of_freq is repeated for each
exploded = data.explode(column='ps_results')
exploded['freq'] = exploded['ps_results'].map(lambda v: v[0])
exploded['pattern'] = exploded['ps_results'].map(lambda v: v[1])
exploded['freq_ratio'] = exploded['freq'] / exploded['sum_of_freq']
# Lists cannot be used as column labels, so convert each pattern to a tuple
exploded['pattern'] = exploded['pattern'].map(lambda l: tuple(l))
# Wide format: one row per id, one column per pattern, 0 where a pattern is absent
pivot = exploded.pivot(
    index='id', columns='pattern', values='freq_ratio'
).fillna(0).reset_index()
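One caveat: `DataFrame.pivot` raises a `ValueError` if the same (id, pattern) pair appears more than once. PrefixSpan output normally lists each pattern only once per id, so this may never happen with the real data; if it can, `pivot_table` with `aggfunc='sum'` adds the duplicate ratios together instead. A minimal sketch on a small hypothetical long-format frame standing in for `exploded` (string pattern labels are used here for simplicity):

```python
import pandas as pd

# Hypothetical long-format frame; "['1-1']" is deliberately listed twice for id '22'
exploded = pd.DataFrame({
    'id': ['22', '22', '22', '23'],
    'pattern': ["['1-1']", "['1-1', '0-5']", "['1-1']", "['1-1', '0-5']"],
    'freq_ratio': [0.5, 0.25, 0.25, 1.0],
})

# pivot() would raise ValueError on the duplicate; pivot_table() sums duplicates
# and fill_value=0 marks patterns that do not occur for an id
wide = exploded.pivot_table(
    index='id', columns='pattern', values='freq_ratio', aggfunc='sum', fill_value=0
).reset_index()
```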

edited Nov 8, 2022 at 1:14

answered Nov 8, 2022 at 0:38

vahndi



9 Comments

Yiffany Over a year ago

That's not what I want. I want frequency / sum(t[0] for t in tuples), with each pattern such as ['1-1', '1-1', '1-2'] mapped into its own column.

Yiffany Over a year ago

I added a table; ask me if anything is unclear.

vahndi Over a year ago

Oh, I didn't look at your table properly. I don't think you can use lists as pandas column labels, so I turned them into tuples.
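A plausible reason behind this comment is that Python lists are mutable and therefore unhashable, while tuples are hashable, and pandas relies on hashing labels for lookups. A minimal sketch of the difference in plain Python (`pattern`, `label` and `lookup` are illustrative names):

```python
pattern = ['1-1', '0-5']

# hash() fails for a list because lists are mutable
try:
    hash(pattern)
    list_hashable = True
except TypeError:
    list_hashable = False

# The tuple version hashes fine, so it can serve as a dict key or label
label = tuple(pattern)
lookup = {label: 63}

print(list_hashable)  # False
print(lookup[('1-1', '0-5')])  # 63
```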

vahndi Over a year ago

Alternatively, you could replace lambda l: tuple(l) with lambda l: str(l) if you want the column headers to look like the lists in your table, but the labels will then be strings rather than lists.
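A short sketch of what the str() variant produces: the label reads exactly like the list in the question's table, and if the list is needed again it can be parsed back with `ast.literal_eval` (using `ast` here is a suggestion, not part of the answer).

```python
import ast

pattern = ['1-1', '1-1', '1-2']

# str() gives a label that looks like the list in the question's table
label = str(pattern)
print(label)  # ['1-1', '1-1', '1-2']

# ast.literal_eval safely parses the label back into a list
restored = ast.literal_eval(label)
print(restored == pattern)  # True
```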

Yiffany Over a year ago

How do I reset the index so that I only have the id column and the ('1-1',), ... columns?
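In the answer's code, `reset_index()` already moves id into a regular column; the extra "pattern" header that remains is the name of the columns axis, left over from `pivot(columns='pattern')`. A minimal sketch that clears it with `rename_axis(columns=None)`, on a small hypothetical frame standing in for `exploded` (string labels here; since `rename_axis` only changes the axis name, it should behave the same on the tuple-labelled pivot):

```python
import pandas as pd

# Hypothetical long-format frame standing in for `exploded`
exploded = pd.DataFrame({
    'id': ['22', '22', '23'],
    'pattern': ["['1-1']", "['1-1', '0-5']", "['1-1', '0-5']"],
    'freq_ratio': [0.8, 0.2, 1.0],
})

pivot = (
    exploded.pivot(index='id', columns='pattern', values='freq_ratio')
    .fillna(0)
    .reset_index()               # moves 'id' from the index into a column
    .rename_axis(columns=None)   # drops the leftover 'pattern' axis name
)
print(pivot.columns.name)  # None
```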
