I have a data frame like the following:

id   date          t_s     t_p    t_prob
1    '2020-01-01'   1       1      0.5
1    '2020-01-01'   2       1      0.55
1    '2020-01-01'   3       1      0.56
1    '2020-01-01'   4       0      0.4
1    '2020-01-01'   5       1      0.6
1    '2020-01-01'   6       1      0.7
2    '2020-01-01'   1       1      0.77
2    '2020-01-01'   2       0      0.3
2    '2020-01-01'   3       0      0.2 
2    '2020-01-01'   4       0      0.33
2    '2020-01-01'   5       1      0.66
2    '2020-01-01'   6       1      0.56
....
 

Each id has the same set of dates (for example, 2020-01-01 through 2020-01-09). For each date, each id has six t_s values (1, 2, 3, 4, 5, 6); t_p is the label for each t_s, and t_prob is the value of that label. I want to turn the t_prob values for each t_s on the same date into columns named t_s_1, t_s_2, t_s_3, t_s_4, t_s_5, t_s_6, and finally get the t_s with the highest t_prob. For example, for id 1 on '2020-01-01', t_s 6 has the highest value, so t_prob_max_s is 6.

id   date          t_s_1   t_s_2   t_s_3   t_s_4   t_s_5   t_s_6   t_prob_max_s
1    '2020-01-01'   0.5     0.55    0.56    0.4     0.6     0.7     6
2    '2020-01-01'   0.77    0.3     0.2     0.33    0.66    0.56    1
....

Thanks!

  • Maybe groupby, I've done this before, but I can't do it now. Commented May 18, 2021 at 7:39
  • Are the t_s values for each date and unique id present in sequential order, i.e. from 1 to 6? Commented May 18, 2021 at 7:45
  • It seems that unstack can do the same. Commented May 18, 2021 at 7:48

1 Answer

First, group by the index columns plus the column you want to unstack. You can choose an aggregation other than "max", depending on the context; if each (id, date, t_s) combination occurs only once, the choice does not matter.

unstacked = df.groupby(['id', 'date', 't_s'])['t_prob'].aggregate('max').unstack()

Or alternatively:

df.pivot_table(index=['id', 'date'], columns='t_s', values='t_prob', aggfunc='max')

This is less flexible but perhaps slightly clearer in this context.
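
To see that the two forms agree, here is a minimal sketch that runs both on the sample rows from the question. The DataFrame is built by hand from the twelve rows shown there (the quotes around the dates are dropped), and the names via_groupby and via_pivot are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question (only the two ids shown there).
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 6,
    "date": ["2020-01-01"] * 12,
    "t_s": [1, 2, 3, 4, 5, 6] * 2,
    "t_prob": [0.5, 0.55, 0.56, 0.4, 0.6, 0.7,
               0.77, 0.3, 0.2, 0.33, 0.66, 0.56],
})

# Variant 1: aggregate per (id, date, t_s), then move t_s into the columns.
via_groupby = df.groupby(["id", "date", "t_s"])["t_prob"].aggregate("max").unstack()

# Variant 2: let pivot_table do the grouping and reshaping in one call.
via_pivot = df.pivot_table(index=["id", "date"], columns="t_s",
                           values="t_prob", aggfunc="max")

# Both frames have one row per (id, date) and one column per t_s value.
print(via_groupby)
print(via_pivot)
print(via_groupby.equals(via_pivot))
```

Either frame can be used as the starting point for the renaming steps that follow.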

Rename the axis so that the columns axis no longer carries the leftover name "t_s". Then rename the columns so that each one is labeled with its t_s value:

unstacked_renamed = unstacked.rename_axis(columns = None).rename(columns={val:f't_s_{val}' for val in unstacked.columns.values})

Get the name of the column with the highest value in each row, then strip the prefix to keep only the t_s number for that column (note that the result is a string such as '6'):

unstacked_renamed['t_prob_max_s'] = unstacked_renamed.idxmax(axis=1).str.split('_').str[-1]

Reset the index so it is flat again:

unstacked_reindexed = unstacked_renamed.reset_index()

Inspect for correctness:

>>> unstacked_reindexed
    id          date    t_s_1   t_s_2   t_s_3   t_s_4   t_s_5   t_s_6   t_prob_max_s
0   1   '2020-01-01'    0.50    0.55    0.56    0.40    0.60    0.70    6
1   2   '2020-01-01'    0.77    0.30    0.20    0.33    0.66    0.56    1
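
The whole pipeline can also be put together in one runnable sketch. This is a slight variation on the steps above, not the exact code from the answer: it calls idxmax before the columns are renamed, so t_prob_max_s keeps the integer t_s labels instead of strings, and it uses add_prefix for the renaming. The sample DataFrame and the names wide, max_s and result are assumptions for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the two ids shown there).
df = pd.DataFrame({
    "id": [1] * 6 + [2] * 6,
    "date": ["2020-01-01"] * 12,
    "t_s": [1, 2, 3, 4, 5, 6] * 2,
    "t_prob": [0.5, 0.55, 0.56, 0.4, 0.6, 0.7,
               0.77, 0.3, 0.2, 0.33, 0.66, 0.56],
})

# One row per (id, date), one column per t_s value.
wide = df.groupby(["id", "date", "t_s"])["t_prob"].max().unstack()
wide = wide.rename_axis(columns=None)

# Column labels are still the integer t_s values here, so idxmax
# returns the t_s number directly (no string splitting needed).
max_s = wide.idxmax(axis=1)

# Turn the labels 1..6 into 't_s_1'..'t_s_6', then attach the winner.
wide = wide.add_prefix("t_s_")
wide["t_prob_max_s"] = max_s

result = wide.reset_index()
print(result)
```

If string labels are acceptable, the split-based version above gives the same winners; the ordering of the steps here only avoids a later type conversion.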

This approach works even if the initial data is not sorted by the index columns, if a given t_s value occurs multiple times (in which case the choice of aggregation does matter), or if some t_s values are missing or skipped (e.g. t_s values of 1, 2, 3, 4, 5, 7). In general it is a pretty robust solution.
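
As a rough check of these claims, the sketch below runs the same steps on a small made-up frame that is unsorted, has a duplicated (id, date, t_s) combination, and skips some t_s values. The data in messy is invented purely for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical messy input: rows out of order, t_s=3 duplicated for id 1,
# and t_s values missing (id 1 has no t_s=2, id 2 has only t_s 1 and 2).
messy = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 1],
    "date": ["2020-01-01"] * 6,
    "t_s": [7, 1, 3, 2, 1, 3],
    "t_prob": [0.7, 0.5, 0.56, 0.3, 0.77, 0.6],
})

# Duplicates are collapsed by the aggregation ("max" keeps 0.6 for t_s=3);
# combinations that never occur become NaN after unstack.
unstacked = messy.groupby(["id", "date", "t_s"])["t_prob"].aggregate("max").unstack()

renamed = unstacked.rename_axis(columns=None).rename(
    columns={val: f"t_s_{val}" for val in unstacked.columns.values}
)

# idxmax skips NaN by default, so missing t_s values do not break it.
renamed["t_prob_max_s"] = renamed.idxmax(axis=1).str.split("_").str[-1]

out = renamed.reset_index()
print(out)
```

Only the t_s values that actually occur become columns, so the set of t_s_* columns depends on the data rather than on a fixed 1 to 6 range.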

5 Comments

Perhaps unstack can do it all in one go?
What do you mean by "all" in this context? It just pivots one level from one axis to another. pd.pivot_table() may be more efficient; I will investigate.
Thanks for your answer. I get an error with my code: TypeError: rename_axis() got an unexpected keyword argument 'columns'
That is surprising, because pd.DataFrame().rename_axis() takes the keyword argument "columns". pandas.pydata.org/docs/reference/api/… I suggest checking whether the unstacked DataFrame looks as expected. The code may behave differently if the initial DataFrame is significantly different from the one provided in the original post.
I've checked. You have an old pandas version. This feature changed in pandas version 0.24 to the functionality used in my code. What's your pandas version? For older versions of pandas, use the syntax unstacked.rename_axis({}, axis="columns"). Docs link: pandas.pydata.org/pandas-docs/version/0.19.2/generated/…
