3

Hi I am having a problem sorting some data in a pandas DataFrame. The format of the data is different to what I am used to, and I have no idea how to make it work. The data I want to sort is on this format:

[['2016-05-23', 'name1', 'data1'],
['2016-05-23', 'name2', 'data2'],
['2016-05-24', 'name1', 'data1'],
['2016-05-24', 'name2', 'data2'],
['2016-05-25', 'name1', 'data1'],
['2016-05-25', 'name2', 'data2'],
['2016-05-26', 'name1', 'data1'],
['2016-05-26', 'name2', 'data2'],
['2016-05-27', 'name1', 'data1'],
['2016-05-27', 'name2', 'data2']]

What I want to do is to get it sortet on date, and with the name as colum names.

[['2016-05-23, 'data1', 'data2'],
['2016-05-24', 'data1', 'data2'],
['2016-05-25', 'data1', 'data2'] and so on...]

Is it possible to do this with some specific pandas command, I would also like the solution to work with more that just the two name columns. I have no idea how to do this and would appriciate any help.

1 Answer 1

2

I think you need pivot:

import pandas as pd

df = pd.DataFrame([['2016-05-23', 'name1', 'data1'],
['2016-05-23', 'name2', 'data2'],
['2016-05-24', 'name1', 'data1'],
['2016-05-24', 'name2', 'data2'],
['2016-05-25', 'name1', 'data1'],
['2016-05-25', 'name2', 'data2'],
['2016-05-26', 'name1', 'data1'],
['2016-05-26', 'name2', 'data2'],
['2016-05-27', 'name1', 'data1'],
['2016-05-27', 'name2', 'data2']], columns = ['a','b','c'])

print (df)
            a      b      c
0  2016-05-23  name1  data1
1  2016-05-23  name2  data2
2  2016-05-24  name1  data1
3  2016-05-24  name2  data2
4  2016-05-25  name1  data1
5  2016-05-25  name2  data2
6  2016-05-26  name1  data1
7  2016-05-26  name2  data2
8  2016-05-27  name1  data1
9  2016-05-27  name2  data2
#convert column a to datetime
df['a'] = pd.to_datetime(df.a)

print (df.pivot(index='a', columns='b', values='c'))
b           name1  name2
a                       
2016-05-23  data1  data2
2016-05-24  data1  data2
2016-05-25  data1  data2
2016-05-26  data1  data2
2016-05-27  data1  data2

And then if you need sort_values e.g. by column name1:

import pandas as pd

df = pd.DataFrame([['2016-05-23', 'name1', 9],
['2016-05-23', 'name2', 4],
['2016-05-24', 'name1', 5],
['2016-05-24', 'name2', 1],
['2016-05-25', 'name1', 5],
['2016-05-25', 'name2', 7],
['2016-05-26', 'name1', 10],
['2016-05-26', 'name2', 7],
['2016-05-27', 'name1', 0],
['2016-05-27', 'name2', 1]], columns = ['a','b','c'])
print (df)
            a      b   c
0  2016-05-23  name1   9
1  2016-05-23  name2   4
2  2016-05-24  name1   5
3  2016-05-24  name2   1
4  2016-05-25  name1   5
5  2016-05-25  name2   7
6  2016-05-26  name1  10
7  2016-05-26  name2   7
8  2016-05-27  name1   0
9  2016-05-27  name2   1

print (df.pivot(index='a', columns='b', values='c').sort_values('name1'))
b           name1  name2
a                       
2016-05-27      0      1
2016-05-24      5      1
2016-05-25      5      7
2016-05-23      9      4
2016-05-26     10      7

Sometimes pivot doesnt work, then use pivot_table:

print (df.pivot_table(index='a', columns='b', values='c'))

But pivot_table uses aggfunc, default is aggfunc=np.mean if duplicates. Better explanation with sample is here and in docs.


Last you can reset_index and rename_axis (new in pandas 0.18.0):

print (df.pivot(index='a', columns='b', values='c')
         .reset_index()
         .rename_axis(None, axis=1))
            a  name1  name2
0  2016-05-23      9      4
1  2016-05-24      5      1
2  2016-05-25      5      7
3  2016-05-26     10      7
4  2016-05-27      0      1

#pandas bellow 0.18.0
df1 = df.pivot(index='a', columns='b', values='c').reset_index()
df1.columns.name = None
print (df1)
            a  name1  name2
0  2016-05-23      9      4
1  2016-05-24      5      1
2  2016-05-25      5      7
3  2016-05-26     10      7
4  2016-05-27      0      1
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks, this seems to be working, is there a way to get combine this into a new dataframe? I tired to to do print (df.pivot(index='a', columns='b', values='c').sort_values('a')) in a new DataFrame to get it sorted on date, but that did not work
But a is index and it is sorted... If not, you can use df1 = df.pivot(index='a', columns='b', values='c').reset_index().sort_values('a')
I also add code for converting string column a to datetime, see edit.
Sorry for that, it was I who wrote it in wrong. It seems to be working perfectly.
Thank you for accepting. Glad can help you! Good luck!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.