
Take this simple dataframe:

import pandas as pd

df = pd.DataFrame({
    'date':['1/15/2017', '2/15/2017','10/15/2016', '3/15/2017'], 
    'int':[2,3,1,4]
})

I'd like to sort it by the date, and then save it to a CSV without having to:

  1. Convert dates using pd.to_datetime(df['date'])
  2. Sort the dataframe using .sort_values('date')
  3. Convert dates back using .dt.strftime('%-m/%-d/%Y')

And instead do something like this (which, of course, doesn't work):

df.apply(pd.to_dataframe(df['date']).sort_values(by = 'date', inplace = True)

Desired output:

         date  int
2  10/15/2016    1
0   1/15/2017    2
1   2/15/2017    3
3   3/15/2017    4

Is this possible, or should I just stick with the 3-step process?
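For reference, here is a rough sketch of the 3-step process described above, with one assumption worth flagging: the '%-m' and '%-d' directives (no zero padding) are a platform strftime extension that is not available on Windows, so this sketch rebuilds the M/D/YYYY strings from the date components instead. The CSV is rendered to a string rather than written to a file.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['1/15/2017', '2/15/2017', '10/15/2016', '3/15/2017'],
    'int': [2, 3, 1, 4],
})

# Step 1: parse the M/D/YYYY strings into datetime64 values
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Step 2: sort chronologically
df = df.sort_values('date')

# Step 3: turn the dates back into M/D/YYYY strings without zero padding.
# Built from month/day/year instead of '%-m/%-d/%Y', which is not portable.
d = df['date'].dt
df['date'] = d.month.astype(str) + '/' + d.day.astype(str) + '/' + d.year.astype(str)

# to_csv() with no path returns the CSV text instead of writing a file
csv_text = df.to_csv()
print(csv_text)
```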



NumPy's argsort returns the permutation of indices that would sort an array. We can take advantage of that with iloc: convert the dates with pd.to_datetime, grab the underlying values, and call argsort. The resulting positions are all we need to reorder the original dataframe without changing any of its columns.

df.iloc[pd.to_datetime(df.date).values.argsort()]

         date  int
2  10/15/2016    1
0   1/15/2017    2
1   2/15/2017    3
3   3/15/2017    4
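The one-liner above can be unpacked into its two steps to show what argsort actually produces. This sketch swaps .values for .to_numpy() (the equivalent accessor recommended in current pandas) and passes kind='stable', an addition not in the original answer: NumPy's default argsort algorithm is not stable, so rows sharing the same date could otherwise come out in any order.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['1/15/2017', '2/15/2017', '10/15/2016', '3/15/2017'],
    'int': [2, 3, 1, 4],
})

parsed = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Integer positions that would put the parsed dates in chronological order;
# kind='stable' keeps rows with equal dates in their original order
order = parsed.to_numpy().argsort(kind='stable')
print(order)  # [2 0 1 3]

# iloc selects by position, so the original 'date' strings are untouched
sorted_df = df.iloc[order]
print(sorted_df)
```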

5 Comments

@pshep123 argsort is a NumPy ndarray method that returns an array of indices that would sort the array. This is passed to iloc, which indexes by integer position, in this case the positions returned by argsort. It's a very tidy solution!
@piRSquared - thanks a ton for the solution and the explanation.
@pshep123 np, glad we could help.
If the downvoter is still looking, let me know what you'd like to see to convince you to remove the downvote. I'm always eager to improve the quality of my answers.
@piRSquared - in case you're interested, check out my comment on MaxU's answer.

You can use the .assign() method to add a temporary column of parsed dates, sort by it, and then drop it:

In [22]: df.assign(x=pd.to_datetime(df['date'])).sort_values('x').drop(columns='x')
Out[22]:
         date  int
2  10/15/2016    1
0   1/15/2017    2
1   2/15/2017    3
3   3/15/2017    4
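A related option, different from both answers here, is the key argument that sort_values has accepted since pandas 1.1. The callable receives the 'date' column and returns the values to sort by, so the parsing happens only for ordering and no temporary column is added or dropped. This is a sketch assuming pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['1/15/2017', '2/15/2017', '10/15/2016', '3/15/2017'],
    'int': [2, 3, 1, 4],
})

# key is applied to the 'date' column before sorting; only the ordering
# uses the parsed datetimes, the stored strings are returned unchanged
sorted_df = df.sort_values('date', key=lambda s: pd.to_datetime(s, format='%m/%d/%Y'))
print(sorted_df)
```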

4 Comments

Thanks MaxU - I'm going with piRSquared due to the brevity and the fact that it doesn't create another column, but this is awesome.
@pshep123, sure, I like his answer more than mine
I was curious about this solution as I think it offers a little more flexibility to include time in addition to the date (which I realize I didn't ask about at first). But I was also curious about speed - and when I ran it for 20 years of 15-minute intervals (so roughly 700k lines), your solution was consistently more than 2x faster. Thank you!
@pshep123, that's interesting - thank you for the update!
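To rerun the speed comparison from the comments, something like the sketch below could be used. Everything about the data is an assumption: hypothetical 15-minute timestamps stored as 'MM/DD/YYYY HH:MM' strings and shuffled. The comment's run used roughly 700k rows; n is set lower here to keep the run short. Timings depend on hardware and pandas version, so no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

FMT = '%m/%d/%Y %H:%M'  # assumed string format, date plus time

def make_frame(n, seed=0):
    # Hypothetical data: n consecutive 15-minute timestamps as strings,
    # shuffled so the frame starts out unsorted
    stamps = pd.date_range('1997-01-01', periods=n, freq='15min')
    order = np.random.default_rng(seed).permutation(n)
    return pd.DataFrame({
        'date': stamps.strftime(FMT)[order],
        'int': np.arange(n)[order],
    })

def by_argsort(df):
    # piRSquared's approach: reorder rows by the argsort of parsed dates
    return df.iloc[pd.to_datetime(df['date'], format=FMT).to_numpy().argsort()]

def by_assign(df):
    # MaxU's approach: temporary column, sort, drop
    return (df.assign(x=pd.to_datetime(df['date'], format=FMT))
              .sort_values('x')
              .drop(columns='x'))

n = 100_000  # raise to about 20 * 365 * 96 to match the comment's size
df = make_frame(n)
t_argsort = timeit.timeit(lambda: by_argsort(df), number=3)
t_assign = timeit.timeit(lambda: by_assign(df), number=3)
print(f'argsort: {t_argsort:.3f}s  assign: {t_assign:.3f}s')
```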
