Sorting CSV in pandas with respect to column (string)

Question

I am sorting a CSV by one column, but the strings in that column are getting complicated and I am not sure how to sort them.

I am sticking with pandas because I have to write the sorted values back to the CSV.

CSV
Snapshot,Status
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested
21.001.1183_2019-01-04_14-37-47_1280868,Release

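I used:
df = df.sort_values(['Snapshot'], ascending=True)  # sort_values returns a new DataFrame, so reassign it
df.to_csv(unit_file, header=True, index=False)     # write the sorted rows back to the CSV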
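DataFrame.sort_values returns a new DataFrame rather than sorting in place, so the result has to be assigned before it is written back. A minimal read-sort-write sketch, assuming io.StringIO as a stand-in for the real unit_file so the example runs without touching disk:

```python
import io

import pandas as pd

csv_text = """Snapshot,Status
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested
21.001.1183_2019-01-04_14-37-47_1280868,Release
"""

# Stand-in for pd.read_csv(unit_file)
df = pd.read_csv(io.StringIO(csv_text))

# sort_values returns a new, sorted DataFrame; keep it by reassigning
df = df.sort_values(["Snapshot"], ascending=True)

# Write the sorted rows out (a StringIO buffer here instead of unit_file)
out = io.StringIO()
df.to_csv(out, header=True, index=False)
print(out.getvalue())
```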

dataframe:
C:\Users\320047585\Sathish\Python>python sample.py
Before Sort
                              Snapshot       Status
0  21.001.1154_2019-01-04_14-37-47_1280868     Released
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested
2  21.001.1183_2019-01-04_14-37-47_1280868      Release

That returned the values sorted on the part before the first _, but when two IDs are the same I need to compare the date, and if the dates are also the same, sort on the time. Any insights would be a great help.

Expected output
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested

Thanks in advance

I think you need a string split with reindex; check the answer below.
– anky, Commented Jan 31, 2019 at 14:45

anky · Accepted Answer · 2019-01-31 15:47:09Z


Use s.str.split() to extract the value to sort on, followed by df.reindex():

# Sort on the time field (third "_"-separated part), then reorder the rows by the sorted index
df_new = df.reindex(df.Snapshot.str.split("_").str[2].sort_values().index)
print(df_new)

                                  Snapshot       Status
0  21.001.1154_2019-01-04_14-37-47_1280868     Released
2  21.001.1183_2019-01-04_14-37-47_1280868     Released
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested
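The reindex trick above sorts only on the time field. Assuming pandas 1.1 or later, where sort_values accepts a key callable, the same ordering can be expressed directly. The kind="stable" argument is an addition not in the answer; it makes rows with equal times keep their original order.

```python
import pandas as pd

df = pd.DataFrame({
    "Snapshot": [
        "21.001.1154_2019-01-04_14-37-47_1280868",
        "21.001.1183_2019-01-04_16-37-47_1280868",
        "21.001.1183_2019-01-04_14-37-47_1280868",
    ],
    "Status": ["Released", "Unit Tested", "Release"],
})

# The key callable receives the Snapshot column and returns the values to sort by:
# here, the time field (index 2 after splitting on "_")
df_new = df.sort_values(
    "Snapshot",
    key=lambda s: s.str.split("_").str[2],
    kind="stable",  # rows with equal times keep their original order
)
print(df_new)
```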

If you need to take both the date and the time into account, use:

# Split Snapshot into helper columns 0-3 (ID, date, time, number) and sort by ID, date, time
df_new = df.join(df.Snapshot.str.split("_", expand=True)).sort_values(by=[0, 1, 2])
print(df_new)

                                  Snapshot       Status            0           1         2        3
0  21.001.1154_2019-01-04_14-37-47_1280868     Released  21.001.1154  2019-01-04  14-37-47  1280868
2  21.001.1183_2019-01-04_14-37-47_1280868     Released  21.001.1183  2019-01-04  14-37-47  1280868
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested  21.001.1183  2019-01-04  16-37-47  1280868

Of course, you can then drop the unwanted helper columns.
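Combining the join-and-sort step with dropping the helper columns and writing the result back might look like the sketch below. It assumes every Snapshot has exactly four "_"-separated parts (so the split yields columns 0 to 3), and it writes to a StringIO buffer in place of the real CSV path.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Snapshot": [
        "21.001.1154_2019-01-04_14-37-47_1280868",
        "21.001.1183_2019-01-04_16-37-47_1280868",
        "21.001.1183_2019-01-04_14-37-47_1280868",
    ],
    "Status": ["Released", "Unit Tested", "Release"],
})

# Helper columns 0..3: ID, date, time, trailing number
parts = df["Snapshot"].str.split("_", expand=True)

# Sort by ID, then date, then time, and remove the helper columns again
df_sorted = (
    df.join(parts)
      .sort_values(by=[0, 1, 2])
      .drop(columns=parts.columns)
)

out = io.StringIO()
df_sorted.to_csv(out, header=True, index=False)
print(out.getvalue())
```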

edited Jan 31, 2019 at 15:47

answered Jan 31, 2019 at 14:39

anky




anky Over a year ago

@Sathishkumar Pleasure, I could help. Please consider accepting the answer if it helped you. :) Thanks.

Sathish kumar · Accepted Answer · 2019-01-31 15:39:30Z


Since the whole string has to be sorted, I made a minor change to anky's answer:

Before
df_new = df.join(df.Snapshot.str.split("_", expand=True).drop(columns=0)).sort_values(by=[1, 2])

After
df_new = df.join(df.Snapshot.str.split("_", expand=True)).sort_values(by=[0, 1, 2])

This way the ID, the date, and the time are all taken into account.
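To see why keeping column 0 matters, consider two hypothetical rows (not from the question) where the lower ID has the later date. Sorting on [1, 2] ignores the ID, while [0, 1, 2] orders by ID first:

```python
import pandas as pd

# Hypothetical data: the lower ID (1154) has the later date
df = pd.DataFrame({
    "Snapshot": [
        "21.001.1154_2019-01-05_09-00-00_1280868",
        "21.001.1183_2019-01-04_14-37-47_1280868",
    ],
    "Status": ["Released", "Unit Tested"],
})

parts = df["Snapshot"].str.split("_", expand=True)

# "Before": ID column dropped, so rows are ordered by date and time only
before = df.join(parts.drop(columns=0)).sort_values(by=[1, 2])

# "After": ID column kept, so rows are ordered by ID, then date, then time
after = df.join(parts).sort_values(by=[0, 1, 2])

print(before["Snapshot"].tolist())
print(after["Snapshot"].tolist())
```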

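More interestingly,

df.sort_values(['Snapshot'], ascending=True)

also gives the correct order! It does not actually ignore the underscores and dots: it compares the strings character by character, and because every Snapshot has the same fixed-width layout (same-length ID, zero-padded date and time), that character order matches ID, then date, then time.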
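Plain string sorting relies on that fixed width. As a sketch of what can go wrong otherwise, the hypothetical IDs below differ in length, so "1000" sorts before "999". One possible fix, assumed here rather than taken from the answers, is to sort on the numeric ID parts and a parsed timestamp; the column names id_a, id_b, id_c and ts are illustrative only.

```python
import pandas as pd

# Hypothetical IDs of different lengths (not from the question)
df = pd.DataFrame({
    "Snapshot": [
        "21.001.999_2019-01-04_14-37-47_1280868",
        "21.001.1000_2019-01-04_14-37-47_1280868",
    ],
    "Status": ["Released", "Unit Tested"],
})

# Character-by-character comparison: "1000" < "999" because "1" < "9"
plain = df.sort_values("Snapshot")
print(plain["Snapshot"].tolist())

parts = df["Snapshot"].str.split("_", expand=True)
id_parts = parts[0].str.split(".", expand=True).astype(int)  # "21.001.999" -> 21, 1, 999

# Numeric ID components plus a real datetime built from the date and time fields
keys = pd.DataFrame({
    "id_a": id_parts[0],
    "id_b": id_parts[1],
    "id_c": id_parts[2],
    "ts": pd.to_datetime(parts[1] + " " + parts[2], format="%Y-%m-%d %H-%M-%S"),
})

# Sort the keys, then reorder the original rows to match
order = keys.sort_values(["id_a", "id_b", "id_c", "ts"]).index
natural = df.loc[order]
print(natural["Snapshot"].tolist())
```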

answered Jan 31, 2019 at 15:39

Sathish kumar

