I am sorting a CSV by one column, but the strings in that column have become more complicated and I am not sure how to sort them.

I am sticking with pandas because I have to write the sorted values back to the CSV.

CSV
Snapshot,Status
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested
21.001.1183_2019-01-04_14-37-47_1280868,Release

I used:
du = dd.sort_values(['Snapshot'], ascending=True)
du.to_csv(unit_file, header=True, index=False)

dataframe:
C:\Users\320047585\Sathish\Python>python sample.py
Before Sort
                              Snapshot       Status
0  21.001.1154_2019-01-04_14-37-47_1280868     Released
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested
2  21.001.1183_2019-01-04_14-37-47_1280868      Release

That sorted the values by the part before the first _. Now, if two ids are the same, I need to compare the date, and if the dates are also the same, I need to sort on the time. Any insights would be a great help.

Expected output
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested

Thanks in advance

  • I think you need a string split with reindex; check the answer below. Commented Jan 31, 2019 at 14:45

2 Answers

Use s.str.split() to extract the field to sort on (here the time, at position 2), followed by df.reindex():

df_new=df.reindex(df.Snapshot.str.split("_").str[2].sort_values().index)
print(df_new)

                                  Snapshot       Status
0  21.001.1154_2019-01-04_14-37-47_1280868     Released
2  21.001.1183_2019-01-04_14-37-47_1280868     Released
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested

If you need to take the id, date, and time all into consideration, use:

data_new = data.join(data.Snapshot.str.split("_",expand=True)).sort_values(by=[0,1,2])
print(data_new)

                                 Snapshot       Status           1         2  \
0  21.001.1154_2019-01-04_14-37-47_1280868     Released  2019-01-04  14-37-47   
2  21.001.1183_2019-01-04_14-37-47_1280868     Released  2019-01-04  14-37-47   
1  21.001.1183_2019-01-04_16-37-47_1280868  Unit Tested  2019-01-04  16-37-47   

         3  
0  1280868  
2  1280868  
1  1280868  

Of course, you can then remove the unwanted columns.
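For the write-back to CSV that the question asks about, the split, sort, and cleanup steps could be combined roughly as follows. This is only a sketch: the sample rows are inlined through io.StringIO instead of a real file, and the helper column names (build, date, time, change) are made up for illustration and are not part of the original data.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
csv_text = """Snapshot,Status
21.001.1154_2019-01-04_14-37-47_1280868,Released
21.001.1183_2019-01-04_16-37-47_1280868,Unit Tested
21.001.1183_2019-01-04_14-37-47_1280868,Release
"""
data = pd.read_csv(io.StringIO(csv_text))

# Split Snapshot on "_" into a separate helper frame.
# The column names below are hypothetical labels chosen for readability.
parts = data["Snapshot"].str.split("_", expand=True)
parts.columns = ["build", "date", "time", "change"]

# Sort the helper frame by id, then date, then time, and reuse its index
# to reorder the original frame. The helper columns never enter `data`,
# so there is nothing to drop afterwards.
order = parts.sort_values(by=["build", "date", "time"]).index
data_sorted = data.loc[order]

# Write back without the index column (a StringIO stands in for the file).
out = io.StringIO()
data_sorted.to_csv(out, index=False)
sorted_csv = out.getvalue()
print(sorted_csv)
```

Keeping the split fields in their own frame avoids the join-then-drop round trip; with a real file, pass the path to read_csv and to_csv instead of the StringIO objects.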


1 Comment

@Sathishkumar Pleasure, I could help. Please consider accepting the answer if it helped you. :) Thanks.

Since the whole string has to be sorted, I made a minor change to anky's answer.

Before
df_new = df.join(df.Snapshot.str.split("_",expand=True).drop(columns=0)).sort_values(by=[1,2])

After
data_new = data.join(data.Snapshot.str.split("_",expand=True)).sort_values(by=[0,1,2])

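This way the sort considers the whole string (id, date, and time).

More interestingly:

data.sort_values(['Snapshot'],ascending=True)
also gives the expected order. It does not ignore the underscores and dots; it compares the strings character by character, and that matches the field-by-field order here because every field has a fixed width and is zero-padded.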
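To see why the plain sort works on this data, and when it would stop working, here is a small sketch in plain Python. The first list holds the snapshots from the question; the second list uses a made-up three-digit id (21.001.999) purely to illustrate the caveat. It also assumes the id is meant to be compared numerically, which the question does not state.

```python
# Snapshots from the question.
snapshots = [
    "21.001.1154_2019-01-04_14-37-47_1280868",
    "21.001.1183_2019-01-04_16-37-47_1280868",
    "21.001.1183_2019-01-04_14-37-47_1280868",
]


def field_key(snapshot):
    """Sort key: (id, date, time) compared as separate strings."""
    build, date, time, _change = snapshot.split("_")
    return (build, date, time)


# Plain lexicographic sort versus explicit field-by-field sort.
plain = sorted(snapshots)
by_fields = sorted(snapshots, key=field_key)
same_order = plain == by_fields
print(same_order)

# Hypothetical ids of different widths: 999 versus 1154.
mixed = [
    "21.001.999_2019-01-04_14-37-47_1280868",
    "21.001.1154_2019-01-04_14-37-47_1280868",
]


def numeric_key(snapshot):
    """Sort key: the dotted id converted to a tuple of integers."""
    build = snapshot.split("_")[0]
    return tuple(int(part) for part in build.split("."))


plain_mixed = sorted(mixed)                     # character-by-character order
numeric_mixed = sorted(mixed, key=numeric_key)  # numeric order of the id
print(plain_mixed == numeric_mixed)
```

On the question's data both orders agree. With ids of different widths the character-by-character order puts 1154 before 999, so an explicit key is the safer choice if the id format can vary.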
