In my project I am trying to sort the rows of a CSV file by a single column using pandas. I have seen several examples posted around the internet, but I have been unable to get it working myself.

db = pd.read_csv(databasefile, skip_blank_lines=True, names=['ExampleOne','ExampleTwo','ExampleThree','ExampleFour'], header=1)
db.drop_duplicates(inplace=True)

db.sort_values(by=['ExampleOne'], ascending=[True])

db.to_csv(databasefile, index=False)

My understanding of the code above is that I read the CSV into a pandas DataFrame, drop any duplicated rows, sort by the ExampleOne column, and then write the result back to the CSV. The code runs with no errors, but when I view the CSV afterwards the data is not sorted in any order.

Database CSV Link

Here is the CSV in txt format. The first 60 or so rows are sorted, but that is because earlier in this process I combine multiple CSVs into one CSV.

Thank you for reading! I would appreciate any help or suggestions, as this problem has been frustrating for me.

  • db.sort_values(by=['ExampleOne'], ascending=[True], inplace=True), or you can chain the operations: db.sort_values(by=['ExampleOne'], ascending=[True]).to_csv(databasefile, index=False). Commented Sep 23, 2019 at 18:37
  • @QuangHoang I have yet to try that out as I am updating some software on my computer. Would chaining the .to_csv function to the sort function actually make a difference? Thank you for the reply, I'll be sure to check it out when my computer is back online. Commented Sep 23, 2019 at 18:49
  • Yes, sort_values without inplace returns the sorted DataFrame; it does not sort your original. Commented Sep 23, 2019 at 18:51
  • @QuangHoang Thank you very much! It works flawlessly now. I didn't realize that sorting did not replace the original DataFrame. Once again, thank you! Commented Sep 23, 2019 at 19:21

1 Answer

import pandas as pd

databasefile = r"path"
db = pd.read_csv(databasefile, skip_blank_lines=True, names=['ExampleOne','ExampleTwo','ExampleThree','ExampleFour'], header=1)
# Optional: inspect the column before sorting
print(db['ExampleOne'])
db.drop_duplicates(inplace=True)
# sort_values returns a new, sorted DataFrame, so write that result straight to the CSV
db.sort_values(by=['ExampleOne'], ascending=True).to_csv(databasefile, index=False)

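Here is a solution to your problem. sort_values returns a new, sorted DataFrame instead of sorting db itself, so either chain to_csv onto its result (as above) or pass inplace=True.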
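
To see the difference without touching your file, here is a minimal sketch using made-up data. The values and the names unsorted_order, sorted_db, db_inplace and so on are assumptions for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
db = pd.DataFrame({"ExampleOne": [3, 1, 2], "ExampleTwo": ["c", "a", "b"]})

# Calling sort_values and discarding the return value leaves db unchanged.
db.sort_values(by="ExampleOne")
unsorted_order = db["ExampleOne"].tolist()

# Option 1: keep the sorted DataFrame that sort_values returns.
sorted_db = db.sort_values(by="ExampleOne")
sorted_order = sorted_db["ExampleOne"].tolist()

# Option 2: sort the DataFrame itself with inplace=True.
db_inplace = db.copy()
db_inplace.sort_values(by="ExampleOne", inplace=True)
inplace_order = db_inplace["ExampleOne"].tolist()

print(unsorted_order)  # [3, 1, 2]
print(sorted_order)    # [1, 2, 3]
print(inplace_order)   # [1, 2, 3]
```

Either option works; whichever one you pick, call to_csv on the sorted object so that the file receives the sorted rows.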
