
I am working with a CSV file and need to find the several largest values in a column. I was able to find the single largest value with the standard approach of looping through the rows and comparing values.

My idea for getting the top few values is to store all of the values from that column in an array, sort it, and take the last three elements. However, I'm not sure that would be efficient. I also need to pull other attributes associated with each top value, and separating out the column values seems like it would make everything messy.
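For illustration, the sorting idea can keep the other attributes attached if whole rows are sorted instead of a separate column array. This is only a minimal sketch: the column names "name" and "score" and the in-memory sample data are assumptions standing in for the real file.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("name,score\na,2\nb,5\nc,6\nd,3\ne,1\nf,7\n")

rows = list(csv.DictReader(sample))

# Sort whole rows by the numeric column, largest first, and keep three.
top_three = sorted(rows, key=lambda r: float(r["score"]), reverse=True)[:3]

for row in top_three:
    print(row["name"], row["score"])
```

Sorting costs O(n log n), which is usually fine for a file that already fits in memory.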

Another option I considered is keeping three variables as a running top three: every time I find something bigger, I compare it against the current top three and reorder them. That also seems a bit complex, and I'm not sure how I would implement it.
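One possible way to sketch the running top-three idea is to keep a short list in descending order and insert each new value at the right position. The function name running_top and its parameter n are made up for this example.

```python
def running_top(values, n=3):
    """Return the n largest values, largest first, in a single pass."""
    top = []  # kept in descending order, never longer than n
    for value in values:
        # Find the first slot whose current occupant is smaller.
        position = len(top)
        for i, kept in enumerate(top):
            if value > kept:
                position = i
                break
        if position < n:
            top.insert(position, value)
            del top[n:]  # drop whatever fell off the end
    return top


first, second, third = running_top([2, 5, 6, 3, 1, 7])
print(first, second, third)  # 7 6 5
```

The same loop works for whole rows if the comparison is done on the column of interest instead of the bare value.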

I would appreciate some ideas, or a pointer if I'm missing something obvious. Let me know if you need to see my sample code (I felt it was probably unnecessary here).

Edit: To clarify, if the column values are something like [2,5,6,3,1,7], I would want first = 7, second = 6, third = 5.

  • use pandas module Commented Oct 29, 2016 at 14:51

1 Answer


Pandas looks perfect for your task:

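import pandas as pd

# Load the CSV, then take the 3 rows with the largest values in the column.
df = pd.read_csv('data.csv')
top3 = df.nlargest(3, 'column name')  # full rows, so other attributes come along
print(top3)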

1 Comment

If I would like to use "with open" to open my CSV file, how would I rewrite the "df = " line? Currently I have "data_file = csv.DictReader(csvfile)" followed by "next(data_file)" (sorry about the formatting). Should I get rid of the DictReader? I use it to do some other things, so I'm not sure if I can get rid of it.
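If the DictReader has to stay, one option that avoids pandas is heapq.nlargest from the standard library, which accepts a key function and returns whole rows. A rough sketch follows; the column names and the in-memory sample are assumptions, and with a real file the StringIO line would be replaced by a "with open('data.csv', newline='') as csvfile:" block. Note that DictReader consumes the header row by itself when no fieldnames are given, so an extra next(data_file) would skip the first data row.

```python
import csv
import heapq
import io

# Hypothetical stand-in for an opened CSV file.
csvfile = io.StringIO("name,score\na,2\nb,5\nc,6\nd,3\ne,1\nf,7\n")

data_file = csv.DictReader(csvfile)  # header row is consumed automatically
rows = list(data_file)               # keep the rows for any other processing

# Three rows with the largest "score", largest first; other fields stay attached.
top_three = heapq.nlargest(3, rows, key=lambda row: float(row["score"]))

print([row["name"] for row in top_three])  # ['f', 'c', 'b']
```

For the pandas route, pd.read_csv also accepts an already opened file object in place of a path.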
