
I have a .csv file with many rows and 3 columns: Date, Rep, and Sales. I would like to use Python to generate a new array that groups the data by Date and, for the given date, sorts the Reps by Sales. As an example, my input data looks like this:

salesData = [[201703,'Bob',3000], [201703,'Sarah',6000], [201703,'Jim',9000], 
    [201704,'Bob',8000], [201704,'Sarah',7000], [201704,'Jim',12000], 
    [201705,'Bob',15000], [201705,'Sarah',14000], [201705,'Jim',8000],
    [201706,'Bob',10000], [201706,'Sarah',18000]]

My desired output would look like this:

sortedData = [[201703,'Jim', 'Sarah', 'Bob'], [201704,'Jim', 'Bob', 
    'Sarah'], [201705,'Bob', 'Sarah', 'Jim'], [201706, 'Sarah', 'Bob']]

I am new to Python, but I have searched quite a bit for a solution with no success. Most of my search results lead me to believe there may be an easy way to do this using pandas (which I have not used) or numpy (which I have used).

Any suggestions would be greatly appreciated. I am using Python 3.6.

2 Answers


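Use pandas! Group the rows by date, then sort each group by sales in descending order and collect the rep names: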

import pandas as pd

salesData = [[201703, 'Bob', 3000], [201703, 'Sarah', 6000], [201703, 'Jim', 9000],
             [201704, 'Bob', 8000], [201704, 'Sarah', 7000], [201704, 'Jim', 12000],
             [201705, 'Bob', 15000], [201705, 'Sarah', 14000], [201705, 'Jim', 8000],
             [201706, 'Bob', 10000], [201706, 'Sarah', 18000]]

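# Name the columns so the later steps read clearly
sales_df = pd.DataFrame(salesData, columns=['Date', 'Rep', 'Sales'])
result = []
# groupby yields one (date, rows) pair per unique date, in ascending date order
for date, group in sales_df.groupby('Date'):
    # Highest sales first within this date
    sorted_df = group.sort_values('Sales', ascending=False)
    # int() keeps the date a plain Python int in the output list
    result.append([int(date)] + list(sorted_df['Rep']))
print(result)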
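
If you would rather avoid the explicit loop over groups, here is a rough sketch of an alternative: sort the whole frame once, then collect the rep names per date. The column names 'Date', 'Rep' and 'Sales' and the variable names `ordered`, `reps_by_date` and `sorted_data` are my own choices for illustration, not something from the question's file.

```python
import pandas as pd

salesData = [[201703, 'Bob', 3000], [201703, 'Sarah', 6000], [201703, 'Jim', 9000],
             [201704, 'Bob', 8000], [201704, 'Sarah', 7000], [201704, 'Jim', 12000],
             [201705, 'Bob', 15000], [201705, 'Sarah', 14000], [201705, 'Jim', 8000],
             [201706, 'Bob', 10000], [201706, 'Sarah', 18000]]

# Column names are assumed here; adjust them to match the real CSV header.
df = pd.DataFrame(salesData, columns=['Date', 'Rep', 'Sales'])

# Sort once: dates ascending, sales descending within each date.
ordered = df.sort_values(['Date', 'Sales'], ascending=[True, False])

# groupby keeps the row order inside each group, so each list stays sales-sorted.
reps_by_date = ordered.groupby('Date')['Rep'].agg(list)

# Build [date, rep, rep, ...] rows; int() turns the date into a plain Python int.
sorted_data = [[int(date)] + reps for date, reps in reps_by_date.items()]
print(sorted_data)
```

This should give the same lists as the loop version; which one reads better is mostly a matter of taste.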

1 Comment

Wow - that's awesome! I am definitely going to need to read up on pandas - so easy and powerful. Thank you very much for the solution.

Without pandas, you can try this one-line answer, which groups the names by date but leaves them in input order:

sortedData = [[i]+[item[1] for item in salesData if item[0]==i] for i in sorted(set([item[0] for item in salesData]))]


EDIT:
You can do this to order each inner list by sales:

sortedData = [[i]+[item[1] for item in sorted(salesData, key=lambda x: -x[2]) if item[0]==i] for i in sorted(set([item[0] for item in salesData]))]

Note that the sorted(salesData, key=lambda x: -x[2]) part performs the ordering: it sorts all rows by sales in descending order before they are filtered by date.
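
Since the one-liner is dense, here is a sketch of the same idea spelled out as ordinary loops, which may be easier to follow when learning Python. It uses only the standard library and goes through the data once to bucket rows by date, rather than re-sorting the full list for every date. The names `rows_by_date`, `ranked` and `sortedData` are just illustrative.

```python
from collections import defaultdict

salesData = [[201703, 'Bob', 3000], [201703, 'Sarah', 6000], [201703, 'Jim', 9000],
             [201704, 'Bob', 8000], [201704, 'Sarah', 7000], [201704, 'Jim', 12000],
             [201705, 'Bob', 15000], [201705, 'Sarah', 14000], [201705, 'Jim', 8000],
             [201706, 'Bob', 10000], [201706, 'Sarah', 18000]]

# Step 1: bucket the (sales, rep) pairs under their date.
rows_by_date = defaultdict(list)
for date, rep, sales in salesData:
    rows_by_date[date].append((sales, rep))

# Step 2: for each date in ascending order, rank the reps by sales, highest first.
sortedData = []
for date in sorted(rows_by_date):
    ranked = sorted(rows_by_date[date], key=lambda pair: pair[0], reverse=True)
    sortedData.append([date] + [rep for _, rep in ranked])

print(sortedData)
```

Because Python's sort is stable, reps with equal sales on the same date keep the order they had in the input.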

1 Comment

Thank you for the reply. I am going to go with the pandas solution, but I'd still like to understand your code better as I'm new to learning Python. One problem I had running this line of code was that the output is in the correct format (i.e. unique dates followed by a list of names), but the names are not sorted by sales. Is there a way to add this functionality to your code? Thanks!
