I'm trying to parse a CSV file in Python and print the sum of order_total for each day. Below is a sample of the CSV file:

order_total  created_datetime
24.99        2015-06-01 00:00:12
0            2015-06-01 00:03:15
164.45       2015-06-01 00:04:05
24.99        2015-06-01 00:08:01
0            2015-06-01 00:08:23
46.73        2015-06-01 00:08:51
0            2015-06-01 00:08:58
47.73        2015-06-02 00:00:25
101.74       2015-06-02 00:04:11
119.99       2015-06-02 00:04:35
38.59        2015-06-02 00:05:26
73.47        2015-06-02 00:06:50
34.24        2015-06-02 00:07:36
27.24        2015-06-03 00:01:40
82.2         2015-06-03 00:12:21
23.48        2015-06-03 00:12:35

My objective here is to print sum(order_total) for each day. For example, the result should be:

2015-06-01 -> 261.16
2015-06-02 -> 415.76
2015-06-03 -> 132.92

I have written the code below. It does not perform the summing logic yet; I'm only printing some sample statements to check that it parses the file and loops as required.

def sum_orders_test(self,start_date,end_date):
        initial_date = datetime.date(int(start_date.split('-')[0]),int(start_date.split('-')[1]),int(start_date.split('-')[2]))
        final_date = datetime.date(int(end_date.split('-')[0]),int(end_date.split('-')[1]),int(end_date.split('-')[2]))
        day = datetime.timedelta(days=1)
        with open("file1.csv", 'r') as data_file:
            next(data_file)
            reader = csv.reader(data_file, delimiter=',')
            if initial_date <= final_date:
                for row in reader:
                    if str(initial_date) in row[1]:
                        print 'initial_date : ' + str(initial_date)
                        print 'Date : ' + row[1]
                    else:
                        print 'Else'
                        initial_date = initial_date + day

With my current logic I'm running into this issue:

  1. As you can see in the sample csv there are 7 rows for 2015-06-01, 6 rows for 2015-06-02 and 3 rows for 2015-06-03.
  2. My output of above code is printing 7 values for 2015-06-01, 5 for 2015-06-02 and 2 for 2015-06-03

I call the function using sum_orders_test('2015-06-01','2015-06-03').

I know there is some silly logical issue, but being new to programming and Python I'm unable to figure it out.

4 Comments

  • delimiter=',')... Please tell me where the commas in the file are. Commented Sep 3, 2017 at 8:22
  • It's a CSV file, hence the ',', but there are no commas in the file. Commented Sep 3, 2017 at 8:24
  • Have you tried using pandas? Commented Sep 3, 2017 at 8:24
  • That's exactly your problem... Python does not care about file extensions. Change the delimiter so you can actually read the data correctly. Commented Sep 3, 2017 at 8:25
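
To illustrate the point made in these comments, here is a minimal sketch (written for Python 3) of how the delimiter changes what csv.reader returns. It assumes the file is tab-separated, which the thread suggests but does not confirm; the inline `sample` string is a hypothetical stand-in for the first lines of file1.csv.

```python
import csv
import io

# Hypothetical stand-in for the first lines of file1.csv,
# assuming the columns are separated by tabs.
sample = (
    "order_total\tcreated_datetime\n"
    "24.99\t2015-06-01 00:00:12\n"
    "0\t2015-06-01 00:03:15\n"
)

# repr() makes invisible separators such as \t visible.
print(repr(sample.splitlines()[0]))

# With the wrong delimiter, each line comes back as a single field.
rows_comma = list(csv.reader(io.StringIO(sample), delimiter=','))
# With the matching delimiter, each line splits into two fields.
rows_tab = list(csv.reader(io.StringIO(sample), delimiter='\t'))

print(rows_comma[1])
print(rows_tab[1])
```

Printing repr() of the first line of the real file is a quick way to check which separator it actually uses before choosing a delimiter.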

3 Answers

Having re-read the question: if your data really is tab-separated, the following code does the job using pandas:

import pandas as pd

# header=0 marks the first line as a header row and replaces it with `names`
df = pd.read_csv('file.csv', names=['order_total', 'created_datetime'], header=0, sep='\t')
df['created_datetime'] = pd.to_datetime(df['created_datetime']).dt.date
df = df.groupby('created_datetime').sum()
print(df)

Gives the following result:

                  order_total
created_datetime             
2015-06-01             261.16
2015-06-02             415.76
2015-06-03             132.92

Less code, and probably lower algorithmic complexity.

3 Comments

It looks much easier, but my file is a CSV file, although there isn't any tab or comma in it. It's a normal Excel file saved as CSV. When I replace '\t' with ',' and run it, I get the error below: df['created_datetime'] = pd.to_datetime(df.created_datetime).dt.date File "/Library/Python/2.7/site-packages/pandas/core/tools/datetimes.py", line 509, in to_datetime values = _convert_listlike(arg._values, False, format) File "/Library/Python/2.7/site-packages/pandas/core/tools/datetimes.py", line 447, in _convert_listlike raise e ValueError: Unknown string format @Abien
Will you please give a link to a sample of your data?
It certainly is :)

This one should do the job.

The csv module has DictReader, which accepts a fieldnames argument, so instead of accessing columns by index (row[0]) you can predefine column names and access them by name (row['date']).

import csv
from collections import defaultdict
from datetime import datetime


def sum_orders_test(self, start_date, end_date):
    FIELDNAMES = ['orders', 'date']
    # int() gives a default of 0; the float order totals are added on top of it
    sum_of_orders = defaultdict(int)

    initial_date = datetime.strptime(start_date, '%Y-%m-%d').date()
    final_date = datetime.strptime(end_date, '%Y-%m-%d').date()
    with open("file1.csv", 'r') as data_file:
        next(data_file)  # Skip the headers
        # Pass delimiter='\t' here as well if the file turns out to be tab-separated
        reader = csv.DictReader(data_file, fieldnames=FIELDNAMES)
        for row in reader:
            # Use each row's own date as the key, so the first row of a new day
            # is counted instead of being consumed by a date increment
            row_date = datetime.strptime(row['date'].split()[0], '%Y-%m-%d').date()
            if initial_date <= row_date <= final_date:
                # Totals such as 24.99 are not valid input for int()
                sum_of_orders[str(row_date)] += float(row['orders'])
    return sum_of_orders
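
The loop in the question prints one row fewer for each later day because the row that triggers the else branch only advances initial_date and is never processed itself. Below is a minimal self-contained sketch (Python 3) that avoids this by keying every row on its own date. It assumes tab-separated input; the `SAMPLE` string and the `sum_by_day` helper are hypothetical names used only for illustration, and the data is the sample from the question.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for file1.csv, assuming tab-separated columns.
SAMPLE = (
    "order_total\tcreated_datetime\n"
    "24.99\t2015-06-01 00:00:12\n"
    "0\t2015-06-01 00:03:15\n"
    "164.45\t2015-06-01 00:04:05\n"
    "24.99\t2015-06-01 00:08:01\n"
    "0\t2015-06-01 00:08:23\n"
    "46.73\t2015-06-01 00:08:51\n"
    "0\t2015-06-01 00:08:58\n"
    "47.73\t2015-06-02 00:00:25\n"
    "101.74\t2015-06-02 00:04:11\n"
    "119.99\t2015-06-02 00:04:35\n"
    "38.59\t2015-06-02 00:05:26\n"
    "73.47\t2015-06-02 00:06:50\n"
    "34.24\t2015-06-02 00:07:36\n"
    "27.24\t2015-06-03 00:01:40\n"
    "82.2\t2015-06-03 00:12:21\n"
    "23.48\t2015-06-03 00:12:35\n"
)


def sum_by_day(lines, start_date, end_date):
    """Sum order_total per day for dates within [start_date, end_date]."""
    totals = defaultdict(float)
    reader = csv.reader(lines, delimiter='\t')
    next(reader)  # skip the header row
    for order_total, created in reader:
        day = created.split()[0]  # the 'YYYY-MM-DD' part of the timestamp
        # ISO-formatted dates compare correctly as plain strings.
        if start_date <= day <= end_date:
            totals[day] += float(order_total)
    return dict(totals)


totals = sum_by_day(io.StringIO(SAMPLE), '2015-06-01', '2015-06-03')
for day in sorted(totals):
    print('%s -> %.2f' % (day, totals[day]))
```

To read the real file, an open file object can be passed in place of the io.StringIO wrapper.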

2 Comments

How does defaultdict work? When I try to print sum_of_orders it shows defaultdict(<type 'int'>, {}) @Pythonist
Simply put, it lets you add new keys to a dictionary, with a default value of the given type, without first checking whether they are already there. The docs will explain it better than I can.
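
As a small illustration of that comment, here is a sketch comparing defaultdict with a plain dict; the keys and amounts are made up for the example. Note that an output of defaultdict(<type 'int'>, {}) means the dictionary is still empty, so no row was ever added to it.

```python
from collections import defaultdict

totals = defaultdict(int)    # missing keys start at int(), which is 0
totals['2015-06-01'] += 5    # no KeyError even though the key is new
totals['2015-06-01'] += 7

plain = {}
try:
    plain['2015-06-01'] += 5  # a plain dict raises KeyError for a new key
except KeyError:
    plain['2015-06-01'] = 5   # so the key has to be created explicitly

print(dict(totals))
print(plain)
```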

Your file might have a .csv extension, but it actually seems to be tab-separated.

You can load it as a pandas DataFrame by specifying the separator:

import pandas as pd
data = pd.read_csv('file.csv', sep='\t')

Then split the datetime column into date and time:

# Add the new columns to the existing frame so that order_total is kept
data[['date', 'time']] = data.created_datetime.str.split(' ', n=1, expand=True)

Then, for each unique date, compute its order_total sum:

for i in data.date.unique():
    total = data[data['date'] == i].order_total.sum()
    print('{} {:.2f}'.format(i, total))
