Name,USAF,NCDC,Date,HrMn,I,Type,Dir,Q,I,Spd,Q

OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1, 
OXNARD,723927,93110,19590101,0100,4,SAO,338,1,N,1.0,1, 
OXNARD,723927,93110,19590101,0200,4,SAO,068,1,N,1.0,1, 
OXNARD,723927,93110,19590101,0300,4,SAO,068,1,N,2.1,1, 
OXNARD,723927,93110,19590101,0400,4,SAO,315,1,N,1.0,1, 
OXNARD,723927,93110,19590101,0500,4,SAO,999,1,C,0.0,1, 
....

OXNARD,723927,93110,19590102,0000,4,SAO,225,1,N,2.1,1,
OXNARD,723927,93110,19590102,0100,4,SAO,248,1,N,2.1,1,
OXNARD,723927,93110,19590102,0200,4,SAO,999,1,C,0.0,1,
OXNARD,723927,93110,19590102,0300,4,SAO,068,1,N,2.1,1,

Here is a snippet of a CSV file storing hourly wind speeds (Spd), one per row. What I'd like to do is select all hourly winds for each day in the file and store them in a temporary daily list holding that day's hourly values (24 if there are no missing values). Then I'll output the current day's list, create a new empty list for the next day, collect the next day's hourly speeds, output that daily list, and so forth until the end of the file.

I'm struggling to find a good method for this. One thought I have is to read in line i, determine the date (YYYY-MM-DD), then read in line i+1 and see if that date matches date i. If they match, we're in the same day; if they don't, we've moved on to the next day. But I can't even figure out how to read in the next line of the file...

Any suggestions for executing this method, or a completely new (and better?!) method, are most welcome. Thank you in advance!

import datetime

obs_in = open(csv_file).readlines()
for i in range(1, len(obs_in)):
    # Skip over the header lines
    if not obs_in[i].startswith("Identification") and not obs_in[i].startswith("Name"):
        name,usaf,ncdc,date,hrmn,i,type,dir,q,i2,spd,q2,blank = obs_in[i].split(',')
        current_dt  = datetime.date(int(date[0:4]), int(date[4:6]), int(date[6:8]))
        current_spd = spd
        # Read in next line's date: is it in the same day?
        # If in the same day, then append spd into tmp daily list
        # If not, then start a new list for the next day
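The compare-with-the-previous-line idea can be sketched without ever peeking ahead: remember the previous line's date and flush the day's list when it changes. A minimal, self-contained sketch, using a few hypothetical rows shaped like the snippet above (Date is the 4th comma-separated field, Spd the 11th):

```python
rows = [
    "OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,",
    "OXNARD,723927,93110,19590101,0100,4,SAO,338,1,N,1.0,1,",
    "OXNARD,723927,93110,19590102,0000,4,SAO,225,1,N,2.1,1,",
]

days = []          # finished (date, [speeds]) pairs
daily = []         # speeds for the day currently being read
prev_date = None
for line in rows:
    fields = line.split(',')
    date, spd = fields[3], float(fields[10])
    if prev_date is not None and date != prev_date:
        days.append((prev_date, daily))   # flush the finished day
        daily = []                        # start a fresh list
    daily.append(spd)
    prev_date = date
if daily:
    days.append((prev_date, daily))       # flush the last, unfinished day
```

Instead of collecting into `days`, each flush point could just as well write the list to a file or print it.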
4
  • Have a list and store lines until the date changes. When the date changes, dump what's in the list to file, refresh the list, then move on. Commented Dec 17, 2011 at 22:02
  • So at the end, do you want a whole bunch of files with 24 lines each, with filenames like spd19590101.csv, spd19590102.csv etc.? Commented Dec 17, 2011 at 22:20
  • How can I mark when the date changes? I don't know how to read in the next line and extract its date to see if it's different from the previous line's date. Ultimately, I want one list of 24 values per date (YYYYMMDD): output that list, then move on to the next day, create a new empty list, populate it with the next 24 values, output it... Commented Dec 17, 2011 at 22:38
  • You don't read the next line. You just read and put the data into a buffer, but remember the date of the previous line. Then, when you process a new line, compare its date to the previous line's. When the date changes, you flush the buffer to file, clear the buffer, then resume storing lines in the buffer. Commented Dec 17, 2011 at 22:42

3 Answers


You can take advantage of the well-ordered nature of the data file and use csv.DictReader. With it you can quite simply build up a dictionary of the wind speeds organized by date, which you can then process however you like. Note that the csv reader returns strings, so you may want to convert to other types as appropriate while you assemble the list.

import csv
from collections import defaultdict

bydate = defaultdict(list)
rdr = csv.DictReader(open('winds.csv', 'rt'))
for k in rdr:
    bydate[k['Date']].append(float(k['Spd']))

print(bydate)
# defaultdict(<class 'list'>, {'19590101': [3.1, 1.0, 1.0, 2.1, 1.0, 0.0],
#                              '19590102': [2.1, 2.1, 0.0, 2.1]})

You can obviously change the argument of the append call to a tuple, for instance append((float(k['Spd']), datetime.datetime.strptime(k['Date'] + k['HrMn'], '%Y%m%d%H%M'))), so that you can also collect the times.
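Spelled out as a runnable sketch (the two sample rows below are hypothetical, in the same layout as the question's file):

```python
import csv
import datetime
from collections import defaultdict
from io import StringIO

# Two hypothetical rows in the question's column layout.
data = """Name,USAF,NCDC,Date,HrMn,I,Type,Dir,Q,I,Spd,Q
OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,
OXNARD,723927,93110,19590101,0100,4,SAO,338,1,N,1.0,1,
"""

bydate = defaultdict(list)
for k in csv.DictReader(StringIO(data)):
    # Combine Date (YYYYMMDD) and HrMn (HHMM) into one timestamp.
    stamp = datetime.datetime.strptime(k['Date'] + k['HrMn'], '%Y%m%d%H%M')
    bydate[k['Date']].append((float(k['Spd']), stamp))
```

Each day then maps to a list of (speed, timestamp) tuples rather than bare speeds.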

If the file has extraneous spaces, you can use the skipinitialspace parameter: rdr = csv.DictReader(open('winds.csv', 'rt'), skipinitialspace=True). If that still doesn't work, you can pre-process the header line:

bydate = defaultdict(list)
with open('winds.csv', 'rt') as f:
    fieldnames = [k.strip() for k in f.readline().split(', ')]
    rdr = csv.DictReader(f, fieldnames=fieldnames, skipinitialspace=True)
    for k in rdr:
        bydate[k['Date']].append(k['Spd'])

bydate is accessed like a regular dictionary. To access a specific day's data, do bydate['19590101']. To get the list of dates that were processed, you can do bydate.keys().
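For example, with a hypothetical bydate filled in as the loop above would produce it:

```python
from collections import defaultdict

# Hypothetical result of reading the snippet from the question.
bydate = defaultdict(list)
bydate['19590101'].extend([3.1, 1.0, 1.0, 2.1, 1.0, 0.0])
bydate['19590102'].extend([2.1, 2.1, 0.0, 2.1])

day = bydate['19590101']           # one day's speeds
dates = sorted(bydate.keys())      # all processed dates
mean_spd = sum(day) / len(day)     # daily mean works once values are floats
```

This also shows why converting Spd to float while reading matters: summing or averaging a list of strings fails.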

If you want to convert them to Python datetime objects at the time of reading the file, you can import datetime, then replace the assignment line with bydate[datetime.datetime.strptime(k['Date'], '%Y%m%d')].append(k['Spd']).


6 Comments

Thanks for the suggestion, mtrw! Follow up question: I have some trailing and leading white space in the actual csv file (I deleted them manually when pasting the snippet above), so that in order for the above script to work, line 6 needs to be: bydate[k['Date ']].append(k[' Spd']). How can I remove the white space in the read-in, so I can just use 'Date' and 'Spd' in line 6?
Also, how do you then extract the speeds for just 19590101, for example? (I'm a total newbie to DictReader)
skipinitialspace=True appears to remove only leading whitespace - is there a corresponding option to remove trailing whitespace as well?
Did you try splitting fieldname line and stripping the whitespace, as shown in the second example?
The second example works perfectly; I realized that my header line was not separated by commas, so I changed that line to use .split(). The speeds are stored as a list, but I can't take the average of them: dates = bydate.keys(); for dt in dates: mean_spd = mean(bydate[dt]) raises TypeError: cannot perform reduce with flexible type.

It can be something like this.

def dump(buf, date):
    """dumps buffered line into file 'spdYYYYMMDD.csv'"""
    if len(buf) == 0: return
    with open('spd%s.csv' % date, 'w') as f:
        for line in buf:
            f.write(line)

import datetime

obs_in = open(csv_file).readlines()
# buf stores one day record
buf = []
# date0 is meant for time stamp for the buffer
date0 = None
for i in range(1,len(obs_in)):        
    # Skip over the header lines 
    if not str(obs_in[i]).startswith("Identification") and \
        not str(obs_in[i]).startswith("Name"):
        name,usaf,ncdc,date,hrmn,ii,type,dir,q,i2,spd,q2,blank = \
            obs_in[i].split(',')
        current_dt  = datetime.date(int(date[0:4]),int(date[4:6]),int(date[6:8]))
        current_spd = spd 
        # see if the time stamp of current record is different.  if it is different
        # dump the buffer, and also set the time stamp of buffer
        if date != date0:
            dump(buf, date0)
            buf = []
            date0 = date
        # you change this.  i am simply writing entire line
        buf.append(obs_in[i])

# when you get out the buffer should be filled with the last day's record.  
# so flush that too.
dump(buf, date0)

I also found that I had to use ii instead of i for the field "I" in the data, since you used i as the loop counter.
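Since the file is already ordered by day, the same buffer-and-flush pattern can also be expressed with itertools.groupby, which groups consecutive lines sharing a key. A sketch over a few hypothetical rows in the question's layout:

```python
import itertools

rows = [
    "OXNARD,723927,93110,19590101,0000,4,SAO,270,1,N,3.1,1,",
    "OXNARD,723927,93110,19590101,0100,4,SAO,338,1,N,1.0,1,",
    "OXNARD,723927,93110,19590102,0000,4,SAO,225,1,N,2.1,1,",
]

def day_of(line):
    return line.split(',')[3]            # the Date column

# One (date, speeds) entry per consecutive run of same-day lines.
groups = {date: [float(l.split(',')[10]) for l in grp]
          for date, grp in itertools.groupby(rows, key=day_of)}
```

Each group could be written to its own spdYYYYMMDD.csv inside the loop instead of being collected into a dict.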



I know this question is from years ago, but I just wanted to point out that a small bash script can neatly perform this task. I copied your example into a file called data.txt, and this is the script:

#!/bin/bash
date=19590101
date_end=19590102
while [[ $date -le $date_end ]] ; do
  grep ",${date}," data.txt > file_${date}.txt
  date=`date +%Y%m%d -d ${date}+1day` # NOTE: MAC-OSX date differs
done

Note that this won't work on macOS, where the date command implementation (BSD rather than GNU) differs, so on macOS you either need to use gdate (from coreutils) or change the options to match the BSD date.

If there are dates missing from the file, the grep command produces an empty file - this link shows ways to avoid that: how to stop grep creating empty file if no results
