Python's CSV reader and iteration

Question

I have a CSV file that looks like this:

"Company, Inc.",,,,,,,,,,,,10/30/09
A/R Summary Aged Analysis Report,,,,,,,,,,,,10:35:01
All Clients,,,,,,,,,,,,USER

Client Account,Customer Name,15-Jan,16 - 30,31 - 60,61 - 90,91 - 120,120 - Over,Total,Status,Credit Limit
1000001111,CLIENT A,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00"
1000002222,CLIENT B,0,0,0,"3,591.27",0,0,"3,591.27",COD,0
1000003333,CLIENT C,536.78,0,0,0,0,"11,216.60","11,753.38",COD,0
1000004444,CLIENT D,0,514.94,"3,147.45",690,0,0,"4,352.39",COD,0

Grand Total,,"139,203,856.06","84,607,749.30","110,746,640.18","58,474,379.45","52,025,869.06","292,653,734.82","737,712,228.87",,,,

But I only want to process the lines after the "Client Account..." line and before the "Grand Total..." line. Here's the code that I'm using now:

import csv

inputFile = csv.reader(open(filename), dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]

That works. What's the question?

– S.Lott, Commented Jan 7, 2010 at 11:28

S.Lott · Accepted Answer · 2010-01-07 11:47:15Z

Via generators. You can build all kinds of complexity from simple generator-filter functions. While considerably more complex than your filter, this is more extensible and can easily handle really complex spreadsheets.

def skip_blank( rdr ):
    for row in rdr:
        if len(row) == 0: continue
        if all(len(col)==0 for col in row): continue
        yield row

def after_heading( text, rdr ):
    i= iter(rdr)
    for row in i:
        if any( column == text for column in row ):
            break
    for row in i:
        yield row

def before_footing( text, rdr ):
    for row in rdr:
        if any( column == text for column in row ):
            break
        yield row

def between( start, end, rdr ):
    for row in before_footing( end, after_heading( start, rdr ) ):
        yield row

# start is the heading text, end is the footing text
for row in between( 'Client Account', 'Grand Total', skip_blank( inputFile ) ):
    print(row)
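For comparison, here is a minimal sketch of the same filter chain built from the standard library's itertools instead of hand-written generators. It assumes Python 3, matches on the first column only, and uses a made-up SAMPLE string and a hypothetical helper name rows_between; it is one possible arrangement, not a replacement for the functions above.

```python
import csv
import io
from itertools import dropwhile, islice, takewhile

# Made-up sample data in the same shape as the report in the question.
SAMPLE = '''"Company, Inc.",,,,10/30/09
All Clients,,,,USER

Client Account,Customer Name,Total,Status,Credit Limit
1000001111,CLIENT A,"22,340.96",COD,"20,000.00"
1000002222,CLIENT B,"3,591.27",COD,0

Grand Total,,"25,932.23",,
'''

def rows_between(start, end, rows):
    """Return an iterator over the non-blank rows strictly between the row
    whose first column equals `start` and the row whose first column
    equals `end`."""
    # Drop empty rows ([]) and rows made only of empty strings.
    non_blank = (row for row in rows if any(row))
    # Skip everything before the heading row...
    from_heading = dropwhile(lambda row: row[0] != start, non_blank)
    # ...then skip the heading row itself.
    body = islice(from_heading, 1, None)
    # Stop as soon as the footing row appears.
    return takewhile(lambda row: row[0] != end, body)

reader = csv.reader(io.StringIO(SAMPLE))
records = list(rows_between('Client Account', 'Grand Total', reader))
for row in records:
    print(row)
```

Everything stays lazy until list() is called, so the same function can be used in a for loop over a large file without building the whole list.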

ghostdog74 · Accepted Answer · 2010-01-07 11:09:38Z

10

You can do it by setting a flag once the header row has been seen:

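import csv

filename = "file"
in_data = False  # becomes True once the header row has been seen
reader = csv.reader(open(filename), delimiter=',')
for row in reader:
    if "Grand Total" in row:
        break
    if "Client Account" in row:
        in_data = True
        continue
    # "row and" guards against blank lines, which csv.reader returns as []
    if in_data and row and row[0].isdigit():
        print(row)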

edited Jan 7, 2010 at 11:09

answered Jan 7, 2010 at 10:55


2 Comments

KevinDTimm Over a year ago

modify - if "Grand Total" in row: break - and, I think that your continue will skip back to 'row in reader', never processing anything.

LWZ Over a year ago

I have a very similar question. My "Grand Total" row is not always labeled "Grand Total"; it could be some other text, but there's always a blank row before it. How can I break out of the loop when I reach the blank row?
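One possible sketch for the case described in this comment, assuming Python 3 and assuming the data rows directly follow the heading row with no blank line in between (as in the question's file). The helper name rows_until_blank and the SAMPLE data are made up for illustration.

```python
import csv
import io

# Made-up sample: the footer label is arbitrary, but a blank line precedes it.
SAMPLE = '''All Clients,,USER

Client Account,Customer Name,Total
1000001111,CLIENT A,"22,340.96"
1000002222,CLIENT B,"3,591.27"

Some Other Footer,,"25,932.23"
'''

def rows_until_blank(heading, rows):
    """Yield the rows after the heading row, stopping at the first blank row."""
    rows = iter(rows)  # both loops below must share one iterator
    for row in rows:
        if row and row[0] == heading:
            break
    for row in rows:
        # csv.reader returns [] for an empty line; a line of bare commas
        # gives a list of empty strings. Both count as blank here.
        if not any(row):
            break
        yield row

records = list(rows_until_blank('Client Account', csv.reader(io.StringIO(SAMPLE))))
for row in records:
    print(row)
```

The footer text is never inspected, so it does not matter what the row after the blank line contains.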

YOU · Accepted Answer · 2010-01-08 00:37:36Z

6

import csv
import re
import StringIO

data=re.search("Client Account[^\r\n]+[\r\n]+(.*)(?=Grand Total)",open(filename).read(),re.DOTALL).group(1)
datafile=StringIO.StringIO(data)

inputFile = csv.reader(datafile, dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]

edited Jan 8, 2010 at 0:37

answered Jan 7, 2010 at 10:46


3 Comments

FrancisV Over a year ago

I like your approach, it's fast and simple. How do I convert the contents of datafile into a list?

FrancisV Over a year ago

Got this error message when I tried your suggestion: "TypeError: coercing to Unicode: need string or buffer, instance found"

YOU Over a year ago

Sorry for the delay. open(datafile) should be just datafile, since it's already a file-like instance. Updated.
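As a rough illustration of the points raised in these comments, here is the same regex idea sketched for Python 3, where io.StringIO takes the place of the StringIO module. The SAMPLE text is made up and stands in for open(filename).read(). The in-memory object is passed straight to csv.reader (not to open()), and wrapping the reader in list() or a list comprehension is what produces a list.

```python
import csv
import io
import re

# Made-up sample standing in for the contents of the real file.
SAMPLE = '''All Clients,,USER

Client Account,Customer Name,Total
1000001111,CLIENT A,"22,340.96"
1000002222,CLIENT B,"3,591.27"

Grand Total,,"25,932.23"
'''

# Capture everything after the heading line and before "Grand Total".
match = re.search(
    r"Client Account[^\r\n]+[\r\n]+(.*)(?=Grand Total)", SAMPLE, re.DOTALL
)
if match is None:
    raise ValueError("heading or footing row not found")

# io.StringIO is already file-like, so it goes directly into csv.reader.
datafile = io.StringIO(match.group(1))
records = [row for row in csv.reader(datafile) if row and row[0].isdigit()]
for row in records:
    print(row)
```

Note that this reads the whole file into memory before parsing, which is fine for small reports but differs from the row-by-row approaches in the other answers.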

Peter Hansen · Accepted Answer · 2010-01-07 11:15:42Z

Use a nice little generator for something like this. This one could be generalized a little more if your requirements change:

def lines_between(source, first, second):
    for line in source:
        if line and line[0] == first:
            break

    for line in source:
        if line and line[0] == second:
            break

        if line:  # only non-empty lines
            yield line

for record in lines_between(inputFile, 'Client Account', 'Grand Total'):
    pass  # process record

You didn't ask explicitly for the "non-empty lines" filter, but your own approach was doing this so I assume you wanted it. If you don't want to process the rows "lazily" like that, but just want a list with everything built in advance, do this:

records = list(lines_between(inputFile, 'Client Account', 'Grand Total'))
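One detail worth spelling out: the two for loops only pick up where the other left off because a csv.reader is an iterator. If a plain list of rows were passed in, the second loop would start again from the first row. A small sketch of a variant that guards against this by calling iter() first (Python 3 assumed; the sample rows are made up):

```python
def lines_between(source, first, second):
    # iter() returns the object itself for iterators such as csv.reader,
    # and a fresh single-pass iterator for lists, so both loops below
    # always share the same position.
    source = iter(source)

    for line in source:
        if line and line[0] == first:
            break

    for line in source:
        if line and line[0] == second:
            break

        if line:  # only non-empty lines
            yield line

# Made-up rows held in a list rather than coming from csv.reader.
rows = [
    ['Company, Inc.'],
    ['Client Account', 'Customer Name'],
    ['1000001111', 'CLIENT A'],
    [],
    ['1000002222', 'CLIENT B'],
    ['Grand Total', ''],
]

result = list(lines_between(rows, 'Client Account', 'Grand Total'))
print(result)
```

With the iter() call in place, the function behaves the same whether it is handed a reader, a generator, or a list.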

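By the way, on Windows, be sure to open the real source file in binary mode, with csv.reader(open(filename, 'rb'), dialect='excel'), as the csv docs note. (That is the Python 2 advice; the Python 3 csv docs instead say to open the file in text mode with newline=''.)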
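For readers on Python 3, a minimal sketch of the equivalent precaution: open the file in text mode with newline='' so the csv module handles line endings itself. The temporary file and its contents are made up so the example can run on its own.

```python
import csv
import os
import tempfile

# Write a small made-up file with Windows line endings and a quoted
# field that contains an embedded line break.
with tempfile.NamedTemporaryFile(
    'w', suffix='.csv', newline='', delete=False
) as tmp:
    tmp.write('Client Account,Customer Name\r\n')
    tmp.write('1000001111,"CLIENT\r\nA"\r\n')
    path = tmp.name

try:
    # newline='' turns off newline translation so csv.reader sees the
    # original line endings, including the one inside the quoted field.
    with open(path, newline='') as f:
        rows = list(csv.reader(f, dialect='excel'))
finally:
    os.remove(path)

print(rows)
```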
