
I have a CSV file that looks like this:

"Company, Inc.",,,,,,,,,,,,10/30/09
A/R Summary Aged Analysis Report,,,,,,,,,,,,10:35:01
All Clients,,,,,,,,,,,,USER

Client Account,Customer Name,15-Jan,16 - 30,31 - 60,61 - 90,91 - 120,120 - Over,Total,Status,Credit Limit
1000001111,CLIENT A,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00"
1000002222,CLIENT B,0,0,0,"3,591.27",0,0,"3,591.27",COD,0
1000003333,CLIENT C,536.78,0,0,0,0,"11,216.60","11,753.38",COD,0
1000004444,CLIENT D,0,514.94,"3,147.45",690,0,0,"4,352.39",COD,0

Grand Total,,"139,203,856.06","84,607,749.30","110,746,640.18","58,474,379.45","52,025,869.06","292,653,734.82","737,712,228.87",,,,

But I only want to process the lines after the line "Client Account..." and before "Grand Total..." Here's the code that I'm using now:

inputFile = csv.reader(open(filename), dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]
    That works. What's the question? Commented Jan 7, 2010 at 11:28

4 Answers

Via generators. You can build all kinds of complexity from simple generator-filter functions. While considerably more complex than your filter, this is more extensible and can easily handle really complex spreadsheets.

def skip_blank( rdr ):
    for row in rdr:
        if len(row) == 0: continue
        if all(len(col)==0 for col in row): continue
        yield row

def after_heading( text, rdr ):
    i= iter(rdr)
    for row in i:
        if any( column == text for column in row ):
            break
    for row in i:
        yield row

def before_footing( text, rdr ):
    for row in rdr:
        if any( column == text for column in row ):
            break
        yield row

def between( start, end, rdr ):
    for row in before_footing( end, after_heading( start, rdr ) ):
        yield row

for row in between( 'Client Account', 'Grand Total', skip_blank( inputFile ) ):
    print row
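
For comparison, the same skip-then-stop pipeline can be sketched with itertools.dropwhile and itertools.takewhile. This is an alternative technique rather than the code above. It is written in Python 3 syntax, and the name rows_between and the shortened sample data are made up for illustration.

```python
import csv
import io
from itertools import dropwhile, islice, takewhile

# Shortened stand-in for the report in the question (hypothetical data).
SAMPLE = '''"Company, Inc.",,,10/30/09
All Clients,,,USER

Client Account,Customer Name,Total,Status
1000001111,CLIENT A,"22,340.96",COD
1000002222,CLIENT B,"3,591.27",COD

Grand Total,,"25,932.23",
'''

def rows_between(start, end, rows):
    """Return non-blank rows after the row containing `start`
    and before the row containing `end`."""
    # Drop empty rows and rows whose columns are all empty strings.
    non_blank = (row for row in rows if any(row))
    # Skip everything up to the heading row.
    from_heading = dropwhile(lambda row: start not in row, non_blank)
    # Drop the heading row itself.
    body = islice(from_heading, 1, None)
    # Stop as soon as the footing row shows up.
    return takewhile(lambda row: end not in row, body)

reader = csv.reader(io.StringIO(SAMPLE))
records = list(rows_between('Client Account', 'Grand Total', reader))
for record in records:
    print(record)
```

Like the generator functions, this stays lazy until list() is called, so it should also cope with large files.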

You can do it like this, by setting a flag:

import csv
file = "file"
f=0
reader = csv.reader(open(file),delimiter=',')
for row in reader:
    if "Grand Total" in row: break
    if "Client Account" in row: f=1;continue
    if f:
        if row and row[0].isdigit():  # blank lines parse as [], so check before indexing
            print row

2 Comments

modify - <code>if "Grand Total" in row: break</code> and, I think that your continue will skip back to 'row in reader', never processing anything.
I have a very similar question: my "Grand Total" row is not always labeled "Grand Total", it could be other fields, but there's always a blank row before it. How can I break out of the loop by detecting the blank row?
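
One possible way to handle the blank-row case raised in the last comment: csv.reader yields an empty list for a blank line, so the loop can stop at the first empty row once the heading has been seen. This is a sketch in Python 3 syntax. The name rows_until_blank and the sample data are hypothetical, and it assumes there is no blank row between the heading and the first data row.

```python
import csv
import io

# Hypothetical report whose total row has an unpredictable label.
SAMPLE = '''Report title,,

Client Account,Customer Name,Total
1000001111,CLIENT A,"22,340.96"
1000002222,CLIENT B,"3,591.27"

Some Other Total,,"25,932.23"
'''

def rows_until_blank(heading, rows):
    """Yield the rows after the heading row, stopping at the first blank row."""
    seen_heading = False
    for row in rows:
        if not seen_heading:
            # Still in the preamble; the heading row itself is not yielded.
            seen_heading = heading in row
            continue
        if not any(row):  # [] or a row made only of empty strings
            break
        yield row

records = list(rows_until_blank('Client Account', csv.reader(io.StringIO(SAMPLE))))
print(records)
```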
import csv
import re
import StringIO

data=re.search("Client Account[^\r\n]+[\r\n]+(.*)(?=Grand Total)",open(filename).read(),re.DOTALL).group(1)
datafile=StringIO.StringIO(data)

inputFile = csv.reader(datafile, dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]
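
The StringIO module used above exists only in Python 2. A rough Python 3 equivalent of the same regex approach might look like the sketch below, which uses io.StringIO instead. The inline TEXT stands in for open(filename).read() and is made-up sample data, and the guard for a missing match is an addition that the code above does not have.

```python
import csv
import io
import re

# Stand-in for open(filename).read() (hypothetical data).
TEXT = '''All Clients,,,USER

Client Account,Customer Name,Total,Status
1000001111,CLIENT A,"22,340.96",COD
1000002222,CLIENT B,"3,591.27",COD

Grand Total,,"25,932.23",
'''

# Capture everything after the heading line and before "Grand Total".
match = re.search(r"Client Account[^\r\n]+[\r\n]+(.*)(?=Grand Total)", TEXT, re.DOTALL)
data = match.group(1) if match else ''  # empty when either marker is missing

# io.StringIO wraps the captured text in a file-like object for csv.reader.
records = [line for line in csv.reader(io.StringIO(data)) if line and line[0].isdigit()]
print(records)
```

Note that this reads the whole file into memory, which is fine for a report of this size but less suitable for very large exports.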

3 Comments

I like your approach, it's fast and simple. How do I convert the contents of datafile into a list?
Got this error message when I tried your suggestion: "TypeError: coercing to Unicode: need string or buffer, instance found"
Sorry for the delay. open(datafile) should be just datafile, since it's already a file instance. Updated.

Use a nice little generator for something like this. This one could be generalized a little more if your requirements change:

def lines_between(source, first, second):
    for line in source:
        if line and line[0] == first:
            break

    for line in source:
        if line and line[0] == second:
            break

        if line:  # only non-empty lines
            yield line

for record in lines_between(inputFile, 'Client Account', 'Grand Total'):
    pass  # process record here

You didn't ask explicitly for the "non-empty lines" filter, but your own approach was doing this so I assume you wanted it. If you don't want to process the rows "lazily" like that, but just want a list with everything built in advance, do this:

records = list(lines_between(inputFile, 'Client Account', 'Grand Total'))

By the way, on Windows, be sure to open the real source file in binary mode, using csv.reader(open(filename, 'rb'), dialect='excel'), as the Python 2 csv docs note.
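
For readers on Python 3, the csv documentation recommends text mode with newline='' rather than 'rb'. A minimal sketch follows. The file name report.csv and its contents are placeholders, and the file is written to a temporary directory only so the example is self-contained.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'report.csv')

# Write a tiny sample file; newline='' lets the csv module control line endings.
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows([
        ['Client Account', 'Customer Name', 'Total'],
        ['1000001111', 'CLIENT A', '22,340.96'],
    ])

# Read it back the same way: text mode with newline=''.
with open(path, newline='') as f:
    rows = list(csv.reader(f, dialect='excel'))

print(rows)
```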
