
I am working on a data analysis using a CSV file that I exported from a data warehouse (Cognos). The last row of the CSV file sums up all the rows above it, but I do not need this row for my analysis, so I would like to skip it.

I was thinking about adding an "if" statement inside my "for" loop that checks the first column of each row, like below.

import csv

with open('COGNOS.csv', newline='') as f, open('New_COGNOS.csv', 'w', newline='') as w:
    # Open two CSV files: one to read from and one to write to.
    CSV_raw = csv.reader(f)
    CSV_new = csv.writer(w)
    for row in CSV_raw:
        # Stop at the totals row, which is the last row of the export.
        if row[0] == "All Materials (By Collection)":
            break
        item_num = row[3].split(" ")[0]
        row.append(item_num)
        CSV_new.writerow(row)

However, this looks like it wastes a lot of resources. Is there a Pythonic way to skip the last row when iterating through a CSV file?

  • If you're on *nix, you can use head -n -1 yourfile.csv to print the file without the last line. Commented May 30, 2013 at 21:54
  • Do you mean a Unix-like OS? Unfortunately, I am using my corporate PC. Thank you though; it will come in handy when I get my hands dirty at home. Commented May 30, 2013 at 22:30
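Without a Unix shell, a rough plain-Python stand-in for head -n -1 is to read every row into a list and slice off the last one. This sketch is not from the thread: it assumes the file is small enough to hold in memory, and the helper name copy_without_last_row is hypothetical. In-memory io.StringIO objects stand in for the real files.

```python
import csv
import io

def copy_without_last_row(src, dst):
    """Copy CSV rows from src to dst, dropping the final (totals) row.

    Hypothetical helper: reads all rows into memory first, so it
    assumes the input file is small.
    """
    rows = list(csv.reader(src))
    # rows[:-1] drops the last row; an empty file simply stays empty.
    csv.writer(dst).writerows(rows[:-1])

# Usage with in-memory data standing in for COGNOS.csv / New_COGNOS.csv:
src = io.StringIO("a,1\nb,2\nTotal,3\n")
dst = io.StringIO()
copy_without_last_row(src, dst)
print(repr(dst.getvalue()))  # 'a,1\r\nb,2\r\n'
```

With real files, src and dst would be opened as in the question's code (in Python 3, with newline=''). The generator approaches below avoid holding the whole file in memory.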

2 Answers


You can write a generator that'll return everything but the last entry in an input iterator:

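def skip_last(iterator):
    iterator = iter(iterator)  # also accept plain iterables such as lists
    try:
        prev = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in iterator:
        yield prev
        prev = item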

Then wrap your CSV_raw reader object in it:

for row in skip_last(CSV_raw):

The generator takes the first entry, then starts looping, and on each iteration yields the previous entry. When the input iterator is exhausted, one entry is still held back in prev, and that last entry is never returned.
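Putting the pieces together, the question's loop can drop the "if ... break" check entirely once the reader is wrapped in skip_last(). In this sketch the io.StringIO data is a made-up stand-in for COGNOS.csv. The empty-input guard is an addition to the original answer: since Python 3.7, a bare StopIteration escaping a generator is turned into a RuntimeError.

```python
import csv
import io

def skip_last(iterable):
    """Yield every item of iterable except the last one."""
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for item in it:
        yield prev  # safe to emit: at least one more item follows it
        prev = item

# Made-up stand-in for COGNOS.csv: two data rows plus the totals row.
raw = io.StringIO(
    "A,1,2.50,1001 Widget\n"
    "B,2,3.00,1002 Gadget\n"
    "All Materials (By Collection),3,5.50,\n"
)
out = io.StringIO()

CSV_raw = csv.reader(raw)
CSV_new = csv.writer(out)
for row in skip_last(CSV_raw):  # the totals row is never yielded
    item_num = row[3].split(" ")[0]
    row.append(item_num)
    CSV_new.writerow(row)

print(out.getvalue())
```

With real files, f and w would be opened as in the question; the loop body is unchanged apart from the removed check.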

A generic version, letting you skip the last n elements, would be:

from collections import deque
from itertools import islice

def skip_last_n(iterator, n=1):
    it = iter(iterator)
    prev = deque(islice(it, n), n)
    for item in it:
        yield prev.popleft()
        prev.append(item)
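A quick check of skip_last_n() on plain iterables shows how the buffer works: the deque always holds the n most recently read items, so whatever is still in it when the input runs out is exactly the tail that gets skipped. The sample inputs are arbitrary. This version assumes n >= 1; with n=0 the deque can never hold anything, and popleft() raises IndexError.

```python
from collections import deque
from itertools import islice

def skip_last_n(iterator, n=1):
    it = iter(iterator)
    # Pre-fill a buffer with the first n items; maxlen=n caps its size.
    prev = deque(islice(it, n), n)
    for item in it:
        # The oldest buffered item is now known not to be among the last n.
        yield prev.popleft()
        prev.append(item)

print(list(skip_last_n(range(1, 9), 3)))  # [1, 2, 3, 4, 5]
print(list(skip_last_n("abc", 5)))        # [] (fewer than n items)
```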

4 Comments

Martijn, it seems like there is a team of Python devs behind your account :) Producing answers that fast and that exact looks just amazing!
Thank you, Martijn. That was amazingly fast. The code works like a charm too, except that the ":" at the end of "prev = next(iterator):" has to be deleted.
There we go! Thank you very much.
This is exactly how I'd do it too. In general, when you want to "look ahead," it is usually easier to change the problem to "look behind."

A generalized "skip-n" generator

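from io import StringIO
from itertools import tee
s = '''\
1
2
3
4
5
6
7
8
'''
def skip_last_n(iterator, n=1):
    a, b = tee(iterator)
    # Run copy "a" n items ahead of copy "b"; next(a, None) avoids an
    # error when the input is shorter than n.
    for _ in range(n):
        next(a, None)
    # While "a" still has items, b's next item is not one of the last n.
    for _ in a:
        yield next(b)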

i = StringIO(s)
for x in skip_last_n(i, 1):
    print(x, end='')
1
2
3
4
5
6
7

i = StringIO(s)
for x in skip_last_n(i, 3):
    print(x, end='')
1
2
3
4
5

3 Comments

Using tee as an n-sized buffer is a nice idea too. Use itertools.islice() to skip n items fast instead of a for x in xrange(n) loop: next(islice(a, n, n), None) consumes n items in C code, which will beat the for loop any time.
@MartijnPieters, good point. I am leaning towards leaving the for loop in place for readability reasons. Your comment should be able to point everyone to the more efficient islice option!
It is part of the consume recipe in the itertools documentation if you are interested.
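Following that suggestion, the skip loop in the tee()-based generator can be swapped for the consume recipe from the itertools documentation. In this sketch the consume() helper is a trimmed version of that recipe that only handles an explicit count n, and the names lead and trail are illustrative.

```python
from itertools import islice, tee

def consume(iterator, n):
    """Advance iterator n steps (trimmed 'consume' recipe from the itertools docs)."""
    # islice(iterator, n, n) skips n items and then yields nothing,
    # so next(..., None) just drains those n items and returns None.
    next(islice(iterator, n, n), None)

def skip_last_n(iterable, n=1):
    """Yield all but the last n items, using tee() as the look-ahead buffer."""
    lead, trail = tee(iterable)
    consume(lead, n)       # lead now runs n items ahead of trail
    for _ in lead:         # while lead still has items...
        yield next(trail)  # ...trail's next item is not one of the last n

print(list(skip_last_n(range(1, 9), 3)))  # [1, 2, 3, 4, 5]
```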
