import csv
import sys

source = csv.DictReader(open('source.csv'))
export = csv.DictReader(open('export.csv'))
sys.stdout = open('output.csv','w')
val = 0

def output():
    for row in source:
        val = row['SKU']
        for row in export:
            if row['SKU'] == val:
                print '"' + row['SKU'] + '"' + ',' + '"' + row['DESC'] + '"' + ',' + '"' + row['COST'] + '"' + ',' + '"' + row['MSRP'] + '"' + ',' + '"' + row['CORE'] + '"' + ',' + '"' + row['WEIGHT'] + '"' + ',' + '"' + row['HEIGHT'] + '"' + ',' + '"' + row['LENGTH'] + '"' + ',' + '"' + row['WIDTH'] + '"'
        else:
            continue

output()

This grabs just the first SKU in the source file, not all 15,000 SKUs in the source file. The formatting of the output is correct. Since this is built on code that works when filtering on information from the export file only (no source CSV), I feel like my problem is within the second for loop, but I'm not well versed enough to troubleshoot it.


1 Answer


You cannot loop over files again and again, no, because once the read position reaches the end you can't read more. You'd have to explicitly put the read position back to 0, using a file.seek() call on the underlying file object. But that's a very poor and slow method.
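For illustration only, here is a minimal sketch of that rewind approach, using the file names from the question. It works, but it re-reads the entire export file for every source row, which is exactly why it is slow:

import csv

# The slow approach: rewind the export file and rebuild the reader
# for every row of the source file. Quadratic work, so avoid it.
with open('source.csv') as source, open('export.csv') as export:
    for row in csv.DictReader(source):
        export.seek(0)  # put the read position back to the start
        for exp_row in csv.DictReader(export):
            if exp_row['SKU'] == row['SKU']:
                print exp_row['SKU']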

Store your export data in a dictionary instead, so you can look up the matching SKU in constant time:

fields = ('SKU', 'DESC', 'COST', 'MSRP', 'CORE', 'WEIGHT', 'HEIGHT', 'LENGTH', 'WIDTH')

with open('export.csv', 'rb') as export:
    # map each SKU to its full export row for constant-time lookup
    exports = {row['SKU']: row for row in csv.DictReader(export)}

with open('source.csv', 'rb') as source, open('output.csv', 'wb') as output:
    reader = csv.DictReader(source)
    writer = csv.DictWriter(
        output, quoting=csv.QUOTE_ALL,
        fieldnames=fields, extrasaction='ignore')
    for row in reader:
        if row['SKU'] in exports:
            writer.writerow(exports[row['SKU']])

Now you only need to iterate over the input CSV files once. I used a csv.DictWriter() object to produce the output, rather than printing. By setting the quoting option to csv.QUOTE_ALL you get quoted columns, always.

The fieldnames parameter tells the DictWriter() what fields to take from the dictionary (produced by the DictReader() used to read the exports CSV file), and the extrasaction option defines what to do with extra keys in that dictionary (we ignore those here).
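To make those two options concrete, here is a small self-contained sketch; the 'SUPPLIER' column is made up for the demonstration. The extra key is silently dropped rather than raising a ValueError:

import csv
import sys

# Hypothetical row with one key ('SUPPLIER') that is not in fieldnames.
row = {'SKU': 'A-100', 'DESC': 'Widget', 'SUPPLIER': 'Acme'}

writer = csv.DictWriter(sys.stdout, fieldnames=('SKU', 'DESC'),
                        quoting=csv.QUOTE_ALL, extrasaction='ignore')
writer.writerow(row)   # writes: "A-100","Widget"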


3 Comments

Great. Can you use a list comprehension and csv.writerows() here? e.g. import_rows = [exports[row['SKU']] for row in reader if row['SKU'] in exports] followed by writer.writerows(import_rows)
Worked great, thank you! This will be the foundation of a useful tool. Eventually Python will completely replace Excel for me, it's so much faster.
@VivekSable: you could use a generator expression there, except that DictWriter.writerows() materialises the whole list in memory first. It is not all that efficient in that respect.
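For reference, a sketch of the batch variant discussed in these comments, assuming the reader, writer and exports names from the answer's snippet:

# Collect all matching rows, then write them in one call.
# A generator expression avoids building the intermediate list yourself,
# but DictWriter.writerows() still materialises the rows internally,
# so this is no more memory-efficient than the row-by-row loop.
matches = (exports[row['SKU']] for row in reader if row['SKU'] in exports)
writer.writerows(matches)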
