import csv
import sys

source = csv.DictReader(open('source.csv'))
export = csv.DictReader(open('export.csv'))
sys.stdout = open('output.csv','w')
val = 0

def output():
    for row in source:
        val = row['SKU']
        for row in export:
            if row['SKU'] == val:
                print '"' + row['SKU'] + '"' + ',' + '"' + row['DESC'] + '"' + ',' + '"' + row['COST'] + '"' + ',' + '"' + row['MSRP'] + '"' + ',' + '"' + row['CORE'] + '"' + ',' + '"' + row['WEIGHT'] + '"' + ',' + '"' + row['HEIGHT'] + '"' + ',' + '"' + row['LENGTH'] + '"' + ',' + '"' + row['WIDTH'] + '"'
        else:
            continue

output()

This grabs just the first SKU in the source file, not all 15,000 SKUs in the source file. The formatting of the output is correct. Since this is built on code that works when filtering on information from the export file only (no source CSV), I feel like my problem is within the second for loop, but I'm not well versed enough to troubleshoot it.


1 Answer


You cannot loop over files again and again, no, because once the read position reaches the end you can't read more. You'd have to explicitly put the read position back to 0, using a file.seek() call on the underlying file object. But that's a very poor and slow method.
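For illustration only, here is a minimal sketch of that rewind approach, using the file names from the question. It works, but it re-reads the entire export file for every source row, which is exactly why it is slow:

import csv

# The slow approach: rewind the export file and rebuild the reader
# for every row of the source file. Quadratic work, so avoid it.
with open('source.csv') as source, open('export.csv') as export:
    for row in csv.DictReader(source):
        export.seek(0)  # put the read position back to the start
        for exp_row in csv.DictReader(export):
            if exp_row['SKU'] == row['SKU']:
                print exp_row['SKU']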

Store your export data in a dictionary instead, so you can look up the matching SKU in constant time:

fields = ('SKU', 'DESC', 'COST', 'MSRP', 'CORE', 'WEIGHT', 'HEIGHT', 'LENGTH', 'WIDTH')

with open('export.csv', 'rb') as export:
    # map each SKU to its full export row for constant-time lookup
    exports = {row['SKU']: row for row in csv.DictReader(export)}

with open('source.csv', 'rb') as source, open('output.csv', 'wb') as output:
    reader = csv.DictReader(source)
    writer = csv.DictWriter(
        output, quoting=csv.QUOTE_ALL,
        fieldnames=fields, extrasaction='ignore')
    for row in reader:
        if row['SKU'] in exports:
            writer.writerow(exports[row['SKU']])

Now you only need to iterate over the input CSV files once. I used a csv.DictWriter() object to produce the output, rather than printing. By setting the quoting option to csv.QUOTE_ALL you get quoted columns, always.

The fieldnames parameter tells the DictWriter() what fields to take from the dictionary (produced by the DictReader() used to read the exports CSV file), and the extrasaction option defines what to do with extra keys in that dictionary (we ignore those here).
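To make those two options concrete, here is a small self-contained sketch; the 'SUPPLIER' column is made up for the demonstration. The extra key is silently dropped rather than raising a ValueError:

import csv
import sys

# Hypothetical row with one key ('SUPPLIER') that is not in fieldnames.
row = {'SKU': 'A-100', 'DESC': 'Widget', 'SUPPLIER': 'Acme'}

writer = csv.DictWriter(sys.stdout, fieldnames=('SKU', 'DESC'),
                        quoting=csv.QUOTE_ALL, extrasaction='ignore')
writer.writerow(row)   # writes: "A-100","Widget"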


3 Comments

Great. Can you use a list comprehension and csv.writerows() here? e.g. import_rows = [exports[row['SKU']] for row in reader if row['SKU'] in exports] followed by writer.writerows(import_rows)
Worked great, thank you! This will be the foundation of a useful tool. Eventually Python will completely replace Excel for me, it's so much faster.
@VivekSable: you could use a generator expression there, except that DictWriter.writerows() materialises the whole list in memory first. It is not all that efficient in that respect.
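For reference, a sketch of the batch variant discussed in these comments, assuming the reader, writer and exports names from the answer's snippet:

# Collect all matching rows, then write them in one call.
# A generator expression avoids building the intermediate list yourself,
# but DictWriter.writerows() still materialises the rows internally,
# so this is no more memory-efficient than the row-by-row loop.
matches = (exports[row['SKU']] for row in reader if row['SKU'] in exports)
writer.writerows(matches)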
