
Hi, I am trying to create a new CSV file by merging specific fields from two CSV files, based on a common column (a primary key). I tried the same thing in PowerShell and it worked, but it was very slow: more than 30 minutes to merge files of 5000+ lines. So I am now trying this in Python. I am new to it, so please go easy on me.

The two files are infile.csv and checkfile.csv, and the columns of the output file are the columns of infile.csv. The code reads the values in checkfile.csv, creates outfile.csv, copies the columns from infile.csv, and needs to rewrite the values of two fields based on the corresponding values in checkfile.csv. The details follow.

infile.csv -

"StockNumber","SKU","ChannelProfileID","CostPrice"
"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33
"10m_s-vid#CSE","2VV-10",3746,0.98
"1RR-01#CSE","1RR-01",3746
"1RR-01#PCAWS","1RR-01",3746,
"1m_s-vid_ext#APTIIAMZ","2VV-101",3746,0.42

checkfile.csv

ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock, Manufacturer, SupplierProductCode, ManuCode, LeadTime
2VV-03,3MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.43,,930,CDL,2VV-03,2VV-03,1
2VV-05,5MTR BLACK SVHS M - M GOLD CABLE - B/Q 100,Cables Direct Ltd,0.54,,1935,CDL,2VV-05,2VV-05,1
2VV-10,10MTR BLACK SVHS M - M GOLD CABLE - B/Q 50,Cables Direct Ltd,0.86,,1991,CDL,2VV-10,2VV-10,1

The outfile.csv I am getting is -

StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,"(' ',)", 
10m_s-vid#CSE,2VV-10,"(' ',)", 
1RR-01#CSE,1RR-01,"(' ',)", 
1RR-01#PCAWS,1RR-01,"(' ',)", 
1m_s-vid_ext#APTIIAMZ,2VV-101,"(' ',)", 

But the outfile.csv I need is -

StockNumber,SKU,ChannelProfileID,CostPrice
10m_s-vid#APTIIAMZ,2VV-10,1991,0.86  
10m_s-vid#CSE,2VV-10,1991,0.86   
1RR-01#CSE,1RR-01
1RR-01#PCAWS,1RR-01          
1m_s-vid_ext#APTIIAMZ,2VV-101

Finally the code -

import csv

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    product_result = dict(
        ((v['ProductCode'], v[' Stock']), (v['ProductCode'], v[' CostPrice']))  for v in checkreader
    )

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)

        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'], " ")

            item['ChannelProfileID'] = result,
            item['CostPrice'] = result

            writer.writerow(item)
  • It is not clear what your problem is. Also it is not clear what the desired result should look like. Commented Dec 6, 2012 at 0:18
  • Also, your infile headers define 4 fields, yet below there are only 3. Commented Dec 6, 2012 at 0:38
  • Ok, added the expected outfile.csv right now. As you can see the ChannelProfileID and CostPrice items should be populated but they are not. Commented Dec 6, 2012 at 0:39
  • The CostPrice column in infile.csv is empty and has no values, but I will add values there for better example. Commented Dec 6, 2012 at 0:57

3 Answers

You could make it a little simpler:

import csv

with open('checkfile.csv', 'rb') as checkfile:
    product_result = {
        record['ProductCode']: record for record in csv.DictReader(checkfile)}

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()
        for item in reader:
            record = product_result.get(item['SKU'], None)
            if record:
                item['ChannelProfileID'] = record[' Stock']  # ???
                item['CostPrice'] = record[' CostPrice']
            else:
                item['ChannelProfileID'] = None
                item['CostPrice'] = None
            writer.writerow(item)

I wasn't sure about the line which I commented with ???.

Also, if you really want to produce broken CSV, please feel free to omit the else-clause.

I tested it with StringIO objects. It produced the result you specified, but with trailing commas, where there was no match in checkfile.
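As a rough illustration of that kind of in-memory test, here is a minimal sketch. It is written for Python 3 (where `io.StringIO` holds text), the sample rows are trimmed from the question, and the helper name `merge` is hypothetical, not something from the original code.

```python
import csv
import io

# Sample data trimmed from the question. Note the space after each comma
# in the checkfile header, which is why the keys are ' Stock' and ' CostPrice'.
CHECKFILE = (
    "ProductCode, Description, Supplier, CostPrice, RRPPrice, Stock\n"
    "2VV-10,10MTR BLACK SVHS CABLE,Cables Direct Ltd,0.86,,1991\n"
)
INFILE = (
    '"StockNumber","SKU","ChannelProfileID","CostPrice"\n'
    '"10m_s-vid#APTIIAMZ","2VV-10",3746,0.33\n'
    '"1RR-01#CSE","1RR-01",3746,\n'
)


def merge(checkfile, infile, outfile):
    """Hypothetical helper: same logic as the answer, on file-like objects."""
    lookup = {rec['ProductCode']: rec for rec in csv.DictReader(checkfile)}
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for item in reader:
        rec = lookup.get(item['SKU'])
        # None is written as an empty field, which gives the trailing commas.
        item['ChannelProfileID'] = rec[' Stock'] if rec else None
        item['CostPrice'] = rec[' CostPrice'] if rec else None
        writer.writerow(item)


out = io.StringIO()
merge(io.StringIO(CHECKFILE), io.StringIO(INFILE), out)
merged_text = out.getvalue()
print(merged_text)
```

If the leading spaces in the keys are a nuisance, passing `skipinitialspace=True` to `csv.DictReader` should strip the whitespace that follows each delimiter, so the keys would become `'Stock'` and `'CostPrice'`.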

And I used Python 2.7 dict comprehension, since you tagged your question with python-2.7.
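For comparison, the sketch below shows why the `dict(...)` call in the question never found a match. It uses one hand-written row shaped like what `csv.DictReader` yields for checkfile.csv; the variable names `broken`, `fixed`, `missing` and `written` are only for illustration.

```python
# One row shaped like what csv.DictReader yields for checkfile.csv.
rows = [
    {'ProductCode': '2VV-10', ' Stock': '1991', ' CostPrice': '0.86'},
]

# The question's construction: each generated item is a pair of tuples,
# so dict() uses the first tuple as the key and the second as the value.
broken = dict(
    ((v['ProductCode'], v[' Stock']), (v['ProductCode'], v[' CostPrice']))
    for v in rows
)

# Keyed by ProductCode alone, as in the dict comprehension above.
fixed = {v['ProductCode']: v for v in rows}

# The lookup by SKU misses, because the only key is ('2VV-10', '1991').
missing = broken.get('2VV-10', " ")

# The trailing comma in `item['ChannelProfileID'] = result,` wraps the value
# in a one-element tuple, which the csv writer then stringifies.
written = str((missing,))

print(broken)
print(repr(missing), written)
print(fixed['2VV-10'][' Stock'], fixed['2VV-10'][' CostPrice'])
```

That accounts for both symptoms in the question's output: the fallback `" "` in the CostPrice column and the literal `(' ',)` in the ChannelProfileID column.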

import csv

product_result = {}

with open('checkfile.csv', 'rb') as checkfile:
    checkreader = csv.DictReader(checkfile)

    for v in checkreader:
        product_result[v['ProductCode']] = (v[' Stock'], v[' CostPrice'])

with open('infile.csv', 'rb') as infile:
    with open('outfile.csv', 'wb') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()

        for item in reader:
            result = product_result.get(item['SKU'])
            if result:
                item['ChannelProfileID'], item['CostPrice'] = result
            else:
                item['ChannelProfileID'] = item['CostPrice'] = None

            writer.writerow(item)

4 Comments

Thanks for replying. So I get that the data is put into tuples. But how do I update ChannelProfileID with the dictionary's value for the ' Stock' field, and CostPrice with the value for ' CostPrice', in outfile.csv?
To continue, would it be something like item['ChannelProfileID'] = result['Stock']? Basically, I am trying to write data from the dictionary to specific CSV fields.
result is a tuple so you can only use integers for its indices; what I've done in this instance is sequence unpacking.
I'm glad to hear that. Please remember to mark the best answer as the accepted answer for your question. :)
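The sequence unpacking mentioned in the comments can be sketched as follows. The values are taken from the question's sample data, and the tuple layout `(Stock, CostPrice)` is the one the answer stores in `product_result`; `stock` and `error_name` are illustrative names only.

```python
# (Stock, CostPrice), in the order the answer stores them in product_result.
result = ('1991', '0.86')

item = {'SKU': '2VV-10'}

# Sequence unpacking: the first element goes to ChannelProfileID,
# the second to CostPrice, in a single assignment.
item['ChannelProfileID'], item['CostPrice'] = result

# A tuple can also be indexed, but only by integer position.
stock = result[0]

# Indexing it by name, as suggested in the comment, raises TypeError.
try:
    result['Stock']
except TypeError as exc:
    error_name = type(exc).__name__

print(item, stock, error_name)
```

If access by name is preferred, storing the whole `DictReader` record as the dictionary value instead of a tuple is one option, since a record can be indexed with `' Stock'` directly.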
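import csv
import glob

# Concatenates every CSV file in the current directory into combine.csv,
# keeping the header row of the first file only.

OUTPUT_NAME = 'combine.csv'
total_record = []       # rows collected from all input files
header_seen = False

for path in glob.glob("*.csv"):
    # Skip the output of a previous run so it is not merged into itself.
    if path == OUTPUT_NAME:
        continue
    print(path)

    with open(path, 'r') as f:
        list_record = list(csv.reader(f))

    if header_seen:
        # Drop the header row of every file after the first.
        list_record = list_record[1:]
    else:
        header_seen = True
    total_record.extend(list_record)

with open(OUTPUT_NAME, 'w') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerows(total_record)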
