2

I have two CSV files and I want to create a third CSV by merging the two. Here's how my files look:

Num | status
1213 | closed
4223 | open
2311 | open

and another file has this:

Num | code
1002 | 9822
1213 | 1891
4223 | 0011

Here is the code I wrote to loop through both files, but it does not print the output with the third column added and matched to the correct values.

def links():
    first = open('closed.csv')
    csv_file = csv.reader(first)

    second = open('links.csv')
    csv_file2 = csv.reader(second)

    for row in csv_file:  
        for secrow in csv_file2:                             
            if row[0] == secrow[0]:
                print row[0]+"," +row[1]+","+ secrow[0]
                time.sleep(1)

So what I want is something like:

Num | status | code
1213 | closed | 1891
4223 | open | 0011
2311 | open | blank no match

5 Answers

5

If you decide to use pandas, you can do it in only five lines.

import pandas as pd

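# dtype=str keeps values such as 0011 as text instead of parsing them as numbers
first = pd.read_csv('closed.csv', dtype=str)
second = pd.read_csv('links.csv', dtype=str)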

merged = pd.merge(first, second, how='left', on='Num')
merged.to_csv('merged.csv', index=False)

2 Comments

Thanks for the code, it worked, but someone above beat you to it. Maybe next time.
This worked great. I was using merge but was losing the rows that didn't have a matching key. This way it merges the rows that match, keeps the ones that don't, and also keeps the index :-)
4

This is definitely a job for pandas. You can easily read in both csv files as DataFrames and use either merge or concat. It'll be way faster and you can do it in just a few lines of code.

2 Comments

Thanks, I will investigate pandas at the weekend. But is there a way to achieve this without pandas?
Also, with pandas you can handle larger files with less memory.
1

The problem is that you can iterate over a csv reader only once, so csv_file2 is already exhausted after the first pass of the outer loop and the inner loop never runs again. To solve that, save the rows of csv_file2 in a list and iterate over the saved list instead. It could look like this:

import csv


def links():
    # Read the second file into a list first: a csv reader can only be
    # iterated once, so it cannot be reused inside the outer loop.
    with open('links.csv') as second:
        link_rows = list(csv.reader(second, delimiter="|"))

    with open('closed.csv') as first:
        for row in csv.reader(first, delimiter="|"):
            match = False
            for secrow in link_rows:
                # Compare the keys with spaces removed, since the fields are padded
                if row[0].replace(" ", "") == secrow[0].replace(" ", ""):
                    print(row[0] + "," + row[1] + "," + secrow[1])
                    match = True
            if not match:
                print(row[0] + "," + row[1] + ", blank no match")

Output:

Num , status, code
1213 , closed, 1891
4223 , open, 0011
2311 , open, blank no match
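
To see the "only once" behaviour in isolation, here is a minimal sketch. It assumes Python 3 and uses a made-up in-memory string (links_data) in place of links.csv, so it is an illustration rather than part of the answer's code:

```python
import csv
import io

# Hypothetical in-memory stand-in for links.csv, so the sketch runs without files.
links_data = io.StringIO("Num|code\n1002|9822\n1213|1891\n")

reader = csv.reader(links_data, delimiter="|")

first_pass = [row for row in reader]   # consumes the reader
second_pass = [row for row in reader]  # nothing left to read

print(len(first_pass))   # 3 (header plus two data rows)
print(len(second_pass))  # 0

# Saving the rows in a list lets them be scanned as many times as needed.
saved_rows = first_pass
matches = [row for row in saved_rows if row[0] == "1213"]
print(matches)  # [['1213', '1891']]
```

The second pass is empty because the reader has already reached the end of its input, which is why the nested loop in the question only ever compares against the first row of closed.csv.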

3 Comments

Thanks, this looks like a great approach. I'm currently on the train but will jump on it once I get home. I will definitely let you know if this works.
Does this answer your question?
Thanks, this worked like a charm. Sorry I couldn't get back sooner. Thanks again and God bless.
1

You could read the values of the second file into a dictionary and then add them to the first.

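# Map each Num in the second file to its code
Code = {}
for row in csv_file2:
    Code[row[0]] = row[1]

# Append the matching code, or a placeholder, to each row of the first file
for row in csv_file:
    row.append(Code.get(row[0], "blank no match"))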
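
A fuller sketch of the same idea, which also writes the merged rows out as a third CSV, might look like the following. The function name merge_csv, its parameters, and the in-memory sample data are my own assumptions for illustration; it also assumes Python 3 and comma-separated input:

```python
import csv
import io


def merge_csv(first_file, second_file, out_file, missing="blank no match"):
    """Left-join second_file onto first_file using the first column as the key."""
    # Build the lookup table: key column -> value column of the second file.
    codes = {row[0]: row[1] for row in csv.reader(second_file) if row}

    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(first_file):
        if row:  # skip blank lines
            writer.writerow(row + [codes.get(row[0], missing)])


# Hypothetical in-memory copies of the two files, assuming comma-separated data.
closed = io.StringIO("Num,status\n1213,closed\n4223,open\n2311,open\n")
links = io.StringIO("Num,code\n1002,9822\n1213,1891\n4223,0011\n")
merged = io.StringIO()

merge_csv(closed, links, merged)
print(merged.getvalue())
```

With real files you would pass objects from open('closed.csv', newline=''), open('links.csv', newline='') and open('merged.csv', 'w', newline='') instead of the StringIO objects. The header row is handled by the same lookup, since "Num" maps to "code".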

1

This code will do it for you:

import csv

def links():

    # open both files
    with open('closed.csv') as closed, open('links.csv') as links:

        # use DictReader so that each row's fields can be accessed by column name
        csv_closed = csv.DictReader(closed)
        csv_links = csv.DictReader(links)

        # create dictionaries out of the two CSV files using dictionary comprehensions
        num_dict = {row['num']:row['status'] for row in csv_closed}
        link_dict = {row['num']:row['code'] for row in csv_links}   

    # print header, each column has width of 8 characters
    print("{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code"))

    # print the information
    for num, status in num_dict.items():

        # note this call to link_dict.get() - we are getting values out of the link dictionary,
        # but specifying a default return value of an empty string if num is not found in it
        # to avoid an exception
        print("{0:8} | {1:8} | {2:8}".format(num, status, link_dict.get(num, '')))

links()

In it, I'm taking advantage of dictionaries, which let you access information by keys. I'm also using implicit loops (the dictionary comprehensions) which tend to be faster and require less code.

There are two quirks of this code that you should be aware of, both of which your example suggests are acceptable:

  1. Order is not guaranteed to be preserved on Python versions before 3.7 (because we're using dictionaries)
  2. Num entries that are in links.csv but not closed.csv are not included in the printout
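
If either quirk does matter to you, one possible variation is sketched below. It keeps the row order of closed.csv by iterating over the rows directly instead of over a dictionary, and it can optionally append the nums that only appear in links.csv. The function merge_rows and its include_unmatched_links flag are hypothetical names of mine, and the sketch assumes Python 3 and the lowercase num/status/code headers of my input files:

```python
import csv
import io


def merge_rows(closed_file, links_file, include_unmatched_links=False):
    """Return (num, status, code) tuples in the row order of closed_file."""
    link_dict = {row["num"]: row["code"] for row in csv.DictReader(links_file)}

    merged = []
    seen = set()
    # Iterating over the rows themselves keeps the original file order.
    for row in csv.DictReader(closed_file):
        merged.append((row["num"], row["status"], link_dict.get(row["num"], "")))
        seen.add(row["num"])

    if include_unmatched_links:
        # Append nums that appear only in the links file, with an empty status.
        for num, code in link_dict.items():
            if num not in seen:
                merged.append((num, "", code))
    return merged


# Hypothetical in-memory copies of closed.csv and links.csv.
closed = io.StringIO("num,status\n1213,closed\n4223,open\n2311,open\n")
links = io.StringIO("num,code\n1002,9822\n1213,1891\n4223,0011\n")

for num, status, code in merge_rows(closed, links, include_unmatched_links=True):
    print("{0:8} | {1:8} | {2:8}".format(num, status, code))
```

Leaving include_unmatched_links at its default gives the same rows as the code above, just in file order.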

Last note: I made some assumptions about how your input files are formatted since you called them "CSV" files. This is what my input files looked like for this code:

closed.csv

num,status
1213,closed
4223,open
2311,open

links.csv

num,code
1002,9822
1213,1891
4223,0011

Given those input files, the result looks like this:

Num      | Status   | Code  
1213     | closed   | 1891  
2311     | open     |  
4223     | open     | 0011  

1 Comment

Thanks for the code, it worked, but someone above beat you to it. Maybe next time :)
