Merge Two CSV files in Python

Question

I have two CSV files and I want to create a third CSV from a merge of the two. Here's how my files look:

Num | status
1213 | closed
4223 | open
2311 | open

and another file has this:

Num | code
1002 | 9822
1213 | 1891
4223 | 0011

Here is the code I tried. It loops through both files, but it does not print the rows with the matching third column added.

def links():
    first = open('closed.csv')
    csv_file = csv.reader(first)

    second = open('links.csv')
    csv_file2 = csv.reader(second)

    for row in csv_file:  
        for secrow in csv_file2:                             
            if row[0] == secrow[0]:
                print row[0]+"," +row[1]+","+ secrow[0]
                time.sleep(1)

so what I want is something like:

Zenadix · Accepted Answer · 2015-08-21 20:04:42Z

5

If you decide to use pandas, you can do it in only five lines.

import pandas as pd

first = pd.read_csv('closed.csv')
second = pd.read_csv('links.csv')

merged = pd.merge(first, second, how='left', on='Num')
merged.to_csv('merged.csv', index=False)
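One detail worth checking with this approach: by default `read_csv` infers numeric types, so a code such as `0011` can lose its leading zeros. The sketch below is one way to avoid that, assuming the files are comma-separated with the headers `Num,status` and `Num,code`. The in-memory `io.StringIO` buffers are hypothetical stand-ins for `closed.csv`, `links.csv` and `merged.csv`.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for closed.csv and links.csv
closed_csv = io.StringIO("Num,status\n1213,closed\n4223,open\n2311,open\n")
links_csv = io.StringIO("Num,code\n1002,9822\n1213,1891\n4223,0011\n")

# dtype=str keeps every column as text, so "0011" is not turned into 11
first = pd.read_csv(closed_csv, dtype=str)
second = pd.read_csv(links_csv, dtype=str)

# Left join: keep every row of the first file, add code where Num matches
merged = pd.merge(first, second, how='left', on='Num')

# Write to a buffer instead of merged.csv so the result can be inspected
out = io.StringIO()
merged.to_csv(out, index=False)
merged_text = out.getvalue()
print(merged_text)
```

Rows without a match (here `2311`) end up with an empty `code` field in the written CSV.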

edited Aug 21, 2015 at 20:04

answered Aug 21, 2015 at 19:53


2 Comments

Helen Neely Over a year ago

Thanks for the code, it worked but someone above beat you to it. Maybe next time

Marilu Over a year ago

This worked great. I was using merge but I was losing the lines that didn't have a matching index. This way it merges the ones that match, keeps the ones that don't, and also keeps the index :-)

Cody Braun · Accepted Answer · 2015-08-21 15:41:21Z

4

This is definitely a job for pandas. You can easily read in both csv files as DataFrames and use either merge or concat. It'll be way faster and you can do it in just a few lines of code.
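As a rough illustration of the merge option, the sketch below compares the join types that `pd.merge` accepts through its `how` parameter. The two DataFrames are hypothetical and built from the sample data in the question, with `code` kept as text.

```python
import pandas as pd

# Hypothetical DataFrames mirroring the two files in the question
first = pd.DataFrame({"Num": [1213, 4223, 2311],
                      "status": ["closed", "open", "open"]})
second = pd.DataFrame({"Num": [1002, 1213, 4223],
                       "code": ["9822", "1891", "0011"]})

# inner (the default): only Nums present in both frames
inner = pd.merge(first, second, on="Num")
# left: every row of `first`, with missing codes where there is no match
left = pd.merge(first, second, on="Num", how="left")
# outer: every Num from either frame
outer = pd.merge(first, second, on="Num", how="outer")

inner_nums = sorted(inner["Num"].tolist())
left_nums = left["Num"].tolist()
outer_nums = sorted(outer["Num"].tolist())
print(inner_nums, left_nums, outer_nums)
```

Which join is appropriate depends on whether unmatched rows from either file should appear in the result.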

answered Aug 21, 2015 at 15:41


2 Comments

Helen Neely Over a year ago

Thanks, I will investigate pandas at the weekend. But is there a way to achieve this without pandas?

Zenadix Over a year ago

Also, with pandas you can handle larger files with less memory.

Leonid Glanz · Accepted Answer · 2015-08-21 15:59:44Z

1

The problem is that you can iterate over a csv reader only once, so csv_file2 yields nothing after the first pass of the outer loop. To solve that, save the rows of csv_file2 in a list and iterate over the saved list. It could look like this:

import time, csv


def links():
    first = open('closed.csv')
    csv_file = csv.reader(first, delimiter="|")

    second = open('links.csv')
    csv_file2 = csv.reader(second, delimiter="|")

    # Save the rows of the second file so they can be scanned more than once
    saved_rows = []
    for row in csv_file2:
        saved_rows.append(row)

    for row in csv_file:
        match = False
        for secrow in saved_rows:
            # Compare the keys with the padding spaces removed
            if row[0].replace(" ", "") == secrow[0].replace(" ", ""):
                print(row[0] + "," + row[1] + "," + secrow[1])
                match = True
        if not match:
            print(row[0] + "," + row[1] + ", blank no match")
        time.sleep(1)

Output:

Num , status, code
1213 , closed, 1891
4223 , open, 0011
2311 , open, blank no match
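The one-pass behaviour of a csv reader can be checked with a small experiment. This is only a sketch: the `io.StringIO` buffer is a hypothetical stand-in for `links.csv`, assumed here to be comma-separated.

```python
import csv
import io

# Hypothetical in-memory stand-in for links.csv
data = io.StringIO("Num,code\n1002,9822\n1213,1891\n")
reader = csv.reader(data)

first_pass = list(reader)   # consumes every row of the reader
second_pass = list(reader)  # the reader is exhausted, so nothing is left

print(len(first_pass), len(second_pass))
```

The second pass is empty, which is why the inner loop in the question only does useful work for the first row of the outer loop.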

answered Aug 21, 2015 at 15:59


3 Comments

Helen Neely Over a year ago

Thanks, this looks like a great approach. I'm currently on the train but will jump on it once I get home. I will definitely let you know if this works.

Leonid Glanz Over a year ago

Does this answer your question?

Helen Neely Over a year ago

Thanks, this worked like a charm. Sorry I couldn't get back sooner. Thanks again and God bless.

CCKx · Accepted Answer · 2015-08-21 15:58:24Z

1

You could read the values of the second file into a dictionary and then add them to the first.

Code = {}
for row in csv_file2:
    Code[row[0]] = row[1]

for row in csv_file1:
    row.append(Code.get(row[0], "blank no match"))
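A fuller sketch of the same idea, which also writes the merged rows out, might look like the following. It assumes comma-separated files with a header row. The `io.StringIO` buffers are hypothetical stand-ins for `closed.csv`, `links.csv` and the output file, and unmatched rows get an empty code rather than "blank no match".

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two input files and the output file
closed_src = io.StringIO("Num,status\n1213,closed\n4223,open\n2311,open\n")
links_src = io.StringIO("Num,code\n1002,9822\n1213,1891\n4223,0011\n")
merged_out = io.StringIO()

csv_file1 = csv.reader(closed_src)
csv_file2 = csv.reader(links_src)

# Build the lookup table Num -> code from the second file
next(csv_file2)  # skip the header row
code = {row[0]: row[1] for row in csv_file2}

writer = csv.writer(merged_out, lineterminator="\n")

# Extend the header of the first file, then append the code to each row
header = next(csv_file1)
writer.writerow(header + ["code"])
for row in csv_file1:
    writer.writerow(row + [code.get(row[0], "")])

merged_text = merged_out.getvalue()
print(merged_text)
```

With real files, the buffers would be replaced by `open(...)` calls, using `newline=''` for the file passed to `csv.writer`.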

answered Aug 21, 2015 at 15:58


skrrgwasme · Accepted Answer · 2015-08-21 16:37:54Z

This code will do it for you:

import csv

def links():

    # open both files
    with open('closed.csv') as closed, open('links.csv') as links:

        # use DictReader so each row's fields can be accessed by column name
        csv_closed = csv.DictReader(closed)
        csv_links = csv.DictReader(links)

        # create dictionaries out of the two CSV files using dictionary comprehensions
        num_dict = {row['num']:row['status'] for row in csv_closed}
        link_dict = {row['num']:row['code'] for row in csv_links}

    # print header, each column has width of 8 characters
    print("{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code"))

    # print the information
    for num, status in num_dict.items():

        # note this call to link_dict.get() - we are getting values out of the link dictionary,
        # but specifying a default return value of an empty string if num is not found in it
        # to avoid an exception
        print("{0:8} | {1:8} | {2:8}".format(num, status, link_dict.get(num, '')))

links()

In it, I'm taking advantage of dictionaries, which let you access information by keys. I'm also using implicit loops (the dictionary comprehensions) which tend to be faster and require less code.

There are two quirks of this code that you should be aware of, that your example suggests are fine:

Order is not guaranteed to be preserved on Python versions before 3.7 (because we're using dictionaries)
Num entries that are in links.csv but not closed.csv are not included in the printout

Last note: I made some assumptions about how your input files are formatted since you called them "CSV" files. This is what my input files looked like for this code:

closed.csv

num,status
1213,closed
4223,open
2311,open

links.csv

num,code
1002,9822
1213,1891
4223,0011

Given those input files, the result looks like this:

Num      | Status   | Code  
1213     | closed   | 1891  
2311     | open     |  
4223     | open     | 0011
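If either quirk mentioned above matters, a variation along these lines may help. It is a sketch that assumes Python 3.7 or later, where dictionaries keep insertion order, and it uses hypothetical in-memory buffers with the same lowercase headers as the sample files above.

```python
import csv
import io

# Hypothetical in-memory stand-ins for closed.csv and links.csv
closed = io.StringIO("num,status\n1213,closed\n4223,open\n2311,open\n")
links = io.StringIO("num,code\n1002,9822\n1213,1891\n4223,0011\n")

num_dict = {row["num"]: row["status"] for row in csv.DictReader(closed)}
link_dict = {row["num"]: row["code"] for row in csv.DictReader(links)}

# dict.fromkeys drops duplicates while keeping first-seen order:
# all nums from closed.csv first, then any extra nums from links.csv
all_nums = list(dict.fromkeys(list(num_dict) + list(link_dict)))

# Missing values on either side default to an empty string
rows = [(num, num_dict.get(num, ""), link_dict.get(num, "")) for num in all_nums]

print("{0:8} | {1:8} | {2:8}".format("Num", "Status", "Code"))
for row in rows:
    print("{0:8} | {1:8} | {2:8}".format(*row))
```

Here `1002`, which appears only in the links data, is included with an empty status.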

Thanks for the code, it worked but someone above beat you to it. Maybe next time :)
