
I have two CSV files, as follows:

File1.csv:

Name, Email
Jon, [email protected]
Roberto, [email protected]
Mona, [email protected]
James, [email protected]

File2.csv:

Email
[email protected]
[email protected]

What I want is File1.csv without File2.csv; i.e., File3.csv (the output) should look as follows:

File3.csv:

Name, Email
Jon, [email protected]
Roberto, [email protected]

What is the simplest way to code this in Python?

  • File3.csv happens to be a subset of File1.csv so why would you need to merge? Commented Jan 25, 2016 at 14:23
  • A simple way is to read file2 into a list, then read file1 line by line and write every line where the email is not in the list into file3. Try coding it, if you get stuck, post your code and ask for help. Commented Jan 25, 2016 at 14:25
  • It has already been answered a ton of times. Show some code to work with, or else you will probably get flagged as a duplicate. Commented Jan 25, 2016 at 14:25

5 Answers

dont_need_em = []
with open("file2.csv", 'r') as fn:
    for line in fn:
        if not line.startswith("Email"):
            dont_need_em.append(line.rstrip())

with open("file1.csv", 'r') as fn, open("file3.csv", 'w') as fw:
    for line in fn:
        # Compare the stripped email, but write the original line so
        # each row keeps its newline
        if line.rstrip().split(", ")[1] not in dont_need_em:
            fw.write(line)

This should do it, but I am sure there are far simpler solutions.

EDIT: Create the third file



See my answer below; I used essentially the same method.

Using Pandas you can do this:

import pandas as pd
#Read two files into data frame using column names from first row
file1=pd.read_csv('File1.csv',header=0,skipinitialspace=True)
file2=pd.read_csv('File2.csv',header=0,skipinitialspace=True)

#Only return lines in file 1 if the email is not contained in file 2
cleaned=file1[~file1["Email"].isin(file2["Email"])]

#Output file to CSV with original headers
cleaned.to_csv("File3.csv", index=False)



Here's a good way to do that (it's very similar to the above, but writes the remainder to a file rather than printing):

removed = []
with open("file2.csv", 'r') as f2:
    for line in f2:
        if not line.startswith("Email"):
            removed.append(line.rstrip())


with open("file1.csv", 'r') as f1:
    with open("file3.csv", 'w') as f3:
        for line in f1:
            if line.rstrip().split(", ")[1] not in removed:
                f3.write(line)

How this works: the first block reads all the emails you want to filter out into a list. The second block opens your original file and a new file for the output, reads each line from the first file, and writes it to the third file only if its email isn't in the filter list.



If you are on UNIX:

#! /usr/bin/env python
import subprocess

def filter_rows(input_file, filter_file, out_file):
    # grep -v drops lines matching any pattern in filter_file;
    # -F treats the patterns as fixed strings rather than regexes
    subprocess.call("grep -v -F -f '%s' '%s' > '%s'" % (filter_file, input_file, out_file), shell=True)

Note that the `Email` header line in File2.csv would also match File1.csv's header row, so pass a filter file containing only the addresses (or strip its first line first).



The following should do what you are looking for. First read File2.csv into a set of email addresses to be skipped. Then read File1.csv row by row, writing only rows which are not in the skip list:

import csv

with open('File2.csv', 'r') as file2:
    skip_list = set(line.strip() for line in file2.readlines()[1:])

with open('File1.csv', 'r', newline='') as file1, open('File3.csv', 'w', newline='') as file3:
    csv_file1 = csv.reader(file1, skipinitialspace=True)
    csv_file3 = csv.writer(file3)
    csv_file3.writerow(next(csv_file1))    # Write the header line

    for cols in csv_file1:
        if cols[1] not in skip_list:
            csv_file3.writerow(cols)

This would give you the following output in File3.csv:

Name,Email
Jon,[email protected]
Roberto,[email protected]

