
I have two CSV files, as follows:

File1.csv:

Name, Email
Jon, [email protected]
Roberto, [email protected]
Mona, [email protected]
James, [email protected]

File2.csv:

Email
[email protected]
[email protected]

What I want is File1.csv without File2.csv; i.e., File3.csv (the output) should look as follows:

File3.csv:

Name, Email
Jon, [email protected]
Roberto, [email protected]

What is the simplest way to code this in Python?

  • File3.csv happens to be a subset of File1.csv so why would you need to merge? Commented Jan 25, 2016 at 14:23
  • A simple way is to read file2 into a list, then read file1 line by line and write every line where the email is not in the list into file3. Try coding it, if you get stuck, post your code and ask for help. Commented Jan 25, 2016 at 14:25
  • It has already been answered a ton of times. Show some code to work with, or else you will probably get flagged as a duplicate. Commented Jan 25, 2016 at 14:25

5 Answers

dont_need_em = []
with open("file2.csv", 'r') as fn:
    for line in fn:
        if not line.startswith("Email"):
            dont_need_em.append(line.rstrip())

with open("file1.csv", 'r') as fn, open("file3.csv", 'w') as fw:
    for line in fn:
        # Compare the stripped email, but write the original line so
        # each row keeps its newline
        if line.rstrip().split(", ")[1] not in dont_need_em:
            fw.write(line)

This should do it, but I am sure there are far simpler solutions.

EDIT: Create the third file



See my answer below; I used essentially the same method.

Using Pandas you can do this:

import pandas as pd
#Read two files into data frame using column names from first row
file1=pd.read_csv('File1.csv',header=0,skipinitialspace=True)
file2=pd.read_csv('File2.csv',header=0,skipinitialspace=True)

#Only return lines in file 1 if the email is not contained in file 2
cleaned=file1[~file1["Email"].isin(file2["Email"])]

#Output file to CSV with original headers
cleaned.to_csv("File3.csv", index=False)



Here's a good way to do that (it's very similar to the above, but writes the remainder to a file rather than printing):

removed = []
with open("file2.csv", 'r') as f2:
    for line in f2:
        if not line.startswith("Email"):
            removed.append(line.rstrip())


with open("file1.csv", 'r') as f1:
    with open("file3.csv", 'w') as f3:
        for line in f1:
            if line.rstrip().split(", ")[1] not in removed:
                f3.write(line)

How this works: the first block reads all the emails you want to filter out into a list. The second block opens your original file and a new file for the output, reads each line from the first file, and writes it to the third file only if its email isn't in the filter list.



If you are on UNIX:

#! /usr/bin/env python
import subprocess

def filter_rows(input_file, filter_file, out_file):
    # grep -v drops lines matching any pattern in filter_file;
    # -F treats the patterns as fixed strings rather than regexes
    subprocess.call("grep -v -F -f '%s' '%s' > '%s'" % (filter_file, input_file, out_file), shell=True)

Note that the `Email` header line in File2.csv would also match File1.csv's header row, so pass a filter file containing only the addresses (or strip its first line first).



The following should do what you are looking for. First read File2.csv into a set of email addresses to be skipped. Then read File1.csv row by row, writing only rows which are not in the skip list:

import csv

with open('File2.csv', 'r') as file2:
    skip_list = set(line.strip() for line in file2.readlines()[1:])

with open('File1.csv', 'r', newline='') as file1, open('File3.csv', 'w', newline='') as file3:
    csv_file1 = csv.reader(file1, skipinitialspace=True)
    csv_file3 = csv.writer(file3)
    csv_file3.writerow(next(csv_file1))    # Write the header line

    for cols in csv_file1:
        if cols[1] not in skip_list:
            csv_file3.writerow(cols)

This would give you the following output in File3.csv:

Name,Email
Jon,[email protected]
Roberto,[email protected]

