1

I am new to programming. I have hundreds of CSV files in a folder, and certain files have the letters DIF in the second column. I want to rewrite the CSV files without those lines in them. I have attempted this for one file and have put my attempt below. I also need help getting the program to do that for all the files in my directory. Any help would be appreciated.

Thank you

import csv

reader=csv.reader(open("40_5.csv","r"))


for row in reader:
    if row[1] == 'DIF':
        csv.writer(open('40_5N.csv', 'w')).writerow(row)

3 Answers

1

I made some changes to your code:

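import csv
import glob
import os

fns = glob.glob('*.csv')

for fn in fns:
    # Python 2: csv files are opened in binary mode ('rb'/'wb').
    # On Python 3, use open(fn, newline='') and open(..., 'w', newline='').
    with open(fn, 'rb') as src, open(os.path.join('out', fn), 'wb') as dst:
        w = csv.writer(dst)  # one writer per output file
        for row in csv.reader(src):
            # skip any row that contains 'DIF' in any column
            if 'DIF' not in row:
                w.writerow(row)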

The glob.glob call returns a list of all files ending in .csv in the current directory. If you want to pass the source directory to your program as an argument, have a look at sys.argv or argparse (the latter is especially powerful for command-line parsing).
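
As a rough sketch of the argparse route (Python 3 assumed; the argument names src_dir and --out-dir and the helpers parse_args and list_csv_files are made up for illustration, not part of the script above):

```python
import argparse
import glob
import os


def parse_args(argv=None):
    """Parse the (hypothetical) command-line options for the filter script."""
    parser = argparse.ArgumentParser(
        description="Drop rows containing 'DIF' from every CSV file in a folder.")
    parser.add_argument("src_dir", help="folder holding the input .csv files")
    parser.add_argument("--out-dir", default="out",
                        help="folder for the filtered copies (default: out)")
    return parser.parse_args(argv)


def list_csv_files(src_dir):
    """Return the sorted paths of all .csv files directly inside src_dir."""
    return sorted(glob.glob(os.path.join(src_dir, "*.csv")))


# An explicit list is passed here so the example runs anywhere;
# passing None (the default) makes argparse read sys.argv[1:] instead.
args = parse_args(["data", "--out-dir", "filtered"])
print(args.src_dir, args.out_dir)
```

In a real script you would call parse_args() with no argument and loop over list_csv_files(args.src_dir) in place of the glob.glob('*.csv') call.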

You also have to be careful when opening a file in 'w' mode: it truncates the file. Because your loop reopens the output file for every row, each write overwrites the previous one and you end up with only a single CSV line.
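
To see the truncation in isolation, here is a small sketch (Python 3 assumed; the file name demo.csv, the sample rows, and the helper names are invented for the demonstration) that writes three rows both ways and counts what survives:

```python
import csv
import os
import tempfile

ROWS = [["a", "1"], ["b", "2"], ["c", "3"]]


def write_reopening_each_row(path, rows):
    """Buggy pattern: 'w' truncates the file every time it is reopened."""
    for row in rows:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerow(row)


def write_opening_once(path, rows):
    """Fixed pattern: open the file once and reuse a single writer."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


def read_rows(path):
    """Read the whole CSV file back as a list of rows."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_reopening_each_row(path, ROWS)
print(len(read_rows(path)))  # 1: only the last row is left
write_opening_once(path, ROWS)
print(len(read_rows(path)))  # 3: every row is kept
```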

The directory 'out' must already exist, otherwise the script will raise an IOError.
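
One way to avoid that error is to create the directory up front. A minimal sketch, assuming Python 3.2 or later for the exist_ok flag (the helper name ensure_dir is hypothetical, and a temporary folder stands in for your working directory):

```python
import os
import tempfile


def ensure_dir(path):
    """Create path (and any missing parents) if it is not already there."""
    os.makedirs(path, exist_ok=True)  # exist_ok requires Python 3.2+
    return path


base = tempfile.mkdtemp()            # throwaway location for the demo
out_dir = ensure_dir(os.path.join(base, "out"))
ensure_dir(out_dir)                  # calling it again is harmless
print(os.path.isdir(out_dir))        # True
```

On Python 2 there is no exist_ok, so you would have to check os.path.isdir('out') before calling os.makedirs('out').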

Links: open, sys.argv, argparse, glob


4 Comments

This answer is close but has one major problem and a couple of minor ones. Don't create a new writer for each input row! Less importantly, it's a better practice to use os.path.join to create the output file path rather than a hardcoded '/' divider. And as the docs note, both the input and output files should be opened in binary mode ('rb' and 'wb' respectively). docs.python.org/2/library/csv.html#csv.reader
I'll add your suggestions.
And for the OP - if you really only want to check for 'DIF' in the second column, and still write rows that have it in other columns, replace the if not 'DIF' in row: check with if row[1] != 'DIF':
Thank you so much. With a few minor edits, this worked perfectly.

0

Most sequence types support the in and not in operators, which are much simpler for testing whether a value is present than working out index positions.

with open('40_5N.csv', 'w') as out:
    writer = csv.writer(out)  # create the writer once, not once per row
    for row in reader:
        if 'DIF' not in row:
            writer.writerow(row)
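
Keep in mind that a membership test looks at every column, while the question only asks about the second one. The following sketch compares the two checks (Python 3 assumed; the sample data and both helper names are invented for illustration):

```python
import csv
import io

# Invented sample: 'DIF' is in column 2 of one row and column 3 of another.
SAMPLE = "id,kind,note\n1,DIF,ok\n2,ABC,DIF\n3,ABC,ok\n"


def keep_without_value(rows, value="DIF"):
    """Keep rows where value appears in no column at all."""
    return [row for row in rows if value not in row]


def keep_by_second_column(rows, value="DIF"):
    """Keep rows whose second column differs from value (short rows are kept)."""
    return [row for row in rows if len(row) < 2 or row[1] != value]


rows = list(csv.reader(io.StringIO(SAMPLE)))
print(keep_without_value(rows))      # drops rows 1 and 2
print(keep_by_second_column(rows))   # drops only row 1
```

The len(row) < 2 guard is a design choice for this sketch: it keeps blank or one-column lines instead of raising an IndexError on row[1].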

0

If you're willing to install numpy, you can also read a CSV file into a convenient numpy array with either recfromcsv or the more general genfromtxt (genfromtxt requires that you specify the comma delimiter), and you can specify which rows and columns to ignore. Documentation for genfromtxt can be found here:

http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

And here for recfromcsv: http://nullege.com/codes/search/numpy.recfromcsv?fulldoc=1
