Reading CSV files and rewriting them without certain rows in Python

Question

I am new to programming. I have hundreds of CSV files in a folder, and some of them have the letters DIF in the second column. I want to rewrite the CSV files without those lines. I have attempted this for one file and included my attempt below. I also need help getting the program to do this for all the files in my directory. Any help would be appreciated.

Thank you

import csv

reader=csv.reader(open("40_5.csv","r"))


for row in reader:
    if row[1] == 'DIF':
        csv.writer(open('40_5N.csv', 'w')).writerow(row)

hanslovsky · Accepted Answer · 2013-08-09 00:02:24Z

1

I made some changes to your code:

import csv
import glob
import os

fns = glob.glob('*.csv')

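for fn in fns:
    # Open input and output together so both files are closed afterwards.
    with open(fn, 'rb') as src, open(os.path.join('out', fn), 'wb') as f:
        reader = csv.reader(src)
        w = csv.writer(f)  # one writer per output file, not one per row
        for row in reader:
            # Skip any row that has 'DIF' in one of its columns.
            if not 'DIF' in row:
                w.writerow(row)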

The glob.glob call returns a list of all files ending in .csv in the current directory. If you want to pass the source directory as an argument to your program, have a look at sys.argv or argparse (the latter is especially powerful for command-line parsing).
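As a rough sketch of the argparse route, assuming Python 3: the functions below take the source directory as an optional positional argument and list the CSV files inside it. The names find_csv_files, parse_args and source_dir are made up for this example.

```python
import argparse
import glob
import os


def find_csv_files(source_dir):
    """Return the sorted paths of all .csv files directly inside source_dir."""
    return sorted(glob.glob(os.path.join(source_dir, '*.csv')))


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv[1:]'."""
    parser = argparse.ArgumentParser(
        description='List the CSV files in a directory.')
    # Hypothetical argument name; falls back to the current directory.
    parser.add_argument('source_dir', nargs='?', default='.',
                        help='directory to search for CSV files')
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    for path in find_csv_files(args.source_dir):
        print(path)
```

Calling main() from the bottom of a script would then let you run it as, say, python script.py path/to/folder, where script.py stands for whatever you name the file.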

You also have to be careful when opening a file in 'w' mode: it truncates the file. In your loop, each iteration would overwrite the existing file, so you would end up with only one CSV line.
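To see the truncation in isolation, here is a small Python 3 sketch. The file name demo.csv and the helper names are invented for this example. Reopening the file in 'w' mode for every row leaves only the last row behind, while opening it once keeps them all.

```python
import csv
import os
import tempfile

rows = [['a', '1'], ['b', '2'], ['c', '3']]


def write_reopening_each_time(path, rows):
    # Mirrors the loop in the question: every open(..., 'w') empties the file.
    for row in rows:
        with open(path, 'w', newline='') as f:
            csv.writer(f).writerow(row)


def write_opening_once(path, rows):
    # Open once, create one writer, then send every row through it.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


def read_rows(path):
    with open(path, newline='') as f:
        return list(csv.reader(f))


path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

write_reopening_each_time(path, rows)
print(read_rows(path))  # only the last row survives

write_opening_once(path, rows)
print(read_rows(path))  # all three rows are present
```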

The directory 'out' must exist, or the script will raise an IOError.
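The script above appears to target Python 2, where the csv module wants files opened in binary mode. A rough Python 3 equivalent might look like the sketch below: files are opened in text mode with newline='' (as the Python 3 csv documentation recommends), and the output directory is created if it is missing. The function name filter_csv_files and its parameters are placeholders, not part of the original answer.

```python
import csv
import glob
import os


def filter_csv_files(src_dir='.', out_dir='out', marker='DIF'):
    """Copy every CSV in src_dir to out_dir, dropping rows that contain marker."""
    # Create the output directory up front instead of failing in open().
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for src_path in sorted(glob.glob(os.path.join(src_dir, '*.csv'))):
        dst_path = os.path.join(out_dir, os.path.basename(src_path))
        with open(src_path, newline='') as src, \
                open(dst_path, 'w', newline='') as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                # Skip rows that have the marker in any column.
                if marker not in row:
                    writer.writerow(row)
        written.append(dst_path)
    return written
```

Calling filter_csv_files() with no arguments would process the current directory and write the filtered copies to 'out', leaving the original files untouched.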

Links: open sys.argv argparse glob

edited Aug 9, 2013 at 0:02

answered Aug 8, 2013 at 23:38

hanslovsky


4 Comments

Peter DeGlopper Over a year ago

This answer is close but has one major problem and a couple of minor ones. Don't create a new writer for each input row! Less importantly, it's a better practice to use os.path.join to create the output file path rather than a hardcoded '/' divider. And as the docs note, both the input and output files should be opened in binary mode ('rb' and 'wb' respectively). docs.python.org/2/library/csv.html#csv.reader

hanslovsky Over a year ago

I'll add your suggestions.

Peter DeGlopper Over a year ago

And for the OP - if you really only want to check for 'DIF' in the second column, and still write rows that have it in other columns, replace the if not 'DIF' in row: check with if row[1] != 'DIF':
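A small sketch of that second-column check, written for Python 3. The helper name keep_row is invented here, and the length guard is an added assumption so that blank or one-field lines do not raise an IndexError.

```python
import csv
import io


def keep_row(row, column=1, marker='DIF'):
    """Return False only when the field at `column` equals `marker`."""
    # Rows too short to have that column are kept rather than raising.
    return len(row) <= column or row[column] != marker


# Made-up sample data: 'DIF' appears in the first, second and third column.
sample = io.StringIO('DIF,ok,1\nx,DIF,2\n\ny,ok,DIF\n')
kept = [row for row in csv.reader(sample) if keep_row(row)]
print(kept)
```

Only the row with 'DIF' in the second column is dropped; rows with 'DIF' elsewhere, and the blank line, pass through.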

Hamza Surti Over a year ago

Thank you so much. With a few minor edits, this worked perfectly.

dreynold · Accepted Answer · 2013-08-08 23:34:00Z

0

Most sequence types support the in and not in operators, which are much simpler for testing whether a value is present than working out index positions. Note that this checks every column of the row, not just the second.

with open('40_5N.csv', 'w') as out:
    writer = csv.writer(out)  # create the writer once, not once per row
    for row in reader:
        if not 'DIF' in row:
            writer.writerow(row)
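One point worth spelling out about in on a row: csv.reader yields each row as a list of strings, and list membership compares whole fields. So 'DIF' matches a field that is exactly 'DIF', but not 'DIFF' or ' DIF'. A quick illustration with made-up rows:

```python
rows = [
    ['a', 'DIF', '1'],    # exact match in the second column
    ['DIF', 'x', '2'],    # exact match in the first column
    ['b', 'DIFF', '3'],   # longer field: not a match
    ['c', ' DIF', '4'],   # leading space: not a match
]

# Keep only the rows in which no field is exactly 'DIF'.
kept = [row for row in rows if 'DIF' not in row]
print(kept)
```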

edited Aug 8, 2013 at 23:34

answered Aug 8, 2013 at 23:28

dreynold


user2569332 · Accepted Answer · 2013-08-09 00:12:31Z

0

If you're willing to install numpy, you can also read a CSV file into the convenient numpy array format with either recfromcsv or the more general genfromtxt (genfromtxt requires you to specify the comma delimiter), and you can specify which rows and columns to ignore. Documentation can be found here for genfromtxt:

http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

And here for recfromcsv: http://nullege.com/codes/search/numpy.recfromcsv?fulldoc=1
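For the genfromtxt route, something along these lines might work, assuming Python 3 and a NumPy version recent enough to accept the encoding argument. The function name drop_marked_rows and its parameters are invented for illustration. Note that genfromtxt splits on the delimiter only, so it does not understand quoted fields that contain commas, and it expects every row to have the same number of columns.

```python
import numpy as np


def drop_marked_rows(src_path, dst_path, column=1, marker='DIF'):
    """Write src_path to dst_path without rows whose `column` equals `marker`."""
    # dtype=str keeps every field as text instead of converting to floats.
    rows = np.genfromtxt(src_path, delimiter=',', dtype=str, encoding='utf-8')
    # A one-line file comes back as a 1-D array; normalise to 2-D.
    rows = np.atleast_2d(rows)
    # Boolean mask: True for the rows to keep.
    kept = rows[rows[:, column] != marker]
    np.savetxt(dst_path, kept, delimiter=',', fmt='%s')
    return kept
```

For files with quoting or ragged rows, the csv module approach from the other answers is likely the safer choice.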

answered Aug 9, 2013 at 0:12

user2569332
