How to find specific rows in csv document in python

Question

I'm trying to read a CSV file, find all rows where the value in the SN column is greater than 20, and write only those rows to a new file.

I know that I need to do:

Read the original File
Open a new file
Iterate over rows of the original file

What I've been able to do is find the rows that have a value of SN > 20

import csv
import os

os.chdir(r"C:\Users\Robert\Documents\qwe")  # raw string so the backslashes are not treated as escapes

with open("gdweights_feh_robert_cmr.csv",'rb') as f:
    reader = csv.reader(f, delimiter= ',')
    zerovar = 0
    for row in reader:
        if zerovar==0:
            zerovar = zerovar + 1
        else:
            sn = row [11]
            zerovar = zerovar + 1
            x = float(sn)
            if x > 20:
                print x

So my question is: how do I take the rows with SN > 20 and write them to a new file?

Skip the header with next(reader) before the loop, to remove the if-then statement from the body.
– chepner, Commented Apr 9, 2013 at 1:37
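
A minimal sketch of the suggestion in the comment above, assuming Python 3. The in-memory sample data and its two-column layout are made up for illustration; the real file has SN at index 11.

```python
import csv
import io

# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO("name,SN\na,25.0\nb,3.5\n")

reader = csv.reader(sample)
header = next(reader)  # consume the header row once, before the loop
values = [float(row[1]) for row in reader]  # only data rows remain

print(header, values)
```

Because the header is consumed up front, the loop body no longer needs a counter such as zerovar to detect the first row.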

twasbrillig · Accepted Answer · 2013-06-01 01:00:03Z

3

Save the data in a list, then write the list to a file.

import csv
import os

os.chdir(r"C:\Users\Robert\Documents\qwe")

output_ary = []
with open("gdweights_feh_robert_cmr.csv",'rb') as f:
    reader = csv.reader(f, delimiter= ',')
    zerovar = 0
    for row in reader:
        if zerovar==0:
            zerovar = zerovar + 1
        else:
            sn = row [11]
            zerovar = zerovar + 1
            x = float(sn)
            if x > 20:
                print x
                output_ary.append(row)

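# csv.writer puts each row on its own line and handles separators and quoting.
# 'wb' matches the Python 2 csv convention used by the 'rb' open above.
with open("output.csv", 'wb') as f2:
    writer = csv.writer(f2)
    writer.writerows(output_ary)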

edited Jun 1, 2013 at 1:00

answered Apr 9, 2013 at 1:09

twasbrillig


5 Comments

Jared Over a year ago

It sounds like it should be output_ary.append(row)

chepner Over a year ago

This uses quite a bit more memory than is necessary; just print the appropriate lines to the output file as they are read, instead of storing them all in a list.
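
A rough sketch of the streaming approach described in this comment, assuming Python 3 and a file whose first row is a header. The function name filter_rows and its parameters are hypothetical, chosen for illustration only.

```python
import csv

def filter_rows(src_path, dst_path, column=11, threshold=20.0):
    """Copy the header and every row whose `column` value exceeds `threshold`.

    Returns the number of data rows written.
    """
    kept = 0
    # newline="" is what the csv module expects for files in Python 3.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            if float(row[column]) > threshold:
                writer.writerow(row)  # written immediately, never stored in a list
                kept += 1
    return kept
```

It could be called as filter_rows("gdweights_feh_robert_cmr.csv", "output.csv"). Only one row is held in memory at a time, so the size of the input file does not matter much.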

Robert Khachatryan Over a year ago

When I try doing this, I get a TypeError: expected a character buffer object. What does this mean?

twasbrillig Over a year ago

my mistake, I fixed it, please try again.

Robert Khachatryan Over a year ago

Sorry for late response...It works, however, I get a problem where it doesn't load everything. And only loads the first row.

Nick Burns · Accepted Answer · 2013-04-09 01:17:13Z

0

In the code, reading and looping through the rows is quite complex. It could be cleaned up (and will probably run a little faster) with the following:

with open('gdweights_feh_robert_cmr.csv', 'rb') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    output_ary = [row for row in reader if float(row[11]) > 20]

A list comprehension ([row for row in reader if ...]) is generally a little faster than an explicit loop that appends to a list, and it is shorter. csv.reader yields one row at a time, so only the matching rows are held in memory, which is handy if the CSV file is large.

You can then proceed to write out output_ary as suggested in the other answers.

Hope this helps!

answered Apr 9, 2013 at 1:17

Nick Burns


2 Comments

Robert Khachatryan Over a year ago

The problem with this is that you cannot convert a string to a float.

Nick Burns Over a year ago

That's interesting; it is perfectly acceptable to convert a string to a float, e.g. s = '3'; y = float(s). You will see an error if the string contains non-numeric characters, e.g. float('3a') raises a ValueError. The same happens if the string itself contains quote characters.
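
To illustrate the point about float() and non-numeric strings, here is a small sketch, assuming Python 3. The helper name parse_sn is hypothetical, and stripping stray quote characters is an assumption about what the data might contain, not something confirmed in the question.

```python
def parse_sn(text):
    """Return the field as a float, or None if it is not numeric."""
    # Drop surrounding whitespace and any stray quote characters.
    cleaned = text.strip().strip("\"'")
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. '3a' or an empty field
```

A loop could then skip rows where parse_sn(row[11]) is None instead of stopping on the first bad value.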
