1

I am trying to use Python's built-in filter function in order to extract data from certain columns in a CSV. Is this a good use of the filter function? Would I have to define the data in these columns first, or would Python somehow already know which columns contain what data?

5
  • Can you provide an example of your input data and the requested output data? Commented Nov 28, 2011 at 5:04
  • Can you explain in more detail what you're trying to do? Maybe show an example? It's not clear to me... Commented Nov 28, 2011 at 5:05
  • Sure thing. Let's say my CSV has columns 1, 2, and 3. I want to ignore all the data in column 2 and extract only what's in columns 1 and 3. Can this be achieved using the filter function? Commented Nov 28, 2011 at 5:13
  • That's not much of an explanation, but I guess it's something to go on... Commented Nov 28, 2011 at 9:49
  • You should read CSV files with the stdlib's csv module, as in "number5"'s answer below. The built-in filter is better left for other uses. Commented Nov 28, 2011 at 12:17

2 Answers

7

Python advertises itself as "batteries included": for most everyday tasks, someone has probably already provided a solution. CSV handling is one of them; there is a built-in csv module.

Also, tablib is a very good third-party module, especially if you're dealing with non-ASCII data.

For the behaviour you described in the comment, this will do:

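import csv

# Python 3: open in text mode with newline='', as the csv docs recommend
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row.pop(1)  # drop the second column (index 1)
        print(", ".join(row))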

2

The filter function is intended to select from a list (or in general, any iterable) those elements which satisfy a certain condition. It's not really intended for index-based selection. So although you could use it to pick out specified columns of a CSV file, I wouldn't recommend it. Instead you should probably use something like this:

import csv

# filename and do_something_with are placeholders for your own values
with open(filename, newline='') as f:
    for record in csv.reader(f):
        do_something_with(record[0], record[2])
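
As a minimal, self-contained sketch of this index-based approach, the following reads from an in-memory string instead of a real file. The sample data and the helper name select_columns are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
SAMPLE = "a1,b1,c1\na2,b2,c2\n"

def select_columns(lines, indices=(0, 2)):
    """Return the fields at the given zero-based indices from each CSV row."""
    return [tuple(row[i] for i in indices) for row in csv.reader(lines)]

selected = select_columns(io.StringIO(SAMPLE))
print(selected)
```

Passing a different indices tuple picks other columns, so the same helper covers "columns 1 and 3" or any other combination.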

Depending on what exactly you are doing with the records, it may be better to create an iterator over the columns of interest:

with open(filename, newline='') as f:
    the_iterator = ((record[0], record[2]) for record in csv.reader(f))
    # do something with the iterator here, while the file is still open

or, if you need non-sequential processing, perhaps a list:

with open(filename, newline='') as f:
    the_list = [(record[0], record[2]) for record in csv.reader(f)]
    # do something with the list
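
The practical difference between the two can be sketched as follows, again with made-up in-memory data: a generator expression can be consumed only once, while a list can be indexed and traversed as often as needed.

```python
import csv
import io

SAMPLE = "a1,b1,c1\na2,b2,c2\n"  # hypothetical sample data

rows = list(csv.reader(io.StringIO(SAMPLE)))

# A generator expression yields each pair once, lazily.
pairs_iter = ((r[0], r[2]) for r in rows)
first_pass = list(pairs_iter)
second_pass = list(pairs_iter)  # already exhausted, so this is empty

# A list can be indexed and traversed repeatedly.
pairs_list = [(r[0], r[2]) for r in rows]
last_pair = pairs_list[-1]
```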

I'm not sure what you mean by defining the data in the columns. The data are defined by the CSV file.
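
To illustrate: the csv module does not guess what a column means, it just returns each field as a string. If the file happens to start with a header row (an assumption here, since the question doesn't say), csv.DictReader will use that row as the column names. The sample contents below are invented for the sketch.

```python
import csv
import io

# Hypothetical file contents with a header row.
SAMPLE = "name,age,city\nAnn,31,Oslo\nBob,45,Lima\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
# Pick columns by the names found in the header row.
picked = [(row["name"], row["city"]) for row in reader]

print(reader.fieldnames)
print(picked)
```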


By comparison, here's a case in which you would want to use filter: suppose your CSV file contains numeric data, and you need to build a list of the records in which the numbers are in strictly increasing order within the row. You could write a function to determine whether a list of numbers is in strictly increasing order:

from itertools import pairwise  # Python 3.10+

def strictly_increasing(fields):
    return all(int(i) < int(j) for i, j in pairwise(fields))

(pairwise is available as itertools.pairwise in Python 3.10 and later; for older versions, see the recipes in the itertools documentation for a definition). Then you can use this as the condition in filter:

with open(filename, newline='') as f:
    # in Python 3, filter returns a lazy iterator, so wrap it in list()
    the_list = list(filter(strictly_increasing, csv.reader(f)))
    # do something with the list
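
To try this without a file on disk, here is a self-contained sketch. It defines pairwise locally, following the recipe from the itertools documentation, so it also runs on Python versions before 3.10; the numeric sample data is made up.

```python
import csv
import io
from itertools import tee

def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), ... (recipe from the itertools documentation)."""
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def strictly_increasing(fields):
    return all(int(i) < int(j) for i, j in pairwise(fields))

SAMPLE = "1,2,3\n3,2,1\n1,1,2\n4,7,9\n"  # hypothetical numeric data

# Keep only the rows whose numbers increase strictly from left to right.
increasing = list(filter(strictly_increasing, csv.reader(io.StringIO(SAMPLE))))
print(increasing)
```

Note that filter keeps or drops whole rows; it never changes which columns a row contains, which is why it doesn't fit the column-selection task in the question.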

Of course, the same thing could, and usually would, be implemented as a list comprehension:

with open(filename, newline='') as f:
    the_list = [record for record in csv.reader(f) if strictly_increasing(record)]
    # do something with the list

so there's little reason to use filter in practice.
