I am trying to use Python's built-in filter function in order to extract data from certain columns in a CSV. Is this a good use of the filter function? Would I have to define the data in these columns first, or would Python somehow already know which columns contain what data?
Since Python boasts "batteries included", someone has probably already provided a solution for most everyday situations. CSV is one of them: there is a built-in csv module.
Also, tablib is a very good third-party module, especially if you're dealing with non-ASCII data.
For the behaviour you described in the comment (dropping the second column), this will do:
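import csv

# In Python 3, open CSV files in text mode with newline=''
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row.pop(1)  # remove the second column (index 1)
        print(", ".join(row))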
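Joining fields with ", " is fine for simple data, but the output becomes ambiguous as soon as a field itself contains a comma or a quote. If you want the result written back out as valid CSV, a minimal Python 3 sketch can let csv.writer handle the quoting. The function name drop_column and the sample data are illustrative, not part of the csv module:

```python
import csv
import io


def drop_column(src, dst, index):
    """Copy CSV rows from src to dst, leaving out the zero-based column `index`."""
    reader = csv.reader(src)
    writer = csv.writer(dst)  # quotes fields that contain commas, quotes, etc.
    for row in reader:
        # Slicing builds a new row; rows too short to have `index` pass through unchanged.
        writer.writerow(row[:index] + row[index + 1:])


# Illustrative in-memory data; with real files, open both with newline=''.
src = io.StringIO('a,b,"c, d"\n1,2,3\n')
dst = io.StringIO()
drop_column(src, dst, 1)
print(repr(dst.getvalue()))  # 'a,"c, d"\r\n1,3\r\n'
```

With ", ".join(row), the first row would come out as a, c, d, and a reader could no longer tell that "c, d" was a single field.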
The filter function is intended to select from a list (or in general, any iterable) those elements which satisfy a certain condition. It's not really intended for index-based selection. So although you could use it to pick out specified columns of a CSV file, I wouldn't recommend it. Instead you should probably use something like this:
import csv

with open(filename, newline='') as f:
    for record in csv.reader(f):
        do_something_with(record[0], record[2])
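For comparison, here is roughly what index-based column selection looks like if you force it through filter. This is a sketch to show why it is awkward, not a recommendation. WANTED is a hypothetical set of column indices. Because filter only sees one element at a time, you have to attach each field's index with enumerate and then strip it off again:

```python
import csv
import io

WANTED = {0, 2}  # hypothetical: zero-based indices of the columns to keep


def pick_with_filter(record):
    # filter() tests one item at a time, so pair each field with its index first...
    kept = filter(lambda pair: pair[0] in WANTED, enumerate(record))
    # ...then throw the indices away again.
    return [field for _, field in kept]


def pick_with_indexing(record):
    # Plain indexing expresses the same selection directly.
    return [record[0], record[2]]


data = io.StringIO("a,b,c\n1,2,3\n")
for record in csv.reader(data):
    print(pick_with_filter(record), pick_with_indexing(record))
```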
Depending on what exactly you are doing with the records, it may be better to create an iterator over the columns of interest:
with open(filename, newline='') as f:
    the_iterator = ((record[0], record[2]) for record in csv.reader(f))
    # do something with the iterator (it must be consumed before the file is closed)
or, if you need non-sequential processing, perhaps a list:
with open(filename, newline='') as f:
    the_list = [(record[0], record[2]) for record in csv.reader(f)]
    # do something with the list
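If you find yourself writing (record[0], record[2]) repeatedly, operator.itemgetter builds the same selector once. This is an alternative I'm suggesting rather than something the answer above uses, and the sample data is made up. map gives the lazy version and a list comprehension the eager one:

```python
import csv
import io
from operator import itemgetter

# itemgetter(0, 2)(record) returns the tuple (record[0], record[2]).
pick = itemgetter(0, 2)

sample = "a,b,c\n1,2,3\n4,5,6\n"  # hypothetical data

# Lazy: map() returns an iterator, like the generator expression above.
the_iterator = map(pick, csv.reader(io.StringIO(sample)))
print(next(the_iterator))  # ('a', 'c')

# Eager: every selected pair is built up front.
the_list = [pick(record) for record in csv.reader(io.StringIO(sample))]
print(the_list)  # [('a', 'c'), ('1', '3'), ('4', '6')]
```

With a real file, the same rule as above applies: consume the map iterator inside the with block, before the file is closed.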
I'm not sure what you mean by defining the data in the columns. The data are defined by the CSV file.
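Python does not know in advance what each column means; csv.reader just splits each line into a list of strings. If the file has a header row, csv.DictReader uses it to name the fields, which is about as close as you get to "defining" the columns. A minimal sketch with made-up column names (name, age). Note that the values still come back as strings, so any type conversion is up to you:

```python
import csv
import io

# Hypothetical sample data; DictReader treats the first line as the header.
sample = "name,age\nalice,30\nbob,25\n"

for row in csv.DictReader(io.StringIO(sample)):
    # Each row maps header names to values; every value is a str.
    print(row["name"], int(row["age"]))
```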
By comparison, here's a case in which you would want to use filter: suppose your CSV file contains numeric data, and you need to build a list of the records in which the numbers are in strictly increasing order within the row. You could write a function to determine whether a list of numbers is in strictly increasing order:
def strictly_increasing(fields):
    # True if each field is numerically smaller than the one after it
    return all(int(i) < int(j) for i, j in pairwise(fields))
(pairwise is available as itertools.pairwise in Python 3.10 and later; for older versions, see the recipe in the itertools documentation.) Then you can use this as the condition in filter:
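with open(filename, newline='') as f:
    # filter() returns a lazy iterator in Python 3, so build the list while the file is open
    the_list = list(filter(strictly_increasing, csv.reader(f)))
    # do something with the list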
Of course, the same thing could, and usually would, be implemented as a list comprehension:
with open(filename, newline='') as f:
    the_list = [record for record in csv.reader(f) if strictly_increasing(record)]
    # do something with the list
so there's little reason to use filter in practice.
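Putting the pieces together, here is a self-contained sketch that runs on any Python 3 version. It falls back to the pairwise recipe from the itertools documentation when itertools.pairwise (added in 3.10) is unavailable, and compares the filter call with the list comprehension. The sample data is invented, and open(filename, ...) is replaced with an in-memory io.StringIO:

```python
import csv
import io
from itertools import tee

try:
    from itertools import pairwise  # Python 3.10+
except ImportError:
    def pairwise(iterable):
        # Recipe from the itertools docs: s -> (s0, s1), (s1, s2), ...
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)


def strictly_increasing(fields):
    return all(int(i) < int(j) for i, j in pairwise(fields))


sample = "1,2,3\n3,2,1\n5,7,9\n4,4,5\n"  # hypothetical data

# filter() is lazy in Python 3, so wrap it in list() to get a list.
via_filter = list(filter(strictly_increasing, csv.reader(io.StringIO(sample))))
via_comprehension = [
    record for record in csv.reader(io.StringIO(sample)) if strictly_increasing(record)
]

print(via_filter)  # [['1', '2', '3'], ['5', '7', '9']]
print(via_filter == via_comprehension)  # True
```

The row 4,4,5 is rejected because the comparison is strict (4 < 4 is false).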
The filter built-in is better left for other uses.