How to extract column and row in csv using python

Question

I have this input in a file.csv

"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54

I want to write a simple program to find the city with the lowest rainfall, which is Missouri in this case. How can I do that using Python's csv reader?

I can try to extract the items, but unfortunately the first row of the file has to stay there. I would like to have something like count[Missouri]=300, count[Amsterdam]=1212, etc., so that I can take the minimum and refer back to it to print the city.

Please advise. Thanks.

And what is the particular problem? Why shouldn't that be possible using the Python csv module? What do you have so far? Don't expect us to repeat the same CSV examples that you can find in the Python Library Reference for the csv module. What exactly do you need?
– user2665694, Commented Apr 11, 2011 at 12:29

Hugh Bothwell · Accepted Answer · 2011-04-11 12:48:31Z

6

import csv

def main():
    with open('file.csv', 'rb') as inf:
        data = [(int(row['rainfall']), row['']) for row in csv.DictReader(inf)]

    data.sort()
    print data[0]

if __name__=="__main__":
    main()

returns

(300, 'Missouri')
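
The answer above is Python 2 (print statement, file opened in 'rb' mode). A minimal Python 3 sketch of the same idea follows; the function name driest_city and the SAMPLE string are made up for illustration, and io.StringIO stands in for file.csv. It uses min() instead of sorting the whole list, which is a substitution on my part and not what the answer does.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for file.csv.
SAMPLE = '''"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
'''

def driest_city(csvfile):
    """Return (rainfall, city) for the row with the lowest rainfall."""
    reader = csv.DictReader(csvfile)
    # The city column has an empty header, so its dictionary key is ''.
    return min((int(row['rainfall']), row['']) for row in reader)

# With a real file, pass open('file.csv', newline='') instead of the StringIO.
print(driest_city(io.StringIO(SAMPLE)))  # (300, 'Missouri')
```

Note that min() raises ValueError if the file has no data rows, so an empty file would need separate handling.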

answered Apr 11, 2011 at 12:48

Hugh Bothwell


1 Comment

eyquem Over a year ago

print list(next(groupby(data, key=itemgetter(0)))[1]) for cases in which several cities share the same minimum rainfall (groupby returns an iterator, so it has to be advanced with next() rather than indexed)
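
The tie-handling idea from this comment can be sketched in Python 3 roughly as follows. The data list is hypothetical: it is the question's rainfall figures plus an invented "Oslo" entry to force a tie. It relies on the list being sorted by rainfall first, as in the answer.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (rainfall, city) pairs; Oslo is invented to create a tie.
data = [(1212, 'Amsterdam'), (300, 'Missouri'), (1000, 'LA'), (300, 'Oslo')]
data.sort()

# groupby yields (key, group) pairs lazily. Because the list is sorted by
# rainfall, the first pair holds every city that shares the minimum.
lowest, group = next(groupby(data, key=itemgetter(0)))
tied = list(group)

print(lowest, tied)  # 300 [(300, 'Missouri'), (300, 'Oslo')]
```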

martineau · Accepted Answer · 2011-04-11 15:34:31Z

1

One way to do this would be to use the csv module's DictReader class to write a function to extract the column of data. DictReader will take care of handling the first row of field names automatically. The built-in min() function can then be used to determine the item with the smallest value in the column.

import csv

def csv_extract_col(csvinput, colname, key):
    """ extract a named column from a csv stream into a dictionary
          colname:  name of column to extract
          key:  name of another column to use as keys in returned dict
    """
    col = {}
    for row in csv.DictReader(csvinput):
        col[row[key]] = row[colname]
    return col

if __name__=='__main__':
    import StringIO

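    # The first line of csvdata is the field name row. (A comment placed
    # inside the string itself would become part of the last field name.)
    csvdata = """\
"","min","max","rainfall","days_clear"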
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
"""
    csvfile = StringIO.StringIO(csvdata)

    rainfall = csv_extract_col(csvfile, 'rainfall', '')
    print rainfall
    # {'Amsterdam': '1212', 'LA': '1000', 'Missouri': '300'}

    print min(rainfall.iteritems(), key=lambda r: float(r[1]))
    # ('Missouri', '300')
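
For readers on Python 3, here is a hedged sketch of the same function. The changes are mine, not the answerer's: io.StringIO replaces the StringIO module, dict.items() replaces iteritems(), and print is a function. The float() key matters because the values are still strings, and comparing them as strings would rank '1000' below '300'.

```python
import csv
import io

def csv_extract_col(csvinput, colname, key):
    """Map each row's `key` column to its `colname` column (values stay strings)."""
    return {row[key]: row[colname] for row in csv.DictReader(csvinput)}

csvdata = '''"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
'''

rainfall = csv_extract_col(io.StringIO(csvdata), 'rainfall', '')

# Compare numerically: as strings, '1000' would sort before '300'.
city, amount = min(rainfall.items(), key=lambda item: float(item[1]))
print(city, amount)  # Missouri 300
```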

answered Apr 11, 2011 at 15:34

martineau


2 Comments

joj Over a year ago

Thanks for all the different answers and suggestions. The last one, from martineau, seems best suited to my case. However, I am wondering how I can read file.csv directly instead of assigning the data to csvdata separately.

martineau Over a year ago

@joj: I only made csvdata a StringIO to test my sample code. Just use csvfile = open('file.csv') instead.

MattH · Accepted Answer · 2011-04-11 12:59:13Z

0

import StringIO
import csv

example = """"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
"""

data_in = StringIO.StringIO(example)
#data_in = open('mycsvdata.csv')

def read_data(data_in):
  reader = csv.reader(data_in)
  cols = []
  results = {}
  for row in reader:
    if not cols:
      cols = row
      continue
    row = [ int(x) if x.lstrip('-').isdigit() else x for x in row ]
    results[row[0]] = dict(zip(cols[1:],row[1:]))
  return results

data = read_data(data_in)

min(data.items(),key=lambda x: x[1].get('rainfall'))

Returns

('Missouri', {'max': 10, 'days_clear': 23, 'rainfall': 300, 'min': -2})

answered Apr 11, 2011 at 12:59

MattH


1 Comment

eumiro Over a year ago

if not cols: cols = row - a nice workaround for the cleaner csv.DictReader usage.

John Machin · Accepted Answer · 2011-04-14 11:39:54Z

0

To read from a file, you need to remove all code that deals with a string:

   reader = csv.reader(open('file.csv', 'rb'))
   rainfall = csv_extract_col(reader, 'rainfall', '')

Update: Sorry, it needs a bit more work than that. The first arg of csv_extract_col will be used as the first arg of csv.DictReader so (in this case) it should be an open file object, and should never be a csv.reader instance. See below:

import csv

### def csv_extract_col(csvinput, colname, key):
### exactly as provided by @martineau

if __name__ == '__main__':
    import sys
    filename, data_col_name, key_col_name = sys.argv[1:4]
    input_file_object = open(filename, 'rb')
    result_dict = csv_extract_col(input_file_object, data_col_name, key_col_name)
    print result_dict
    print min(result_dict.iteritems(), key=lambda r: float(r[1]))

Results:

command-prompt>\python27\python joj_csv.py joj.csv rainfall ""
{'Amsterdam': '1212', 'LA': '1000', 'Missouri': '300'}
('Missouri', '300')

command-prompt>\python27\python joj_csv.py joj.csv days_clear ""
{'Amsterdam': '34', 'LA': '54', 'Missouri': '23'}
('Missouri', '23')

Update 2 in response to comment """there must be something i missed out.. i tried.. [what looks like @martineau's function] with the above main function you define. Then in my shell, i define python rainfall "". But it gives me KeyError: 'rainfall'"""

Two possibilities:

(1) You made a mistake patching the pieces of source code together. Check your work.

(2) Your file doesn't have the expected heading row contents. Try some debugging e.g. change @martineau's code so that you can insert a print statement etc. to show what the csv.DictReader thinks about your heading row:

reader = csv.DictReader(csvinput)
print "fieldnames", reader.fieldnames
assert colname in reader.fieldnames
assert key in reader.fieldnames
for row in reader:
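
As a self-contained variant of that debugging step, the check can be wrapped in a small helper. This is only a sketch in Python 3: check_columns is a hypothetical name, and the misspelled header in the sample input is invented to trigger the error.

```python
import csv
import io

def check_columns(csvinput, *names):
    """Return a DictReader after confirming every name is a header field."""
    reader = csv.DictReader(csvinput)
    fieldnames = reader.fieldnames or []  # fieldnames is None for empty input
    missing = [name for name in names if name not in fieldnames]
    if missing:
        raise KeyError('missing columns %r; header was %r' % (missing, fieldnames))
    return reader

# Hypothetical input whose header says "rain fall" instead of "rainfall".
bad = io.StringIO('"","min","max","rain fall","days_clear"\n"LA",10,20,1000,54\n')
try:
    check_columns(bad, 'rainfall', '')
except KeyError as exc:
    print(exc)  # the message lists the header that DictReader actually saw
```

Reporting the header that was actually read usually makes a stray space, a different delimiter, or a missing heading row obvious at a glance.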

If you are still stuck, show us ALL of your code plus the full traceback and error message -- either edit your question or put it up on pastebin or dropbox; DON'T put it into a comment!!

edited Apr 14, 2011 at 11:39

answered Apr 13, 2011 at 1:43

John Machin


3 Comments

joj Over a year ago

Yeah @John, I tried removing and replacing those strings (similar to what you suggested above). It gives me this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "gy.py", line 28, in <module>
    rainfall = csv_extract_col(reader, 'rainfall', '')
  File "gy.py", line 10, in csv_extract_col
    for row in csv.DictReader(csvinput):
  File "/usr/lib/python2.6/csv.py", line 103, in next
    self.fieldnames
  File "/usr/lib/python2.6/csv.py", line 90, in fieldnames
    self._fieldnames = self.reader.next()
TypeError: expected string or Unicode object, list found

joj Over a year ago

I tried your suggestion of removing the string literal, but it still doesn't work for me. Same error as above.

joj Over a year ago

There must be something I missed. I tried:

import csv

def csv_extract_col(csvinput, colname, key):
    """ extract a named column from a csv stream into a dictionary
          colname:  name of column to extract
          key:  name of another column to use as keys in returned dict
    """
    col = {}
    for row in csv.DictReader(csvinput):
        col[row[key]] = row[colname]
    return col

with the main function you defined above. Then in my shell I ran python <my pythonfile> <my csv> rainfall "". But it gives me KeyError: 'rainfall'

eyquem · Accepted Answer · 2011-04-14 17:37:56Z

My code for cases in which there are several cities having the same minimum or several cities having the same maximum:

import csv

def minmax_col(filename,key,colname):
    with open(filename,'rb') as csvfile:
        rid = csv.DictReader(csvfile,
                             fieldnames=None,
                             quoting=csv.QUOTE_NONNUMERIC)

        mini = float('inf')
        maxi = float('-inf')
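        # Two separate lists; the chained form limin = limax = [] would
        # bind both names to the same list object.
        limin, limax = [], []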

        for row in rid:
            if row[colname] == maxi:
                limax.append(row[key])
            elif row[colname] > maxi:
                maxi = row[colname]
                limax = [row[key]]

            if row[colname] == mini:
                limin.append(row[key])
            elif row[colname] < mini:
                mini = row[colname]
                limin = [row[key]]

    return (key,(maxi,limax),(mini,limin))



key = 'rainfall'
city,(Ma,liMa),(mi,limi) = minmax_col('filename.csv','',key)
print 'Cities analysed on ' + repr(key) + ' parameter :'
print 'maximum==',Ma,'  cities :',', '.join(liMa)
print 'minimum==',mi,'  cities :',', '.join(limi)

print 

key = 'min'
city,(Ma,liMa),(mi,limi) = minmax_col('filename.csv','',key)
print 'Cities analysed on ' + repr(key) + ' parameter :'
print 'maximum==',Ma,'  cities :',', '.join(liMa)
print 'minimum==',mi,'  cities :',', '.join(limi)

On a file like that:

"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"Oslo",-2,8,800,12
"LA",10,20,1000,54
"Kologoro",28,45,1212,1

the result is

Cities analysed on 'rainfall' parameter :
maximum== 1212.0   cities : Amsterdam, Kologoro
minimum== 300.0   cities : Missouri

Cities analysed on 'min' parameter :
maximum== 28.0   cities : Kologoro
minimum== -3.0   cities : Amsterdam
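
A shorter Python 3 take on the same tie-aware idea might look like the sketch below. It is not the code above: the name extremes is hypothetical, and it groups cities by value in a dictionary instead of tracking running minima and maxima. Like the original, it leans on csv.QUOTE_NONNUMERIC to turn every unquoted field into a float.

```python
import csv
import io

SAMPLE = '''"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"Oslo",-2,8,800,12
"LA",10,20,1000,54
"Kologoro",28,45,1212,1
'''

def extremes(csvinput, key, colname):
    """Return ((max_value, cities), (min_value, cities)), keeping ties."""
    by_value = {}
    # QUOTE_NONNUMERIC converts unquoted fields to float; quoted ones stay str.
    for row in csv.DictReader(csvinput, quoting=csv.QUOTE_NONNUMERIC):
        by_value.setdefault(row[colname], []).append(row[key])
    hi, lo = max(by_value), min(by_value)
    return (hi, by_value[hi]), (lo, by_value[lo])

(ma, cities_max), (mi, cities_min) = extremes(io.StringIO(SAMPLE), '', 'rainfall')
print('maximum ==', ma, 'cities:', ', '.join(cities_max))
print('minimum ==', mi, 'cities:', ', '.join(cities_min))
```

With a real file, open('filename.csv', newline='') would replace the StringIO. max() and min() raise ValueError when there are no data rows, so an empty file needs its own check.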
