I have this input in file.csv:

"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54

I want to write a simple program to find the city with the lowest rainfall, which is Missouri in this case. How can I do that using Python's csv reader?

I can try to extract the items, but unfortunately the first row of the file has to stay there. I would like something like count['Missouri'] = 300, count['Amsterdam'] = 1212, etc., so that I can take the minimum and refer back to it to print the city.

Please advise. Thanks.

    And what is the particular problem? Why shouldn't that be possible using the Python csv module? What do you have so far? Don't expect that we repeat the same CSV examples that you can find in the Python Library reference of the csv module...what do you need exactly? Commented Apr 11, 2011 at 12:29

5 Answers

import csv

def main():
    with open('file.csv', 'rb') as inf:
        data = [(int(row['rainfall']), row['']) for row in csv.DictReader(inf)]

    data.sort()
    print data[0]

if __name__=="__main__":
    main()

returns

(300, 'Missouri')
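
On Python 3 the same idea could be sketched with min() instead of a full sort. This is only a sketch under assumptions: it presumes the header row shown in the question (so the city column is keyed by an empty string), and it uses io.StringIO as a stand-in for the file. SAMPLE and lowest_rainfall are illustrative names, not part of the answer above.

```python
import csv
import io

# Stand-in for file.csv; with a real file use open('file.csv', newline='').
SAMPLE = '''"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
'''

def lowest_rainfall(stream):
    """Return (rainfall, city) for the row with the smallest rainfall."""
    reader = csv.DictReader(stream)
    # The city column has an empty header, so its key is ''.
    return min((int(row['rainfall']), row['']) for row in reader)

result = lowest_rainfall(io.StringIO(SAMPLE))
print(result)  # (300, 'Missouri')
```

min() scans the rows once, so nothing needs to be sorted when only the smallest entry is wanted.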

1 Comment

print list(next(groupby(data, key=itemgetter(0)))[1]) for cases in which several cities have the same rainfall minimum (groupby comes from itertools, itemgetter from operator, and data must already be sorted)
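
For the tie case this comment raises, one possible Python 3 sketch is below. The (rainfall, city) pairs are typed in by hand, and the second 300 ('Oslo') is invented purely to produce a tie; in practice the list would come from the DictReader loop in the answer.

```python
from itertools import groupby
from operator import itemgetter

# (rainfall, city) pairs; the 'Oslo' entry is made up to show a tie.
data = [(1212, 'Amsterdam'), (300, 'Missouri'), (1000, 'LA'), (300, 'Oslo')]
data.sort()

# groupby yields (key, group) pairs; after sorting, the first group
# holds every pair that shares the minimum rainfall.
lowest, group = next(groupby(data, key=itemgetter(0)))
cities = [city for _, city in group]
print(lowest, cities)  # 300 ['Missouri', 'Oslo']
```

The group iterator has to be consumed before groupby is advanced again, which is why the cities are collected into a list straight away.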

One way to do this would be to use the csv module's DictReader class to write a function to extract the column of data. DictReader will take care of handling the first row of field names automatically. The built-in min() function can then be used to determine the item with the smallest value in the column.

import csv

def csv_extract_col(csvinput, colname, key):
    """ extract a named column from a csv stream into a dictionary
          colname:  name of column to extract
          key:  name of another column to use as keys in returned dict
    """
    col = {}
    for row in csv.DictReader(csvinput):
        col[row[key]] = row[colname]
    return col

if __name__=='__main__':
    import StringIO

    csvdata = """\
"","min","max","rainfall","days_clear"  # field name row
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
"""
    csvfile = StringIO.StringIO(csvdata)

    rainfall = csv_extract_col(csvfile, 'rainfall', '')
    print rainfall
    # {'Amsterdam': '1212', 'LA': '1000', 'Missouri': '300'}

    print min(rainfall.iteritems(), key=lambda r: float(r[1]))
    # ('Missouri', '300')

2 Comments

Thanks for all the different answers and suggestions. The last one, from Martineau, seems to suit me. However, I am wondering how I can use file.csv instead of assigning the data to csvdata separately.
@joj: I only made csvdata a StringIO to test my sample code. Just use csvfile = open('file.csv') instead.
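
A rough Python 3 sketch of reading from a real file rather than a StringIO follows. To stay runnable on its own it first writes the question's data to a temporary file (in practice the path would simply be 'file.csv'), and it restates csv_extract_col as a dict comprehension, which is a rewrite for brevity rather than the exact function from the answer.

```python
import csv
import os
import tempfile

def csv_extract_col(csvinput, colname, key):
    """Map each row's `key` column to its `colname` column."""
    return {row[key]: row[colname] for row in csv.DictReader(csvinput)}

# Write the question's data to a temporary file so the sketch is runnable.
path = os.path.join(tempfile.mkdtemp(), 'file.csv')
with open(path, 'w', newline='') as out:
    out.write('"","min","max","rainfall","days_clear"\n'
              '"Missouri",-2,10,300,23\n'
              '"Amsterdam",-3,5,1212,34\n'
              '"LA",10,20,1000,54\n')

# The csv docs recommend newline='' when opening files in Python 3.
with open(path, newline='') as csvfile:
    rainfall = csv_extract_col(csvfile, 'rainfall', '')

# Values are still strings, so convert them when comparing.
lowest = min(rainfall.items(), key=lambda item: float(item[1]))
print(lowest)  # ('Missouri', '300')
```

Compared with the Python 2 code above, the differences are items() in place of iteritems(), print as a function, and text mode with newline='' in place of 'rb'.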
import StringIO
import csv

example = """"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"LA",10,20,1000,54
"""

data_in = StringIO.StringIO(example)
#data_in = open('mycsvdata.csv')

def read_data(data_in):
  reader = csv.reader(data_in)
  cols = []
  results = {}
  for row in reader:
    if not cols:
      cols = row
      continue
    row = [ int(x) if x.lstrip('-').isdigit() else x for x in row ]
    results[row[0]] = dict(zip(cols[1:],row[1:]))
  return results

data = read_data(data_in)

min(data.items(),key=lambda x: x[1].get('rainfall'))

Returns

('Missouri', {'max': 10, 'days_clear': 23, 'rainfall': 300, 'min': -2})

1 Comment

if not cols: cols = row - a nice workaround for the cleaner csv.DictReader usage.

To read from a file, you need to remove all code that deals with a string:

   reader = csv.reader(open('file.csv', 'rb'))
   rainfall = csv_extract_col(reader, 'rainfall', '')

Update: Sorry, it needs a bit more work than that. The first arg of csv_extract_col will be used as the first arg of csv.DictReader, so (in this case) it should be an open file object and should never be a csv.reader instance. See below:

import csv

### def csv_extract_col(csvinput, colname, key):
### exactly as provided by @martineau

if __name__ == '__main__':
    import sys
    filename, data_col_name, key_col_name = sys.argv[1:4]
    input_file_object = open(filename, 'rb')
    result_dict = csv_extract_col(input_file_object, data_col_name, key_col_name)
    print result_dict
    print min(result_dict.iteritems(), key=lambda r: float(r[1]))

Results:

command-prompt>\python27\python joj_csv.py joj.csv rainfall ""
{'Amsterdam': '1212', 'LA': '1000', 'Missouri': '300'}
('Missouri', '300')

command-prompt>\python27\python joj_csv.py joj.csv days_clear ""
{'Amsterdam': '34', 'LA': '54', 'Missouri': '23'}
('Missouri', '23')

Update 2 in response to comment """there must be something i missed out.. i tried.. [what looks like @martineau's function] with the above main function you define. Then in my shell, i define python <my pythonfile> <my csv> rainfall "". But it gives me KeyError: 'rainfall'"""

Two possibilities:

(1) You made a mistake patching the pieces of source code together. Check your work.

(2) Your file doesn't have the expected heading row contents. Try some debugging e.g. change @martineau's code so that you can insert a print statement etc. to show what the csv.DictReader thinks about your heading row:

reader = csv.DictReader(csvinput)
print "fieldnames", reader.fieldnames
assert colname in reader.fieldnames
assert key in reader.fieldnames
for row in reader:

If you are still stuck, show us ALL of your code plus the full traceback and error message -- either edit your question or put it up on pastebin or dropbox; DON'T put it into a comment!!
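
The header check suggested under (2) could be folded into the function itself, roughly as in this Python 3 sketch. It is a variant of @martineau's csv_extract_col, not the original, and the semicolon-delimited input is a hypothetical example of a file whose heading row is not what the code expects.

```python
import csv
import io

def csv_extract_col(csvinput, colname, key):
    """Extract a column keyed by another, failing early on a bad header."""
    reader = csv.DictReader(csvinput)
    names = reader.fieldnames or []   # fieldnames is None for empty input
    for wanted in (colname, key):
        if wanted not in names:
            raise KeyError('column %r not in header %r' % (wanted, names))
    return {row[key]: row[colname] for row in reader}

# Hypothetical bad input: semicolons instead of commas, so the whole
# heading row is read as a single field name.
bad = io.StringIO('"";"min";"max";"rainfall"\n"Missouri";-2;10;300\n')
message = None
try:
    csv_extract_col(bad, 'rainfall', '')
except KeyError as exc:
    message = str(exc)
    print(message)
```

The error then names the header that csv.DictReader actually saw, which usually makes a wrong delimiter or a missing heading row easy to spot.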

3 Comments

Yeah @John, I tried removing and replacing those strings (similar to what you suggested above). It gives me this error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "gy.py", line 28, in <module>
        rainfall = csv_extract_col(reader, 'rainfall', '')
      File "gy.py", line 10, in csv_extract_col
        for row in csv.DictReader(csvinput):
      File "/usr/lib/python2.6/csv.py", line 103, in next
        self.fieldnames
      File "/usr/lib/python2.6/csv.py", line 90, in fieldnames
        self._fieldnames = self.reader.next()
    TypeError: expected string or Unicode object, list found
I tried your suggestion of removing the string literal, but it still doesn't work for me. Same error as above.
There must be something I missed. I tried:
    import csv
    def csv_extract_col(csvinput, colname, key):
        """ extract a named column from a csv stream into a dictionary
              colname:  name of columm to extract
              key:  name of another columm to use as keys in returned dict
        """
        col = {}
        for row in csv.DictReader(csvinput):
            col[row[key]] = row[colname]
        return col
with the main function you define above. Then in my shell I run python <my pythonfile> <my csv> rainfall "", but it gives me KeyError: 'rainfall'

My code for cases in which there are several cities having the same minimum or several cities having the same maximum:

import csv

def minmax_col(filename,key,colname):
    with open(filename,'rb') as csvfile:
        rid = csv.DictReader(csvfile,
                             fieldnames=None,
                             quoting=csv.QUOTE_NONNUMERIC)

        mini = float('inf')
        maxi = float('-inf')
        limin, limax = [], []   # two separate lists, not one shared object

        for row in rid:
            if row[colname] == maxi:
                limax.append(row[key])
            elif row[colname] > maxi:
                maxi = row[colname]
                limax = [row[key]]

            if row[colname] == mini:
                limin.append(row[key])
            elif row[colname] < mini:
                mini = row[colname]
                limin = [row[key]]

    return (key,(maxi,limax),(mini,limin))



key = 'rainfall'
city,(Ma,liMa),(mi,limi) = minmax_col('filename.csv','',key)
print 'Cities analysed on ' + repr(key) + ' parameter :'
print 'maximum==',Ma,'  cities :',', '.join(liMa)
print 'minimum==',mi,'  cities :',', '.join(limi)

print 

key = 'min'
city,(Ma,liMa),(mi,limi) = minmax_col('filename.csv','',key)
print 'Cities analysed on ' + repr(key) + ' parameter :'
print 'maximum==',Ma,'  cities :',', '.join(liMa)
print 'minimum==',mi,'  cities :',', '.join(limi)

On a file like this:

"","min","max","rainfall","days_clear"
"Missouri",-2,10,300,23
"Amsterdam",-3,5,1212,34
"Oslo",-2,8,800,12
"LA",10,20,1000,54
"Kologoro",28,45,1212,1

the result is

Cities analysed on 'rainfall' parameter :
maximum== 1212.0   cities : Amsterdam, Kologoro
minimum== 300.0   cities : Missouri

Cities analysed on 'min' parameter :
maximum== 28.0   cities : Kologoro
minimum== -3.0   cities : Amsterdam
