Find max number in .CSV file in Python

Question

I have a .csv file that, when opened in Excel, looks like this: [image not included]

My code:

myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "rb")

countries = []
for item in myfile:
    a = item.split(",")
    countries.append(a)

hdi_list = []
for acountry in countries:
    hdi = acountry[3]

    try:
        hdi_list.append(float(hdi))
    except:
        pass

average = round(sum(hdi_list)/len(hdi_list), 2)
maxNumber = round(max(hdi_list), 2)
minNumber = round(min(hdi_list), 2)

This code works well. However, when I find the max, min, or avg, I also need to grab the corresponding country name and print it.

How can I change my code to grab the country name for the min, max, and avg as well?

Maybe use a dictionary instead of a list? Have the countries as keys and the values as values? Then find the key for the highest value?
– ArtOfWarfare, Commented Oct 24, 2014 at 16:24
Are you sure this code is working? There are commas in the country names; in those cases there would be an additional field and the values would be one field off.
– Klaus D., Commented Oct 24, 2014 at 16:25
@ArtOfWarfare a dictionary with HDI as keys would prevent duplicate HDIs from being counted in the average.
– Rafael Barros, Commented Oct 24, 2014 at 16:26

ArtOfWarfare · Accepted Answer · 2014-10-24 19:20:57Z

3

Instead of putting the values straight into the list, use tuples, like this:

hdi_list.append((float(hdi), acountry[1]))

Then you can use this instead:

maxTuple = max(hdi_list)
maxNumber = round(maxTuple[0], 2)
maxCountry = maxTuple[1]
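
As a minimal, self-contained sketch of the tuple approach (written for Python 3), the snippet below uses three rows borrowed from the sample table in the pandas answer in place of the asker's file. The names minTuple and minCountry are illustrative additions, not part of the original answer.

```python
# Sample (value, country) pairs standing in for the rows read from the CSV.
# These are placeholder numbers, not the asker's real data.
hdi_list = [
    (83.27, "Norway"),
    (75.47, "Ireland"),
    (88.97, "Liechtenstein"),
]

# Tuples compare element by element, so max()/min() order by the value first.
maxTuple = max(hdi_list)
minTuple = min(hdi_list)

maxNumber, maxCountry = round(maxTuple[0], 2), maxTuple[1]
minNumber, minCountry = round(minTuple[0], 2), minTuple[1]

# The average only needs the numeric half of each tuple.
average = round(sum(value for value, _ in hdi_list) / len(hdi_list), 2)

print(maxCountry, maxNumber)
print(minCountry, minNumber)
print(average)
```

The average has no single corresponding country, so only the max and min carry a name here.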

edited Oct 24, 2014 at 19:20

answered Oct 24, 2014 at 16:27


5 Comments

Rafael Barros Over a year ago

What if two countries have the same max HDI?

wwii Over a year ago

@RafaelBarros if you have a specification regarding two countries with the max HDI, add it to your question. Did you try it to see what happens? max([(1, 'a'),(1 , 'b'), (0, 'c')])

Rafael Barros Over a year ago

@wwii not my question, just thinking ahead

ArtOfWarfare Over a year ago

@RafaelBarros - Sorting a list of tuples will sort by the first field first, and then by the second field. So if multiple countries have the same maximum HDI, the one that comes last* alphabetically will be returned. The question didn't specify what should happen if multiple countries tied for the maximum HDI, so I went with this because it looked like the easiest way of doing what the asker said they wanted to have happen. *Because we're using max(), not min(), we get the last one, not the first one, when sorted alphabetically.

DSM Over a year ago

I think you forgot parentheses around the argument to append.
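
The tie-breaking behavior discussed in these comments can be checked directly. The sketch below reuses the small list from wwii's comment and shows two alternatives, a key function and a list of all tied entries; the variable names are illustrative assumptions.

```python
# The example list from the comments: two entries tie for the highest value.
pairs = [(1, "a"), (1, "b"), (0, "c")]

# Plain max() compares whole tuples, so a tie on the number is broken by
# the second field and the alphabetically last name wins.
by_tuple = max(pairs)

# Comparing on the number alone makes max() return the first tied entry
# it encounters in the list.
first_tied = max(pairs, key=lambda pair: pair[0])

# Collecting every tied entry avoids picking a winner at all.
top = max(value for value, _ in pairs)
all_tied = [name for value, name in pairs if value == top]

print(by_tuple)
print(first_tied)
print(all_tied)
```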

WGS · Accepted Answer · 2014-10-24 16:41:36Z

2

Using the pandas module, [4], [5], and [6] below should show the max, min, and average respectively. Note that the data below doesn't match yours save for country.

In [1]: import pandas as pd

In [2]: df = pd.read_csv("hdi.csv")

In [3]: df
Out[3]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  86.31
9         Sweden  80.54

In [4]: df.loc[df["HDI"].idxmax()]
Out[4]: 
Country    Liechtenstein
HDI                88.97
Name: 7, dtype: object

In [5]: df.loc[df["HDI"].idxmin()]
Out[5]: 
Country    Ireland
HDI          75.47
Name: 6, dtype: object

In [6]: df["HDI"].mean()
Out[6]: 84.484999999999985

Assuming both Liechtenstein and Germany have max values:

In [15]: df
Out[15]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  88.97
9         Sweden  80.54

In [16]: df[df["HDI"] == df["HDI"].max()]
Out[16]: 
         Country    HDI
7  Liechtenstein  88.97
8        Germany  88.97

The same logic can be applied for the minimum value.

answered Oct 24, 2014 at 16:41


2 Comments

ArtOfWarfare Over a year ago

Between the fact you used pandas and iPy, I can't even follow this. IE, your last line of input, df[df["HDI"] == df["HDI"].max()]... is that even valid Python? The equality check in the middle of a subscript looks weird to me... unless this is some kind of advanced slice notation I've never seen before? Or is it something that's only possible because of iPy and/or pandas?

WGS Over a year ago

It's a valid pandas notation. Basically reads as get dataframe view of df where column HDI of df is equal to the maximum value in column HDI of df. What can I say, pandas eats CSV processing for breakfast.
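
On whether that expression is valid Python: it is, because Python lets a class define what == and [] mean for its objects. The toy classes below sketch the idea in pure Python. Column and Table are hypothetical names made up for this illustration, and this is not how pandas is actually implemented.

```python
class Column:
    """Toy stand-in for a pandas column; not the real implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Return one boolean per element instead of a single True/False.
        return [value == other for value in self.values]

    def max(self):
        return max(self.values)


class Table:
    """Toy stand-in for a DataFrame holding named columns of equal length."""

    def __init__(self, **columns):
        self.columns = {name: Column(values) for name, values in columns.items()}

    def __getitem__(self, key):
        if isinstance(key, str):
            # table["HDI"] returns a single column.
            return self.columns[key]
        # Otherwise treat key as a list of booleans and keep matching rows.
        names = list(self.columns)
        rows = zip(*(self.columns[name].values for name in names))
        return [dict(zip(names, row)) for row, keep in zip(rows, key) if keep]


df = Table(Country=["Liechtenstein", "Germany", "Sweden"],
           HDI=[88.97, 88.97, 80.54])

# The comparison builds a boolean mask; the subscript then filters by it.
result = df[df["HDI"] == df["HDI"].max()]
print(result)
```

So the subscript is not special slice syntax: the inner comparison is evaluated first and produces a mask, which is then passed to the object's indexing method.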

OnStrike · Accepted Answer · 2014-10-24 22:46:00Z

The following approach is close enough to your implementation that I think it might be useful. However, if you start working with larger or more complicated CSV files, you should look into the standard library's csv module (csv.reader) or pandas (as previously mentioned). They are more robust and efficient when working with complex .csv data. You could also read the Excel file directly with the xlrd package.
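
As a rough sketch of what csv.reader buys you, the example below parses a small in-memory sample (Python 3). The sample rows and the column layout (name in column 1, value in column 3, mirroring the indexes used in the question) are assumptions, since the real file is not shown.

```python
import csv
import io

# Made-up sample; one country name contains a comma and is therefore quoted.
sample = io.StringIO(
    'Rank,Country,Group,HDI\n'
    '1,Norway,A,83.27\n'
    '2,"Korea, Republic of",A,80.77\n'
    '3,Ireland,B,75.47\n'
)

rows = []
reader = csv.reader(sample)
next(reader)  # skip the header row
for fields in reader:
    # csv.reader keeps the quoted "Korea, Republic of" as a single field,
    # so the value stays in column 3 instead of shifting by one.
    rows.append((float(fields[3]), fields[1]))

print(max(rows))
print(min(rows))
```

A plain line.split(",") would split the quoted name into two fields and push the value one column to the right, which is the problem raised in the comments on the question.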

In my opinion, the simplest way to keep country names together with their values is to combine your for loops. Instead of looping through your data twice (in two separate for loops) and creating two separate lists, use a single for loop and build a list of dictionaries holding the relevant data (i.e., "country name" and "hdi"). You could also use tuples (as previously mentioned), but I think dictionaries are more explicit.

countries = []
with open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "r") as myfile:
    for line in myfile:
        fields = line.split(",")
        try:
            value_of_interest = float(fields[3])
        except (ValueError, IndexError):
            continue  # skip the header row and any malformed lines
        countries.append(
            {"Country Name": fields[1],
             "Value of Interest": value_of_interest})

# Pull the numeric values out once and reuse them.
values = [country["Value of Interest"] for country in countries]
ave_value = sum(values) / len(values)
max_value = max(values)
min_value = min(values)

print("Country Average == {0}".format(ave_value))
for country in countries:
    if country["Value of Interest"] == max_value:
        print("Max == {country}:{value}".format(
            country=country["Country Name"], value=country["Value of Interest"]))
    if country["Value of Interest"] == min_value:
        print("Min == {country}:{value}".format(
            country=country["Country Name"], value=country["Value of Interest"]))

Note that this method returns multiple countries if they have equal min/max values.

If you are dead-set on creating separate lists (like your current implementation), you might consider zip() to connect your lists (by index), where

list(zip(countries, hdi_list)) == [(countries[0], hdi_list[0]), (countries[1], hdi_list[1]), ...]

For example:

for country in zip(countries, hdi_list):
    if country[1] == max_value:
        print("{0} {1}".format(country[0], country[1]))

with similar logic applied to the min and average. This method works but is less explicit and more difficult to maintain.
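
A minimal sketch of the zip() variant follows (Python 3), using made-up parallel lists. It assumes the two lists are aligned index by index. In the question's code that is not guaranteed, because hdi_list skips any row where float() fails while countries keeps every row, so the lists can end up with different lengths. The names country_names, max_countries, and min_countries are illustrative.

```python
# Made-up parallel lists; index i in one list must describe the same
# country as index i in the other, or zip() will pair the wrong items.
country_names = ["Norway", "Ireland", "Liechtenstein", "Germany"]
hdi_list = [83.27, 75.47, 88.97, 88.97]

max_value = max(hdi_list)
min_value = min(hdi_list)

# zip() pairs the lists by position; keep every name whose value matches.
max_countries = [name for name, hdi in zip(country_names, hdi_list)
                 if hdi == max_value]
min_countries = [name for name, hdi in zip(country_names, hdi_list)
                 if hdi == min_value]

print(max_countries)
print(min_countries)
```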
