1

I have a .csv file that when opened in Excel looks like this: enter image description here

My code:

myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "rb")

    countries = []
    for item in myfile:
        a = item.split(",")
        countries.append(a)

    hdi_list = []
    for acountry in countries:
        hdi = acountry[3]

        try:
            hdi_list.append(float(hdi))
        except:
            pass

    average = round(sum(hdi_list)/len(hdi_list), 2)
    maxNumber = round(max(hdi_list), 2)
    minNumber = round(min(hdi_list), 2)

This code works well, however, when I find the max,min, or avg I need to grab the corresponding name of the country and print that as well.

How can I change my code to grab the country name of the min,max, avg as well?

11
  • Maybe use a dictionary instead of a list? Have the countries as keys and the values as values? Then find the key for the highest value? Commented Oct 24, 2014 at 16:24
  • It a text file, why not open in text mode ...'rt') Commented Oct 24, 2014 at 16:25
  • 1
    Are you sure this code is working? There are commas in the country names in that cases there should be an additional field and values are one field off. Commented Oct 24, 2014 at 16:25
  • What would the "country name of the avg" look like? Commented Oct 24, 2014 at 16:25
  • @ArtOfWarfare a dictionary with HDI as keys will prevent duplicate HDIs to be accounted in average Commented Oct 24, 2014 at 16:26

3 Answers 3

3

Instead of putting the values straight in the list, use tuples instead, like this:

hdi_list.append((float(hdi), acountry[1]))

Then you can use this instead:

maxTuple = max(hdi_list)
maxNumber = round(maxTuple[0], 2)
maxCountry = maxTuple[1]
Sign up to request clarification or add additional context in comments.

5 Comments

What if two countries have the same max HDI?
@RafaelBarros if you have a specification regarding two countries with the max HDI, add it to your question. Did you try it to see what happens? max([(1, 'a'),(1 , 'b'), (0, 'c')])
@wwii not my question, just thinking ahead
@RafaelBarros - Sorting a list of tuples will sort by the first field first, and then by the second field. So if multiple countries have the same maximum HDI, the one that comes last* alphabetically will be returned. The question didn't specify what should happen if multiple countries tied for the maximum HDI, so I went with this because it looked like the easiest way of doing what the asker said they wanted to have happen. *Because we're using max(), not min(), we get the last one, not the first one, when sorted alphabetically.
I think you forgot parentheses around the argument to append.
2

Using the pandas module, [4], [5], and [6] below should show the max, min, and average respectively. Note that the data below doesn't match yours save for country.

In [1]: import pandas as pd

In [2]: df = pd.read_csv("hdi.csv")

In [3]: df
Out[3]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  86.31
9         Sweden  80.54

In [4]: df.ix[df["HDI"].idxmax()]
Out[4]: 
Country    Liechtenstein
HDI                88.97
Name: 7, dtype: object

In [5]: df.ix[df["HDI"].idxmin()]
Out[5]: 
Country    Ireland
HDI          75.47
Name: 6, dtype: object

In [6]: df["HDI"].mean()
Out[6]: 84.484999999999985

Assuming both Liechtenstein and Germany have max values:

In [15]: df
Out[15]: 
         Country    HDI
0         Norway  83.27
1      Australia  80.77
2    Netherlands  87.00
3  United States  87.43
4    New Zealand  87.43
5         Canada  87.66
6        Ireland  75.47
7  Liechtenstein  88.97
8        Germany  88.97
9         Sweden  80.54

In [16]: df[df["HDI"] == df["HDI"].max()]
Out[16]: 
         Country    HDI
7  Liechtenstein  88.97
8        Germany  88.97

The same logic can be applied for the minimum value.

2 Comments

Between the fact you used pandas and iPy, I can't even follow this. IE, your last line of input, df[df["HDI"] == df["HDI"].max()]... is that even valid Python? The equality check in the middle of a subscript looks weird to me... unless this is some kind of advanced slice notation I've never seen before? Or is it something that's only possible because of iPy and/or pandas?
It's a valid pandas notation. Basically reads as get dataframe view of df where column HDI of df is equal to the maximum value in column HDI of df. What can I say, pandas eats CSV processing for breakfast.
1

The following approach is close enough to your implementation that I think it might be useful. However, if you start working with larger or more complicated csv files, you should look into packages like "csv.reader" or "Pandas" (as previously mentioned). They are more robust and efficient in working with complex .csv data. You could also work through Excel with the "xlrd" package.

In my opinion, the simplest solution to reference country names with their respective values is to combine your 'for loops'. Instead of looping through your data twice (in two separate 'for loops') and creating two separate lists, use a single 'for loop' and create a dictionary with relevant data (ie. "country name", "hdi"). You could also create a tuple (as previously mentioned) but I think dictionaries are more explicit.

myfile = open("/Users/it/Desktop/Python/In-Class Programs/countries.csv", "rb")

countries = []
for line in myfile:
    country_name = line.split(",")[1]
    value_of_interest = float(line.split(",")[3])
    countries.append(
        {"Country Name": country_name, 
         "Value of Interest": value_of_interest})

ave_value = sum([country["Value of Interest"] for country in countries])/len(countries)
max_value = max([country["Value of Interest"] for country in countries])
min_value = min([country["Value of Interest"] for country in countries])

print "Country Average == ", ave_value
for country in countries:
    if country["Value of Interest"] == max_value:
        print "Max == {country}:{value}".format(country["Country Name"], country["Value of Interest"])
    if country["Value of Interest"] == min_value:
        print "Min == {country}:{value}".format(country["Country Name"], country["Value of Interest"])

Note that this method returns multiple countries if they have equal min/max values.

If you are dead-set on creating separate lists (like your current implementation), you might consider zip() to connect your lists (by index), where

zip(countries, hdi_list) = [(countries[1], hdi_list[1]), ...]

For example:

for country in zip(countries, hdi_list):
    if country[1] == max_value:
        print country[0], country[1]

with similar logic applied to the min and average. This method works but is less explicit and more difficult to maintain.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.