We have a large Excel file, and I'm trying to create a list that holds the maximum and minimum values of each column. There are 13 columns, which is why the while loop should stop once it hits 14. The problem is that once the counter is increased, the for loop does not seem to run again. More explicitly, the while loop only goes through the for loop once, yet it does seem to loop, in that it increases the counter by 1 each time and stops at 14. Note that the rows in the input file are strings of numbers, which is why I convert them to tuples and then check whether the value in the given position is greater than column_max or smaller than column_min. If so, I reassign either column_max or column_min. Once this is completed, column_max and column_min are appended to a list (l) and the counter (position) is increased to repeat the process for the next column. Any help will be appreciated.

input_file = open('names.csv','r')
l= []  
column_max = 0
column_min = 0
counter = 0
while counter<14:
    for row in input_file:
        row = row.strip()
        row = row.split(',')
        row = tuple(row)
        if (float(row[counter]))>column_max:
            column_max = float(row[counter])  
        elif (float(row[counter]))<column_min:
            column_min = float(row[counter])    
        else:
            column_min=column_min
            column_max = column_max
    l.append((column_max,column_min))
    counter = counter + 1
  • Use for i in range(14) instead of a while loop. Also, you might want to use csv.reader instead of splitting on ,: csv.reader will handle quoted strings containing commas. Commented Oct 22, 2012 at 3:31
  • If there were thirteen columns, you would use 13 as your bounding value, not 14. Commented Oct 22, 2012 at 3:41
  • Instead of column_max = 0 and column_min = 0, I'd use column_max = float('-inf') and column_min = float('inf'). That way you know the maxima and minima will be correct. Commented Oct 22, 2012 at 3:42
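
Taken together, the three comments above could be sketched as follows. This is only an illustration, not code from the question: the helper name column_extremes and the sample data are made up, and io.StringIO stands in for the opened names.csv.

```python
import csv
import io

def column_extremes(lines, n_columns):
    """Return a list of (max, min) pairs, one per column (hypothetical helper)."""
    # Start from -inf/+inf so the first real value always replaces them.
    maxima = [float('-inf')] * n_columns
    minima = [float('inf')] * n_columns
    # csv.reader splits each line on commas and copes with quoted fields.
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        for i in range(n_columns):
            value = float(row[i])
            maxima[i] = max(maxima[i], value)
            minima[i] = min(minima[i], value)
    return list(zip(maxima, minima))

# Made-up sample with three columns; a real file object works the same way.
sample = io.StringIO("1,-2,3\n4,5,-6\n")
result = column_extremes(sample, 3)
```

With column_max and column_min both starting at 0, as in the question, a column of all-positive values would report a minimum of 0, which is what the infinity defaults avoid.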

2 Answers

I think you want to switch the order of your for and while loops.

Note that there is a slightly better way to do this:

with open('yourfile') as infile:
    #read first row.  Set column min and max to values in first row
    data = [float(x) for x in infile.readline().split(',')]
    column_maxs = data[:]
    column_mins = data[:]
    #read subsequent rows getting new min/max
    for line in infile:
        data = [float(x) for x in line.split(',')]
        for i,d in enumerate(data):
            column_maxs[i] = max(d,column_maxs[i])
            column_mins[i] = min(d,column_mins[i])

If you have enough memory to hold the file in memory at once, this becomes even easier:

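with open('yourfile') as infile:
    # parse every row into a list of floats
    data = [[float(x) for x in line.split(',')] for line in infile]
    # transpose rows into columns; list() lets us iterate over the result twice
    data_transpose = list(zip(*data))
    col_mins = [min(x) for x in data_transpose]
    col_maxs = [max(x) for x in data_transpose]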

5 Comments

Merely swapping the two loops would not be enough; the row = lines would need to be taken out of the inner loop, also. But you are quite right: min and max are the correct way to do it.
This takes the maximum per row, but the code in the question is designed to calculate the maximum per column. Otherwise +1.
@ChrisMorgan -- Yes, the associated logic should also be moved. I assumed that was obvious enough ...
@BrianL -- Good catch. Updated accordingly.
"I think you want to switch the order of your for and while loops": no, he doesn't... that's a relic of the transposed interpretation of the question.

Once you have consumed the file, it has been consumed. Thus iterating over it again won't produce anything.

>>> for row in input_file:
...     print row
1,2,3,...
4,5,6,...
etc.
>>> for row in input_file:
...     print row
>>> # Nothing gets printed, the file is consumed

That is the reason why your code is not working.
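
The same effect can be reproduced without a file on disk. In this small sketch, io.StringIO and the two sample rows are stand-ins I have assumed for the opened names.csv; a real file object is exhausted in the same way.

```python
import io

# Stand-in for open('names.csv'); the contents are made up.
input_file = io.StringIO("1,2,3\n4,5,6\n")

# The first loop reads every row and leaves the position at end of file.
first_pass = [row.strip() for row in input_file]

# The second loop starts at end of file, so there is nothing left to yield.
second_pass = [row.strip() for row in input_file]
```

This matches what the question describes: the counter keeps increasing, but the inner for loop only has rows to process on the first pass.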

You then have three main approaches:

  1. Read the file each time (inefficient in I/O operations);
  2. Load it into a list (inefficient for large files, as it stores the whole file in memory);
  3. Rework the logic to operate line by line (quite feasible and efficient, though not as brief in code as loading it all into a two-dimensional structure and transposing it and using min and max may be).
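
For comparison, the first two approaches might look roughly like this. The function names are hypothetical, and io.StringIO again stands in for the real file; treat it as a sketch of the trade-offs rather than a recommended implementation.

```python
import io

def extremes_by_rereading(input_file, n_columns):
    """Approach 1: one full pass over the file per column."""
    result = []
    for column in range(n_columns):
        input_file.seek(0)  # rewind so the loop below sees every row again
        values = [float(row.split(',')[column]) for row in input_file]
        result.append((max(values), min(values)))
    return result

def extremes_from_list(input_file, n_columns):
    """Approach 2: read all rows into memory once, then index by column."""
    rows = [[float(x) for x in row.split(',')] for row in input_file]
    return [(max(r[c] for r in rows), min(r[c] for r in rows))
            for c in range(n_columns)]

# Made-up sample data with three columns.
by_rereading = extremes_by_rereading(io.StringIO("1,-2,3\n4,5,-6\n"), 3)
from_list = extremes_from_list(io.StringIO("1,-2,3\n4,5,-6\n"), 3)
```

Both return the same list of (max, min) pairs; the first pays for it with repeated reads, the second with memory.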

Here is my technique for the third approach:

maxima = [float('-inf')] * 13
minima = [float('inf')] * 13
with open('names.csv') as input_file:
    for row in input_file:
        for col, value in enumerate(row.split(',')):
            value = float(value)
            maxima[col] = max(maxima[col], value)
            minima[col] = min(minima[col], value)

# This gets the value you called ``l``
combined_max_and_min = list(zip(maxima, minima))

2 Comments

My only beef with this is that you essentially hard-code the number of columns whereas my version of this implementation doesn't hard-code the number of columns. (also, no need to row.strip() as float doesn't care about whitespace).
@mgilson: it could be handled without that hardcoding, but it's uglier. For reference, that would be something like float('-inf') if col >= len(maxima) else maxima[col]. Either that or duplicating the row-reading code. All in all, I think hard-coding the value will typically (though not always) be more suitable. As for the row.strip() business, I thought about it but left it in through laziness. You have now prompted me to remove it.
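
One way the non-hard-coded variant discussed in these comments might look is sketched below. It is not either answerer's code: the helper name column_extremes is made up, and it grows the lists as new columns appear instead of using the conditional expression mentioned above.

```python
import io

def column_extremes(lines):
    """Track per-column (max, min) without knowing the column count up front."""
    maxima, minima = [], []
    for row in lines:
        if not row.strip():
            continue  # skip blank lines
        for col, text in enumerate(row.split(',')):
            value = float(text)  # float() tolerates surrounding whitespace
            if col >= len(maxima):
                # First time this column is seen: its value is both max and min.
                maxima.append(value)
                minima.append(value)
            else:
                maxima[col] = max(maxima[col], value)
                minima[col] = min(minima[col], value)
    return list(zip(maxima, minima))

# Made-up sample; io.StringIO stands in for the opened file.
pairs = column_extremes(io.StringIO("1,-2,3\n4,5,-6\n"))
```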
