1

Hi, here is my problem. I have a program that calculates the averages of data in columns. For example:

Bob
1
2
3

the output is

Bob
2

Some of the data contains NAs. So for Joe:

Joe
NA
NA
NA

I want this output to be NA,

so I wrote an if/else loop.

The problem is that it doesn't execute the second part of the loop and just prints out one NA. Any suggestions?

Here is my program:

with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)

    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns

    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue

        values = line.split(" ")
        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue 

    with open('c://chipdone.txt', 'w') as ouf:
        for i in xrange(len(columns)):
           if numRowsPerColumn[i] ==0 :
               print 'NA' 
           else:
               print>>ouf, columns[i], sums[i] / numRowsPerColumn[i] # this is the average calculator

The file looks like so:

Joe Bob Sam
1 2 NA
2 4 NA
3 NA NA
1 1  NA

and the final output should be the names and the averages:

Joe Bob Sam 
1.5 1.5 NA

Ok I tried Roger's suggestion and now I have this error:

Traceback (most recent call last):
  File "C:/avy14.py", line 5, in <module>
    for line in f:
ValueError: I/O operation on closed file

Here is the new code:

with open('C://achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    sums = [0] * len(columns)
    rows = 0
for line in f:
    line = line.strip()
    if not line:
        continue

    rows += 1
    for col, v in enumerate(line.split()):
        if sums[col] is not None:
            if v == "NA":
                sums[col] = None
            else:
                sums[col] += int(v)

with open("c:/chipdone.txt", "w") as out:
    for name, sum in zip(columns, sums):
        print >>out, name,
        if sum is None:
            print >>out, "NA"
        else:
            print >>out, sum / rows

3
  • Use "C:\\file" or "c:/file", with the latter usually preferred; Using "//" will be interpreted incorrectly in many cases (just not in this exact one). Commented Sep 24, 2010 at 14:59
  • Could you paste an example of what the source file looks like, and a sample of what the complete output should look like? Commented Sep 24, 2010 at 15:00
  • ...and also, could you include the code of the "second part of the loop"? The code provided only contains two alternative instructions (if/else)... Commented Sep 24, 2010 at 15:03

4 Answers 4

1
with open("c:/achip.txt", "rU") as f:
  columns = f.readline().strip().split()
  sums = [0.0] * len(columns)
  row_counts = [0] * len(columns)

  for line in f:
    line = line.strip()
    if not line:
      continue

    for col, v in enumerate(line.split()):
      if v != "NA":
        sums[col] += int(v)
        row_counts[col] += 1

with open("c:/chipdone.txt", "w") as out:
  for name, sum, rows in zip(columns, sums, row_counts):
    print >>out, name,
    if rows == 0:
      print >>out, "NA"
    else:
      print >>out, sum / rows

I'd also use the no-parameter version of split when getting the column names (it allows you to have multiple space separators).
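
To illustrate the difference, here is a small sketch (written for Python 3, unlike the Python 2 code above) using the last data row of the sample file, the one with two spaces before NA. The variable names are only for illustration.

```python
# Last data row of the sample file; note the two spaces before "NA".
row = "1 1  NA"

# split(" ") cuts at every single space, so the double space yields an empty string.
with_arg = row.split(" ")

# split() with no argument treats any run of whitespace as one separator.
without_arg = row.split()

print(with_arg)     # ['1', '1', '', 'NA']
print(without_arg)  # ['1', '1', 'NA']
```

With the empty string in the list, every value after it lands one column too far to the right, which is why the argument-free form is safer for this file.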

Regarding your edit to include input/output sample, I kept your original format and my output would be:

Joe 1.75
Bob 2.33333333333
Sam NA

This format is 3 rows of (ColumnName, Avg) columns, but you can change the output if you want, of course. :)
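
If you do want the names on one row and the averages on the next, something like the following might work. This is only a sketch in Python 3, not the code above: `column_averages` and `format_two_rows` are hypothetical helper names, and `io.StringIO` stands in for the real input file so the example runs on its own.

```python
import io

def column_averages(lines):
    """Return (names, averages); an average is None when a column has no numbers."""
    rows = iter(lines)
    names = next(rows).split()
    sums = [0.0] * len(names)
    counts = [0] * len(names)
    for line in rows:
        # split() on a blank line gives an empty list, so blank lines are skipped.
        for col, value in enumerate(line.split()):
            if value != "NA":
                sums[col] += float(value)
                counts[col] += 1
    averages = [s / c if c else None for s, c in zip(sums, counts)]
    return names, averages

def format_two_rows(names, averages):
    """Put the names on the first row and the averages (or NA) on the second."""
    cells = ["NA" if a is None else str(a) for a in averages]
    return " ".join(names) + "\n" + " ".join(cells) + "\n"

# The sample file from the question, held in memory.
sample = io.StringIO("Joe Bob Sam\n1 2 NA\n2 4 NA\n3 NA NA\n1 1  NA\n")
names, averages = column_averages(sample)
print(format_two_rows(names, averages), end="")
```

For the sample data this gives 1.75 for Joe, about 2.33 for Bob, and NA for Sam, the same values as the three-row format above.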


4 Comments

@Robert: The code you included in your edit is misindented with the for loop outside of the with, closing the file before the for loop runs. Updated my code to show what I mean.
@Robert: I also see that the code I wrote (before you included the example) is wrong, as I misinterpreted you. Fixed.
Still not working, Roger. Now when I have a name like Joe 2 NA 1, the final value should be 1.5, but it outputs NA.
@Robert: Using 0.0 instead of 0 for sums (so floating point is used) and I get Joe 1.75, Bob 2.333.., Sam NA for the input sample you gave in the question. These values match what I figure out by hand.
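
The closed-file error mentioned in the first comment can be reproduced with a short sketch. This is Python 3, and `io.StringIO` is used in place of the real file purely so the example is self-contained; the behaviour with a file opened by `open` should be the same.

```python
import io

# io.StringIO stands in for the real file so the example is self-contained.
with io.StringIO("Joe Bob Sam\n1 2 NA\n") as f:
    header = f.readline().split()
# Leaving the with block closes f.

try:
    for line in f:  # dedented by mistake, so it runs after f is closed
        pass
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

Indenting the `for` loop so that it sits inside the `with` block keeps the file open while the rows are read.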
0

Using numpy:

import numpy as np

with open('achip.txt') as f:
    names=f.readline().split()
    arr=np.genfromtxt(f)

print(arr)
# [[  1.   2.  NaN]
#  [  2.   4.  NaN]
#  [  3.  NaN  NaN]
#  [  1.   1.  NaN]]

print(names)
# ['Joe', 'Bob', 'Sam']

print(np.ma.mean(np.ma.masked_invalid(arr),axis=0))
# [1.75 2.33333333333 --]


0

Using your original code, I would add one loop and edit the print statement

with open(r'C:\achip.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)

    numRowsPerColumn = [0] * len(columns) # this figures out the number of columns

    for line in f:
        # Skip empty lines since I was getting that error before
        if not line.strip():
            continue

        values = line.split(" ")

        ### This removes any '' elements caused by having two spaces like
        ### in the last line of your example chip file above.
        ### (A new list is built because popping from a list while
        ### iterating over it can skip consecutive '' elements.)
        values = [v for v in values if v != '']
        ### (End of Addition)

        for i in xrange(len(values)):
            try: # this is the whole strings to math numbers things
                sums[i] += float(values[i])
                numRowsPerColumn[i] += 1
            except ValueError:
                continue 

    with open('c://chipdone.txt', 'w') as ouf:
        for i in xrange(len(columns)):
           if numRowsPerColumn[i] ==0 :
               print>>ouf, columns[i], 'NA' #Just add the extra parts
           else:
               print>>ouf, columns[i], sums[i] / numRowsPerColumn[i]

This solution also gives the same result in Roger's format, not your intended format.


0

The solution below is cleaner and has fewer lines of code:

import pandas as pd

# read the file into a DataFrame using read_csv
df = pd.read_csv('C://achip.txt', sep=r"\s+")

# compute the average of each column
avg = df.mean()

# save computed averages to the output file, writing missing values as NA
avg.to_csv("c:/chipdone.txt", na_rep="NA")

The key to the simplicity of this solution is the way the input text file is read into a DataFrame. Pandas read_csv allows you to use regular expressions for specifying the sep/delimiter argument. In this case, we used the "\s+" regex pattern to take care of having one or more spaces between columns.

Once the data is in a DataFrame, computing the average and saving to a file can all be done with straightforward pandas functions.
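
As a rough illustration of what that pattern matches, here is a sketch using Python's standard `re` module (Python 3). It does not show how pandas works internally; it only applies the same regular expression to one row of the sample file.

```python
import re

# Last data row of the sample file; note the two spaces before "NA".
row = "1 1  NA"

# r"\s+" matches one or more whitespace characters; the raw-string prefix
# keeps the backslash from being treated as a string escape.
fields = re.split(r"\s+", row.strip())

print(fields)  # ['1', '1', 'NA']
```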
