
I have a text file that looks like this; the values are tab separated:

diamond orange pear loc1  .  +    0.0  0.0  0.0  0.0  1.0 1.2  3.4 
diamond orange pear loc2  .  +    1.0  0.0  0.0  0.0  1.0 1.2  2.3
diamond orange pear loc3  .  +    2.0  0.0  3.0  0.0  0.0 0.0  1.4  
# ......

For each line in the file I want to make a ratio of the sum of the first 3 values divided by the sum of the last 4 values. The output would look like:

diamond orange pear loc1  .  +    0 
diamond orange pear loc2  .  +    0.22
diamond orange pear loc3  .  +    4.28 
 ......

I would like to do this in python.

with open('/path/to/file/') as inFile:
    inFile.next()
    for line in inFile:
        data = cols[6:]
        data = map(float,data)

        sum_3 = [sum[for x in x data[0:3]]
        sum_last = [sum[for x in x data[4:7]]
        average = sum_3/sum_last 

This doesn't work. Could I get some advice on how to fix it?

  • Is this Python 2 or 3? Do you have a minimal version requirement? Commented Apr 26, 2018 at 14:59
  • Why are you trying to call the sum function using square brackets? Commented Apr 26, 2018 at 15:04
  • The sum syntax is all wrong. [sum[for x in x data[0:3]] should be sum(x for x in data[0:3]), or rather sum(float(x) for x in data[0:3]) Commented Apr 26, 2018 at 15:05
  • @tobias_k: given that data is the result of a map(float, ...) call, I don't think the additional float() conversion is necessary there. :-P Commented Apr 26, 2018 at 15:11
  • @MartijnPieters Right, I missed that line. Commented Apr 26, 2018 at 15:12

1 Answer


You don't show where cols comes from, but it appears you never actually split each line. In that case you are left with a single string, and you were trying to work with the characters of that string, minus the first 6. Mapping individual characters to float values is not going to give you the data you need.

Next, sum() is a function, but you are using indexing syntax; sum[...] will throw an exception. You don't need a list comprehension to get values out of a slice either; sum(data[:3]) would do, provided the slice produces a sequence of floats.
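As a minimal sketch of those two points (split first, then call sum() on slices), assuming each line has 6 descriptor columns followed by 7 numeric columns separated by clean tabs. ratio_from_line is a hypothetical helper name, not something from the question. Note that the last four values are data[-4:] (or data[3:7]); the question's data[4:7] would pick up only three of them.

```python
def ratio_from_line(line):
    """Return sum(first 3 values) / sum(last 4 values) for one tab-separated line."""
    # Split the line into columns before slicing; slicing the raw
    # string would only yield individual characters.
    cols = line.strip().split("\t")
    # Assumption: 6 descriptor columns, then 7 numeric columns.
    data = [float(x) for x in cols[6:]]
    # sum() is called with parentheses and accepts the slice directly.
    return sum(data[:3]) / sum(data[-4:])


# Second sample row from the question, written with explicit tabs.
sample = "diamond\torange\tpear\tloc2\t.\t+\t1.0\t0.0\t0.0\t0.0\t1.0\t1.2\t2.3"
print(round(ratio_from_line(sample), 2))  # 0.22
```

This sketch does not guard against a zero denominator; a row whose last four values are all zero would raise ZeroDivisionError.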

You have a tab-delimited file, so it is probably easiest to use the csv module to do the splitting:

import csv

with open('/path/to/file/') as infile:
    reader = csv.reader(infile, delimiter='\t')
    next(reader)  # skip first row

    for row in reader:
        first3, last = row[-7:-4], row[-4:]
        try:
            average = sum(map(float, first3)) / sum(map(float, last))
        except ZeroDivisionError:
            # last four values are all zero; just set the average to zero.
            average = 0

I've made allowances for the last 4 values all being zero; at that point you'd be dividing by zero and you'd want to handle the exception that is thrown for that case.


3 Comments

Thank you - I am struggling to see what will happen to the first 6 descriptor columns for each row in your solution?
@AlexTrevylan Put a print(row) call inside the for loop to see what a row looks like. The csv reader splits your row data up into columns. It's up to you to grab the columns you want from each row and to append the calculated average. And then you can use a csv writer to write the updated row data to a new file.
@AlexTrevylan: they are still there in row, untouched. You could use those directly with row[:6]. I just didn't want to make assumptions about the number of columns in each row.
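What these comments describe (keep the descriptor columns, append the calculated ratio, and write the result with a csv writer) could look roughly like the sketch below. It assumes Python 3.6+, 6 descriptor columns per row, and a two-decimal output format; add_ratio_column is a hypothetical name, and io.StringIO stands in for real files so the example runs on its own. It does not skip a header row; add next(reader) before the loop if your file has one.

```python
import csv
import io


def add_ratio_column(infile, outfile):
    """Copy the 6 descriptor columns and append sum(first 3) / sum(last 4)."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        descriptors = row[:6]  # assumption: exactly 6 descriptor columns
        first3, last4 = row[-7:-4], row[-4:]
        try:
            ratio = sum(map(float, first3)) / sum(map(float, last4))
        except ZeroDivisionError:
            # Last four values are all zero; report the ratio as zero.
            ratio = 0.0
        writer.writerow(descriptors + [f"{ratio:.2f}"])


# First two sample rows from the question, written with explicit tabs.
sample = (
    "diamond\torange\tpear\tloc1\t.\t+\t0.0\t0.0\t0.0\t0.0\t1.0\t1.2\t3.4\n"
    "diamond\torange\tpear\tloc2\t.\t+\t1.0\t0.0\t0.0\t0.0\t1.0\t1.2\t2.3\n"
)
out = io.StringIO()
add_ratio_column(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, you would pass two open file objects instead of the StringIO buffers; the csv documentation recommends opening them with newline='' in Python 3.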
