1

My Data looks like this:

549  648.077  0.01
552  648.141  0.45
554  647.167  0.1
572  648.141  0.3
530  630.213  0.69
560  670.312  0.70

there are a few thousand lines in the file

the 1strow values range from 0-1100

the 2ndrow values range from 600-700

the 3rdrow values range from 0-1 I need to plot the data and therefore need to sort and modify the data:

I need to split the 3rd row values (normal range 0.0-1.0) into segments 0.0-0.20, 0.21-0.40, 0,41-0,60, 0.61-0.80,0.81-1.00

Next I need to split the segments from the 1strow (normal range (0-1100) into segments like 0-10, 11-20,21-30and so on up to 1100. What I want to do is find all 2ndrow values within a region 0.0-0.20 and 0-10 , 0.0-0.20 and 11-20,0.0-0.20 and 21-30.

When found I want to add them all together and divide the value by the number of appearances to get the mean value: so I want for a region between 0.0-0.20 and 0-10 one value. I'm fairly new to python and I think that this is some kind of approach:

import os                                                      
import csv                                                     
dataList = []                                                  
with open("table.csv") as csv_file:                         
data_reader = csv.reader(csv_file, dialect='excel-tab')        
for rows in data_reader:                                       
    if float(rows[2]) <= 0.20:                                 
        if float(rows[0]) <= 10:                               
           print(rows)                                       
        if 10 <float(rows[0])<=20:                            
            print (rows)

That should work (without the print of course) to get the values, repeated than for if 20<float(rows[2])<= 30: ..... That should bring me the values I want but is there an easy way to set a range going from 0-1100 in 10 units step?

P.S.: I am aware that I gave lots of Info for a relative short question and that's because I don't really know if python is the best way to do this and if my approach is reasonable? Maybe I should go with panda but that I just installed. So in case anyone knows an easier (maybe not coding related) way to solve a problem like this I'd really appreciate it.

1 Answer 1

1

You need a sorting key to put your data in the correct order; then you can cycle through it to group it and get your sums. itertools.groupby is excellent for this type of thing:

import os
import csv
from itertools import groupby

data_list = []
with open("table.csv") as csv_file:
    data_reader = csv.reader(csv_file, dialect='excel-tab')
    for row in data_reader:
        # read and convert data
        data_list.append((int(row[0]), float(row[1]), float(row[2])))

def classify(row):
    # returns classification of row based on third value, then first value
    # i.e. [549, 648.077, 0.01] -> (0.0, 540)
    #      [572, 648.141, 0.3]  -> (0.3, 570)
    first, second, third = row
    for level3 in (0.2, 0.4, 0.6, 0.8):
        if third <= level3:
            break
    else:
        level3 = 1.0
    for level1 in range(10, 1100, 10):
        if first <= level1:
            break
    else:
        level1 = 1100
    return level3, level1

data_list.sort(key=classify)

plot_values = []

for key, group in groupby(data_list, classify):
    group = list(group)
    number_of_elements = len(group)
    mean = sum([r[2] for r in group]) / number_of_elements
    plot_values.append((key, mean))

print('\n'.join(['%-10s -> %s' % (k, m) for k, m in plot_values]))

When run against the sample data you provided:

(0.2, 550) -> 0.01
(0.2, 560) -> 0.1
(0.4, 580) -> 0.3
(0.6, 560) -> 0.45
(0.8, 530) -> 0.69
(0.8, 560) -> 0.7
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very merry much that module is great: I only changed the mean = sum([r[2] for r in group]) into mean = sum([r[1] for r in group]) since I want the mean value from the second row data for each segment but it seems I did not mention that cleary. No it looks like it's working exactly the way it's supposed to work

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.