1

I have a csv file like the following (test.csv) with two columns.

338,800
338,550
339,670
340,600 
327,500
301,430
299,350
284,339
284,338
283,335
283,330
283,310
282,310
282,300
282,300
283,290

From column 1, I wanted to read current row and compare it with the value of the previous row. If it is greater OR equal, continue comparing and if the value of the current cell is smaller than the previous row - then i wanted to the value of the second column in the same row.

Next I wanted to divided the larger value we got in column 1 by the value in the same cell of column two. Let me make it clear.

For example in the table given above: the smaller value we will get depending on my requirement from Column 1 is 327 (because 327 is smaller than the previous value 340) - and then we take 500 (which is the corresponding cell value on column 2). Finally we divide 340 by 500 and get the value 0.68. My python script should exit right after we print the value to the console.

Currently, I am using the following script in bash, and it works fine

awk -F, '$1<p && $2!=0{ 
val=$2/p    
if(val>=0.8 && val<=0.9)
    {
        print "A"
    }
else if(val==0.7)
    {
        print "B"
    }
else if(val>=0.5 && val <0.7)
    {
        print "C" 

    }
else if(val==0.5)
    {
        print "E"
    }
else
    {
        print "D" 
    }
exit
}
{ 
    p=$1 
}' test.csv

but I wanted to do it with python and i would appreciate for any help. Here is my approach

import csv

f = open("test.csv", "r+")
ff = csv.reader(f)

previous_line = ff.next()
while(True):
    try:
        current_line = ff.next()
        if previous_line <= current_line:
            print "smaller value"
    except StopIteration:
        break
2
  • 1
    You should specify which exception you're expecting to get so that if you get a different one the program will still return useful information: for example except StopIteration: Commented May 8, 2017 at 12:42
  • oh, forget it. OK, done. Commented May 8, 2017 at 12:47

3 Answers 3

3

I recommend you use csv.Reader's built-in iteration rather than calling .next() directly. And your code should not test normal floats for equality. In any language, that's not just a Python thing. Also, a calculated value of 0.79 will result in D which may not be what you intend.

from __future__ import division
import csv

def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    #Note: don't test val == 0.7: you should never test floats for equality
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"

with open(r"E:\...\test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)

    previous_value = 0
    for col1, col2 in ff:
        if not col1.isdigit():
            continue
        value = int(col1)
        if value >= previous_value:
            previous_value = value
            continue
        else:
            result = previous_value / int(col2)
            print category(result)
            break

Edit in response to a change to the OP's request

from __future__ import division
import csv

def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    #Note: don't test val == 0.7: you should never test floats for equality
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"

with open(r"E:\...\test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)

    results = []
    previous_value = 0
    for col1, col2 in ff:
        if not col1.isdigit():
            continue
        value = int(col1)
        if value >= previous_value:
            previous_value = value
            continue
        else:
            result = previous_value / int(col2)
            results.append(result)
            print category(result)
            previous_value = value
    print (results)
    print (sum(results))
    print (category(sum(results) / len(results)))

I've had to guess at the logic you want to use for resetting the previous value because your original had the loop break at the first result. And I've no idea how you want end-of-file handled. This revision produces the following output:

C
D
A
A
A
D
[0.68, 0.7604651162790698, 0.86, 0.8820058997050148, 0.8477611940298507, 0.9129032258064517]
4.94313543582
A

As you can see, there are definitely more than two values in results.

Sign up to request clarification or add additional context in comments.

9 Comments

How about if we want to sum or do average of result? I tried it like this: sum_value = sum(result) but i found the following erro: TypeError: 'float' object is not iterable
sum_value = sum(result) won't work because result is not a list of floats, it is a single value. It gets freshly assigned in the for loop on each iteration. To do what you want, store each calculated value of result in a list called results then call sum(results) before break.
Store each calculated value in results. What results=[result] does is store the last calculated value. You need results.append(result), and somewhere earlier in your program, before the loop, you need to set up results as an empty list: results = [], otherwise you will get a NameError.
category() is the function that turns your number into a letter. If you just want the value, then print ("Results:%.2f" %(sum(results) / len(results)))
@Brown Sorry, I don't consider myself sufficiently experienced to answer numpy questions.
|
0
col_1 = []
col_2 = []
with open("test.csv", "r+") as f:
    for elem in f.readlines():
        col_1.append(float(elem.split(",")[0]))
        col_2.append(float(elem.split(",")[1]))

condition = True
i=0
while condition:
    if (col_1[i+1]-col_1[i]<0):
        print col_1[i]/col_2[i+1]
        condition = False
    i+=1

Comments

0

If it is a .csv file working with pandas could give you more control.

import numpy as np
import pandas as pd

pd.read_csv("filename.csv") # to read a .csv file into a dataframe

However, for this example I am not using pd.read_csv() function. Instead, I am creating a dataframe from a 2D numpy array like so,

dataframe = pd.DataFrame(np.array([[338,800],
    [338,550],
    [339,670],
    [340,600], 
    [327,500],
    [301,430],
    [299,350],
    [284,339],
    [284,338],
    [283,335],
    [283,330],
    [283,310],
    [282,310],
    [282,300],
    [282,300],
    [283,290]]))

Now that I have a dataframe object, I can manipulate it just like other object types in python. I can call pandas specific functions to work on the dataframe for the results I want.

def compare_and_divide(df):
    for i in range(len(df)-1):
        # df[0] for all values in col 0 .iloc[i] for value in row
        if df[0].iloc[i+1] >= df[0].iloc[i]:                                 
            continue     
        else:
            df[0].iloc[i+1] = df[0].iloc[i]

    return df[0].div(df[1]) # .div() function to divide values in col 0 by col 1

compare_and_divide(dataframe)   

0     0.422500
1     0.614545
2     0.505970
3     0.566667
4     0.680000 # 340/500 value mentioned in the question
5     0.790698
6     0.971429
7     1.002950
8     1.005917
9     1.014925
10    1.030303
11    1.096774
12    1.096774
13    1.133333
14    1.133333
15    1.172414
dtype: float64

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.