Compare column name, then compare row data in Python

Question

I have a CSV file that looks like this:

"test_name", "Mean", "Median", "Std_Dev"
"Data Name 1", 50, 75, 10
"Data Name 2", 52, 80, 11
"Data Name 1", 53, 79, 9 
"Data Name 2", 55, 78, 8
"Data Name 3", 54, 77, 7
"Data Name 3", 53, 71, 7
"Data Name 1", 51, 72, 8

Right now, I have a program that finds rows whose test names are equal, because when rows share the same Data Name, I want to compare their data.

import csv

csvfile = 'some.csv'

data = {}

with open('some.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k,v) in row.items():
                try:
                        data[k].append(v)
                except KeyError:
                        data[k] = [v]

testNames = data['test_name']
mean = data['Mean']
median = data['Median']
std = data['Std_Dev']

for val in testNames:
        for val2 in testNames:
                if val == val2:
                    index = testNames.index(val)
                    index2 = testNames.index(val2)

                    medianTemp = median[index]
                    medianTemp2 = median[index2]

                    if medianTemp2 > medianTemp:
                            sub = medianTemp2 - medianTemp
                            if sub > 100:
                                    print "Uh oh! @ ", val, "and ", val2

Maybe I'm going about this the wrong way. I just want to compare the medians of the rows that have the same test name. The test_name comparison itself works; what I am struggling with is getting at the row data to compare once the names match.

******* EDIT ********* I am trying to use index() to find the element location now.

Now the issue is that index and index2 are always the same value. Instead of the first Data Name 1 giving an index of 0 and the next Data Name 1 giving an index2 of 2, they both give 0.

Any suggestions are greatly appreciated.

Thanks :)

Nelson Yeung · Accepted Answer · 2017-08-14 17:35:31Z

1

You can loop over the indices of testNames instead, then use those indices to access the row data:

for i in range(len(testNames)):
    for j in range(len(testNames)):
        # i != j skips comparing a row with itself
        if i != j and testNames[i] == testNames[j]:
            # access row data using the two indices
            median_i = data['Median'][i]
            median_j = data['Median'][j]
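
As a fuller illustration of the index-based loop, here is a minimal sketch using only the standard library. Several things in it are assumptions rather than part of the original code: the sample data is inlined as a string so the example runs on its own, the helper name find_median_gaps is hypothetical, skipinitialspace=True is used so the quoted headers after each comma parse cleanly, the medians are converted to int, and the threshold of 5 is chosen only so that the sample data produces output. It also starts j at i + 1 so each pair is reported once, and uses abs() instead of checking one ordering.

```python
import csv
import io

# Sample data copied from the question; in practice this would come from open('some.csv').
SAMPLE = '''"test_name", "Mean", "Median", "Std_Dev"
"Data Name 1", 50, 75, 10
"Data Name 2", 52, 80, 11
"Data Name 1", 53, 79, 9
"Data Name 2", 55, 78, 8
"Data Name 3", 54, 77, 7
"Data Name 3", 53, 71, 7
"Data Name 1", 51, 72, 8
'''


def find_median_gaps(csv_text, threshold):
    """Return (name, i, j, gap) for same-named row pairs whose medians differ by more than threshold."""
    # skipinitialspace=True drops the blank after each comma, so the quoted
    # headers parse as 'Median' rather than ' "Median"'.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    names = [row['test_name'] for row in rows]
    medians = [int(row['Median']) for row in rows]  # csv yields strings

    gaps = []
    for i in range(len(names)):
        # j starts after i: each pair is seen once and a row never meets itself
        for j in range(i + 1, len(names)):
            if names[i] == names[j]:
                gap = abs(medians[i] - medians[j])
                if gap > threshold:
                    gaps.append((names[i], i, j, gap))
    return gaps


# list.index() always reports the first match, which is why index and index2 were equal.
names = ['Data Name 1', 'Data Name 2', 'Data Name 1']
first_only = names.index('Data Name 1')
all_matches = [i for i, n in enumerate(names) if n == 'Data Name 1']

for name, i, j, gap in find_median_gaps(SAMPLE, threshold=5):
    print('Uh oh! @', name, 'rows', i, 'and', j, 'differ by', gap)
```

With the threshold of 100 from the question, this sample data would report nothing, since no two medians are that far apart.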

edited Aug 14, 2017 at 17:35

answered Aug 14, 2017 at 17:28


4 Comments

hiquetj Over a year ago

I edited my question above; I'm trying to use indexing now instead, but I'm running into the same index being used for both index and index2.

Nelson Yeung Over a year ago

You can just check whether they are the same or not. I've updated my answer with i != j and ....

Nelson Yeung Over a year ago

@hiquetj The .index() method only returns the first occurrence. I'd recommend trying out my solution.

hiquetj Over a year ago

Ah ok, I'll give it a try. Working on it now

fuglede · Accepted Answer · 2017-08-14 17:24:56Z

0

While this may not be exactly what you are aiming to do, you may want to be aware that the pandas library is tailor-made for tasks like this; here, you would group your rows by test_name and perform whatever aggregation you might be interested in. If, for instance, you are interested in the minimum and maximum median in each group, you would do the following:

In [1]: import pandas as pd

In [2]: df = pd.read_csv('some.csv')

In [3]: df
Out[3]:
     test_name   "Mean"   "Median"   "Std_Dev"
0  Data Name 1       50         75          10
1  Data Name 2       52         80          11
2  Data Name 1       53         79           9
3  Data Name 2       55         78           8
4  Data Name 3       54         77           7
5  Data Name 3       53         71           7
6  Data Name 1       51         72           8

In [4]: df.groupby('test_name')[' "Median"'].agg([min, max])
Out[4]:
             min  max
test_name
Data Name 1   72   79
Data Name 2   78   80
Data Name 3   71   77
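
Where pandas is not available, the same per-group minimum and maximum can be approximated with the standard library alone. The following is a sketch under a few assumptions that are not in the original answer: the sample data is inlined as a string so the example is self-contained, the helper name median_range_by_name is hypothetical, and skipinitialspace=True is used so the column parses as 'Median' rather than ' "Median"'.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question; in practice this would come from open('some.csv').
SAMPLE = '''"test_name", "Mean", "Median", "Std_Dev"
"Data Name 1", 50, 75, 10
"Data Name 2", 52, 80, 11
"Data Name 1", 53, 79, 9
"Data Name 2", 55, 78, 8
"Data Name 3", 54, 77, 7
"Data Name 3", 53, 71, 7
"Data Name 1", 51, 72, 8
'''


def median_range_by_name(csv_text):
    """Map each test_name to a (min, max) tuple of its Median values."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    groups = defaultdict(list)
    for row in reader:
        # csv yields strings, so convert before comparing numerically
        groups[row['test_name']].append(int(row['Median']))
    return {name: (min(vals), max(vals)) for name, vals in groups.items()}


ranges = median_range_by_name(SAMPLE)
for name in sorted(ranges):
    low, high = ranges[name]
    print(name, low, high)
```

Grouping first also avoids the nested loop over every pair of rows: each row is visited once, and the spread within a group is simply max minus min.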

answered Aug 14, 2017 at 17:24


1 Comment

hiquetj Over a year ago

Yeah, I wish I could use pandas, but unfortunately the virtual machine I am running this on doesn't have it installed (nor can I install it :( )
