How to get averages of rows in a 2D numpy array or text file

Question

I have a set of text files that I read using np.genfromtxt. Usually they are in a standard format: one text file per plate measured, with each plate having 300 holes. The columns are:

headers =['ID','Diameter','Radius','Xpos','Ypos']
#the data looks like
[1,105,53.002,784.023,91.76],
[2,104,51.552,787.023,91.71],
...
[300,104,51.552,787.023,91.71]

Now I have a set of text files that, instead of one measurement per hole on a plate, measure each hole twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[2,104,51.552,787.023,91.71],
[2,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

or one in every two holes twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[3,104,51.552,787.023,91.71],
[3,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

or 1 in three holes twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[4,104,51.552,787.023,91.71],
[4,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

What I would like is a single method that takes the first value in each row (the 'ID'), averages however many rows share that ID, and then lets me proceed with the rest of my code to analyse the results.

This is how I usually read in the 1 of 1 data:

dataA=np.genfromtxt(fname,dtype=float, delimiter='\t', names=True)

And this line works fine only if every row in the text file has a duplicate row (a second measurement):

lines = open( 'filename.txt', "r" ).readlines()[::2]

Any ideas on how to get a unique array as an output with no duplications of ID, ideally averages of the rows with the same ID but unique rows may suffice?

What do you mean by "1 of 1 data"? And why are you skipping the duplicates if you want to "take an average of how ever many rows have that same ID"?
– Akaisteph7, Commented Jul 23, 2019 at 14:04
Also please clearly provide an expected input and output. Would you want the code to work in all four cases or just one?
– Akaisteph7, Commented Jul 23, 2019 at 14:06
Hi, "1 of 1" means 1 measurement of 1 hole; "1 of 3" means 1 measurement and then skipping the next two holes, and so on. I would prefer averages of the rows with the same ID or, if that's not possible, to skip the measurements after the first one with that ID.
– Windy71, Commented Jul 23, 2019 at 14:10
That "output" is the same as the input. Didn't you say you wanted averages?
– Akaisteph7, Commented Jul 23, 2019 at 14:13

Pritesh Gohil · Accepted Answer · 2019-12-17 08:07:20Z


You can use the code below. It does not average the rows, but it removes rows with a duplicate ID (the first column), keeping the first row found for each ID.

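import numpy as np
a = np.array([[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]])
# return_index=True also returns the index of the first row for each unique value in column 0
first_rows = np.unique(a[:,0], return_index=True)[1]
a[first_rows]  # one row per ID, ordered by ID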
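If averages are needed rather than just the first row per ID, one possible approach is to group the rows with np.unique(..., return_inverse=True) and accumulate them with np.add.at. This is a minimal sketch and not part of the original answer: the helper name average_by_id is hypothetical, and the sample rows are taken from the question's examples. It assumes every column is numeric and that the ID sits in column 0.

```python
import numpy as np

def average_by_id(data):
    """Average all rows that share the same value in column 0 (the ID).

    Hypothetical helper for illustration; assumes purely numeric columns.
    """
    data = np.asarray(data, dtype=float)
    # ids: sorted unique IDs, inverse: group number of each row,
    # counts: how many rows belong to each ID
    ids, inverse, counts = np.unique(
        data[:, 0], return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # keep it 1-D regardless of NumPy version
    sums = np.zeros((ids.size, data.shape[1]))
    np.add.at(sums, inverse, data)  # add each row into its group's slot
    return sums / counts[:, None]   # the ID column averages to itself

rows = [
    [1, 105, 53.002, 784.023, 91.76],
    [1, 104, 53.012, 784.024, 91.76],
    [3, 104, 51.552, 787.023, 91.71],
    [3, 106, 51.532, 786.823, 91.69],
    [4, 104, 51.552, 787.023, 91.71],  # measured once only
]
averaged = average_by_id(rows)
print(averaged)
```

Because the grouping depends only on the ID values, the same call should cover all the cases in the question (every hole measured twice, every second or third hole measured twice, or a mix of single and repeated measurements).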


answered Jul 23, 2019 at 14:36


2 Comments

Windy71 Over a year ago

Hi Pritesh, when I tried it with

import numpy as np
a=[[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]]
a[np.unique(a[:,0],return_index=True,axis=0)[1]]
print (a)

I got "TypeError: list indices must be integers or slices, not tuple" as the error message. I'm not sure what I've done wrong here.

Pritesh Gohil Over a year ago

Sorry that I only wrote the final solution. You need to create the array first: import numpy as np, then a = np.array([[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]]). The column indexing a[:,0] only works on a NumPy array; a plain Python list does not accept a tuple index, which is what the TypeError is reporting.
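The same requirement applies to data loaded the way the question shows. With names=True, np.genfromtxt returns a structured array (one named field per column), which also does not support a[:,0]. The sketch below converts it to a plain 2D array with numpy.lib.recfunctions.structured_to_unstructured before removing duplicate IDs. The in-memory sample text is made up for illustration and stands in for a real measurement file.

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Made-up tab-delimited sample standing in for one measurement file.
sample = io.StringIO(
    "ID\tDiameter\tRadius\tXpos\tYpos\n"
    "1\t105\t53.002\t784.023\t91.76\n"
    "1\t104\t53.012\t784.024\t91.76\n"
    "4\t104\t51.552\t787.023\t91.71\n"
    "4\t106\t51.532\t786.823\t91.69\n"
)

# names=True yields a 1-D structured array with fields ID, Diameter, ...
structured = np.genfromtxt(sample, dtype=float, delimiter="\t", names=True)

# Convert to an ordinary 2-D float array so column slicing works.
table = rfn.structured_to_unstructured(structured)

# Keep the first row encountered for each ID (column 0).
_, first_idx = np.unique(table[:, 0], return_index=True)
deduplicated = table[first_idx]
print(deduplicated)
```

Individual columns of the structured array remain available by name, for example structured['ID'], if the named access is still useful elsewhere in the analysis.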
