I have a set of text files that I read using np.genfromtxt. Usually they are in a standard format: one text file for each plate measured, with each plate having 300 holes. This gives me headers of:

headers =['ID','Diameter','Radius','Xpos','Ypos']
#the data looks like
[1,105,53.002,784.023,91.76],
[2,104,51.552,787.023,91.71],
...
[300,104,51.552,787.023,91.71]

Now I have a set of text files that, instead of one measurement per hole, measure each hole twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[2,104,51.552,787.023,91.71],
[2,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

or one in every two holes twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[3,104,51.552,787.023,91.71],
[3,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

or one in every three holes twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[4,104,51.552,787.023,91.71],
[4,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

What I would like is a single method that takes the first value in each row (the 'ID'), averages however many rows share that ID, and then lets me proceed with the rest of my code to analyse the results.

This is how I usually read in the "1 of 1" data (one measurement of every hole):

dataA=np.genfromtxt(fname,dtype=float, delimiter='\t', names=True)

And this line works fine if every hole in the text file has a duplicate row (a second measurement):

lines = open( 'filename.txt', "r" ).readlines()[::2]

Any ideas on how to get an array with no duplicated IDs as output? Ideally each row would be the average of the rows with the same ID, but keeping just one row per ID may suffice.

  • What do you mean by "1 of 1 data"? And why are you skipping the duplicates if you want to "take an average of how ever many rows have that same ID"? Commented Jul 23, 2019 at 14:04
  • Also please clearly provide an expected input and output. Would you want the code to work in all four cases or just one? Commented Jul 23, 2019 at 14:06
  • Hi, "1 of 1" means 1 measurement of every hole, "1 of 3" means 1 measurement and then skip the next two holes, etc. I would prefer to have averages of the rows with the same ID or, if that's not possible, to skip the measurements after the first one with that ID. Commented Jul 23, 2019 at 14:10
  • That "output" is the same as the input. Didn't you say you wanted averages? Commented Jul 23, 2019 at 14:13
  • Possible duplicate of Group and Average Numpy Matrix Commented Jul 23, 2019 at 14:34

1 Answer

You can use the code below. It does not average, but it removes rows with duplicate ID values, keeping the first row found for each ID.

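import numpy as np
a = np.array([[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]])
# Index of the first occurrence of each unique value in column 0 (the ID)
_, first_idx = np.unique(a[:,0], return_index=True)
unique_rows = a[first_idx]  # one row per ID, ordered by ID; `a` itself is unchanged
print(unique_rows)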
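If averages are wanted rather than the first row per ID, one possible approach is to group the rows by the ID column and divide the per-group sums by the per-group counts. The following is only a sketch under some assumptions: the helper name `average_by_id` is made up for illustration, and it expects a plain 2-D float array with the ID in column 0 (the structured array returned by `np.genfromtxt(..., names=True)` would need converting first, or the file could be read without `names=True`).

```python
import numpy as np

def average_by_id(rows):
    """Average all rows that share the same value in column 0 (hypothetical helper)."""
    rows = np.asarray(rows, dtype=float)
    # ids: sorted unique IDs; inverse: group number of each row; counts: rows per group
    ids, inverse, counts = np.unique(rows[:, 0], return_inverse=True, return_counts=True)
    sums = np.zeros((ids.size, rows.shape[1]))
    # Unbuffered add, so rows with a repeated group number all accumulate
    np.add.at(sums, inverse, rows)
    # Divide each group's sum by its row count; the ID column averages to the ID itself
    return sums / counts[:, None]

# Sample rows in the layout from the question (IDs 1 and 4 measured twice)
rows = [
    [1, 105, 53.002, 784.023, 91.76],
    [1, 104, 53.012, 784.024, 91.76],
    [4, 104, 51.552, 787.023, 91.71],
    [4, 106, 51.532, 786.823, 91.69],
]
averaged = average_by_id(rows)
print(averaged)  # one row per ID, ordered by ID
```

Because the grouping relies only on the ID column, the same call should handle files where every hole, every second hole, or every third hole is measured more than once, and IDs that appear only once pass through unchanged.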

2 Comments

Hi Pritesh, when I tried it with a plain list: import numpy as np; a=[[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]]; a[np.unique(a[:,0],return_index=True,axis=0)[1]]; print(a), I got "TypeError: list indices must be integers or slices, not tuple" as the error message. Not sure what I've done wrong here.
Sorry that I only wrote the final solution. Start with: import numpy as np; a = np.array([[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]]). The error comes from a[:,0]: a plain Python list does not support that kind of indexing, so the data must be a NumPy array first.
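To illustrate the point about lists, here is a small sketch that reproduces the error from the comment and then converts the list with `np.asarray` before indexing. The variable names `data`, `arr` and `deduped` are chosen only for this example.

```python
import numpy as np

data = [[2, 8, 3, 1], [3, 2, 3, 3], [5, 3, 2, 1], [1, 4, 2, 3],
        [3, 6, 3, 4], [2, 4, 5, 6], [4, 1, 1, 1]]  # plain Python list

try:
    data[:, 0]  # tuple indexing is not supported by lists
except TypeError as exc:
    print("list indexing failed:", exc)

arr = np.asarray(data)  # convert once, then column slicing works
_, first_idx = np.unique(arr[:, 0], return_index=True)
deduped = arr[first_idx]  # first row seen for each ID, ordered by ID
print(deduped)
```

Note that the result has to be assigned to a new name (or printed directly); indexing does not modify the original array in place.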
