finding values in numpy array of floats

Question

I am doing a data analysis in which I import data as a numpy array of floats, some of which are below 0. I then select a column named load and want to find the indices where the values are > 0.1.

However, I am getting an error: "only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices"

What am I doing wrong, please?


import numpy as np
import pandas as pd

data=pd.read_csv('C1.txt',delim_whitespace=True , 
                 skiprows=10, skip_blank_lines=True ) 
data_array=data.to_numpy()
load=data_array[10:,1]

res=list()
for idx in load:
    if load[idx] > 0.1:
        res.append(idx)

I need to find the indices in the array where the values are over 0.1.

the start of the data array looks like this:

0.063   -0.00174    0.063   -0.00075
0.094   0.00628 0.094   -0.00089
0.125   0.01292 0.125   -0.00111
0.156   -0.00027    0.156   0.00015
0.188   -0.00319    0.188   0.00108
0.219   -0.00733    0.219   -0.0007
0.25    -0.02446    0.25    -0.00074
0.281   -0.01493    0.281   -0.00078
0.313   0.01339 0.313   0.00019

Please include a minimal reproducible example. We don't have your files (and don't want them; they are not needed). See my edit.
– chrslg, Commented Jul 21, 2023 at 12:57

chrslg · Accepted Answer · 2023-07-21 13:26:13Z

2

This is done by indexing with a mask.

With numpy, since that is what you've asked for

np.where(load>0.1)[0]

The [0] is needed because np.where, when given only a condition, returns a tuple of index arrays (one per dimension).
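As a minimal sketch of what that call returns, here is np.where applied to a small made-up array. The values in load are illustrative and are not the asker's data.

```python
import numpy as np

# Hypothetical stand-in for the asker's "load" column
load = np.array([-0.002, 0.006, 0.13, 0.05, 0.25])

mask = load > 0.1          # boolean array, one entry per element of load
result = np.where(mask)    # tuple holding one index array per dimension
indices = result[0]        # load is 1-D, so the tuple has a single entry

print(type(result), indices)
```

Since load is one-dimensional, the tuple has exactly one element, which is why taking [0] gives the array of positions.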

Note that I've ignored the 10: part of load, since I don't know why you want to skip the first 10 rows. What is certain is that you cannot index an array of 100 values with 90 booleans. The same goes for your method (a for loop): you can't iterate over the subarray [10:,1] and expect the indices you find to be consistent with the full array.

So if, for some reason, you just want to ignore the first 10 rows (for example, because you know that in your file they are just rubbish), then

load=data_array[10:,1]
np.where(load>0.1)[0]+10

Here I add 10 to account for the fact that the indices refer to a subarray starting at row 10.
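The offset can be checked on a small synthetic array. This is only a sketch: data_array below is invented (15 rows, 2 columns), and offset is a name introduced here for the number of skipped rows.

```python
import numpy as np

# Hypothetical data: 15 rows, 2 columns; column 1 plays the role of "load"
data_array = np.zeros((15, 2))
data_array[3, 1] = 0.5     # lies in the skipped first 10 rows
data_array[12, 1] = 0.2
data_array[14, 1] = 0.3

offset = 10                # number of leading rows to ignore
load = data_array[offset:, 1]

local_idx = np.where(load > 0.1)[0]   # positions within the sub-array
global_idx = local_idx + offset       # positions within data_array

print(local_idx, global_idx)
```

Indexing data_array[global_idx, 1] then gives back exactly the values above 0.1, whereas using local_idx on the full array would point at the wrong rows.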

With pandas directly

data.index[data.load>0.1]

Explanation: data.load is the load column. data.load>0.1 is a series of booleans (so here, 100 booleans, since there are 100 rows, with the same index), each True if and only if the load field of the corresponding row is >0.1. And data.index[data.load>0.1] is the index restricted to the rows for which data.load>0.1 is True.
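A small sketch of the pandas version, assuming a DataFrame with a column literally named load. The frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; only the "load" column matters here
data = pd.DataFrame({
    "time": [0.063, 0.094, 0.125, 0.156],
    "load": [-0.002, 0.15, 0.013, 0.4],
})

mask = data["load"] > 0.1   # boolean Series aligned with data.index
idx = data.index[mask]      # index labels of the rows where mask is True

print(mask.tolist(), idx.tolist())
```

data["load"] selects the same column as data.load. The bracket form also works when a column name is not a valid attribute name or collides with an existing DataFrame attribute.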

In pure python

So with your method, once corrected

for i in range(len(load)):
    if load[i]>0.1: res.append(i)

If you really insist on avoiding the iteration with a range,

for i,v in enumerate(load):
    if v>0.1: res.append(i)

Or, using a list comprehension

res=[i for i,v in enumerate(load) if v>0.1]

But that is not a good idea. The whole point of numpy/pandas is to avoid writing pure Python for loops and to let numpy/pandas run the loop for you (in C). Of course, under the hood, my numpy and pandas solutions do more or less the same for loop I wrote here. But they do it in C, so they can be some 1000 or even more times faster.
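To check that the two approaches agree and to measure the difference on your own machine, something like the following could be used. It is a sketch on random data; with_loop and with_numpy are names introduced here, and the timings depend on array size and hardware, so no figures are quoted.

```python
import timeit
import numpy as np

# Hypothetical test data: 100,000 random values between -0.5 and 0.5
rng = np.random.default_rng(0)
load = rng.uniform(-0.5, 0.5, size=100_000)

def with_loop(a):
    # pure Python: one interpreter-level iteration per element
    return [i for i, v in enumerate(a) if v > 0.1]

def with_numpy(a):
    # vectorized: comparison and index search run in compiled code
    return np.where(a > 0.1)[0]

# Both approaches should find the same positions
same = with_loop(load) == with_numpy(load).tolist()

t_loop = timeit.timeit(lambda: with_loop(load), number=5)
t_numpy = timeit.timeit(lambda: with_numpy(load), number=5)
print(f"same result: {same}  loop: {t_loop:.4f}s  numpy: {t_numpy:.4f}s")
```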

edited Jul 21, 2023 at 13:26

answered Jul 21, 2023 at 13:00

chrslg


6 Comments

Igor Moravčík Over a year ago

thank you very much for this explanation it helps me to learn a lot...

Igor Moravčík Over a year ago

But, for example, how do you find out that there even is a function for this, like "where"? Surely you don't go through the whole documentation.

chrslg Over a year ago

You don't know it on your first day, sure. But .where is quite a classic one, so it is hard to avoid seeing it after reading for a while. For example, I've just typed "numpy find index of boolean array" into Google right now, and the first result is this one, whose first answer does mention np.where. Sure, in a different context. But from there, you know you can read the documentation for np.where.

Igor Moravčík Over a year ago

thank you again for the answer I wish everyone would explain it as well as you. have a nice day

jared Over a year ago

@IgorMoravčík As it says on the note at the top, if only the condition is given (as is in this case), the function is shorthand for using nonzero. Looking at that documentation shows that it will return a tuple.


Muhammad Waqar Anwar · Accepted Answer · 2023-07-22 15:32:48Z

0

Edited:

idx here is not the index but the item at that index. You can do the following to get the index:

for idx,item in enumerate(load):
    if item > 0.1:
        res.append(idx)
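To make the cause of the error concrete, here is a sketch on a three-element array taken from the first column of the sample data. Iterating over load yields the float values themselves, and using a float as an index raises the IndexError quoted in the question. error_message is a name introduced here only to capture that text.

```python
import numpy as np

load = np.array([0.063, 0.094, 0.125])

# Iterating over the array yields elements, not positions
for value in load:
    try:
        load[value]              # indexing with a float is not allowed
    except IndexError as exc:
        error_message = str(exc)
    break                        # one failing lookup is enough to show it

# enumerate yields (position, element) pairs instead
res = []
for idx, item in enumerate(load):
    if item > 0.1:
        res.append(idx)

print(error_message)
print(res)
```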

edited Jul 22, 2023 at 15:32

answered Jul 21, 2023 at 13:09

Muhammad Waqar Anwar


2 Comments

chrslg Over a year ago

Note that they do want the index, not the value (I made the same misreading initially). So that solution works, in the sense that it doesn't crash. But it returns a list of floating-point values, when they wanted a list of the indices of those floats in the initial array.

Igor Moravčík Over a year ago

As you pointed out, the code you mentioned does not work for me, since I need indices rather than just the values over 0.1. I want to use indices because I want to get rid of the first data that is below 0.1, but after that I want all the data.
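If the goal described in this comment is to drop everything before the first value above 0.1 and keep all later rows, one possible sketch is below. This is an interpretation of the comment rather than something stated in the answers, and load is again a made-up array.

```python
import numpy as np

# Hypothetical load column
load = np.array([-0.002, 0.006, 0.13, 0.05, 0.25])

above = np.flatnonzero(load > 0.1)   # positions of all values above 0.1

# Keep everything from the first such position onward;
# if no value exceeds 0.1, fall back to an empty slice
trimmed = load[above[0]:] if above.size else load[:0]

print(trimmed)
```

Values after the first crossing are kept even when they fall back below 0.1, which matches "after that I want all the data".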
