
I am trying to do a data analysis in which I import data as a NumPy array of floats, some of which are below 0. Then I select a column named load, and I want to find the indices where the values are > 0.1.

However, I am getting an error: "only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices"

What am I doing wrong?


import numpy as np
import pandas as pd

data=pd.read_csv('C1.txt',delim_whitespace=True , 
                 skiprows=10, skip_blank_lines=True ) 
data_array=data.to_numpy()
load=data_array[10:,1]

res=list()
for idx in load:
    if load[idx] > 0.1:
        res.append(idx)
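The error can be reproduced without the file. This is a minimal sketch using a small hypothetical array in place of the load column from C1.txt: iterating over a NumPy array yields the element values (numpy.float64), not their positions, so `load[idx]` ends up indexing with a float.

```python
import numpy as np

# Hypothetical stand-in for the "load" column (not the real C1.txt data)
load = np.array([-0.00174, 0.00628, 0.25, 0.31])

# Iterating yields the values themselves, as numpy.float64 scalars
for value in load:
    print(type(value).__name__, value)

# The loop from the question: idx is a float, and floats are not valid indices
error_message = None
try:
    for idx in load:
        if load[idx] > 0.1:
            pass
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```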

I need to find the indices in the array where the values are over 0.1.

The start of the data array looks like this:

0.063   -0.00174    0.063   -0.00075
0.094   0.00628 0.094   -0.00089
0.125   0.01292 0.125   -0.00111
0.156   -0.00027    0.156   0.00015
0.188   -0.00319    0.188   0.00108
0.219   -0.00733    0.219   -0.0007
0.25    -0.02446    0.25    -0.00074
0.281   -0.01493    0.281   -0.00078
0.313   0.01339 0.313   0.00019
  • Please include a minimal reproducible example. We don't have your files (and don't want them; they are not needed). See my edit. Commented Jul 21, 2023 at 12:57
  • I added the original data from the array to the question. Commented Jul 21, 2023 at 13:52

2 Answers


This is done by indexing with a mask.
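To make "mask" concrete, here is a small sketch on hypothetical values: the comparison produces a boolean array, indexing with that array gives the matching values, and np.where on it gives their positions, which is what the question asks for.

```python
import numpy as np

load = np.array([0.05, 0.2, -0.01, 0.15, 0.3])  # hypothetical values

mask = load > 0.1             # boolean array, same length as load
values = load[mask]           # indexing with the mask selects the values themselves
indices = np.where(mask)[0]   # positions of the True entries

print(mask.tolist())     # [False, True, False, True, True]
print(values.tolist())   # [0.2, 0.15, 0.3]
print(indices.tolist())  # [1, 3, 4]
```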

With NumPy, since that is what you asked for:

np.where(load>0.1)[0]

The [0] is needed because np.where returns a tuple (one array of indices per dimension of the input).
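A short sketch of that tuple, on hypothetical data: for a 1-D array the tuple has a single entry, and for a 2-D array it holds one array of row indices and one of column indices.

```python
import numpy as np

load = np.array([0.05, 0.2, -0.01, 0.15, 0.3])  # hypothetical values

result = np.where(load > 0.1)
print(type(result).__name__, len(result))  # tuple 1
indices = result[0]
print(indices.tolist())  # [1, 3, 4]

# For a 2-D array the tuple has two entries: row indices and column indices
grid = np.array([[0.0, 0.5], [0.2, 0.05]])
rows, cols = np.where(grid > 0.1)
print(rows.tolist(), cols.tolist())  # [0, 1] [1, 0]
```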

Note that I've ignored the 10: part of load, since I don't know why you want to skip the first 10 rows. But one thing is certain: you cannot index an array of 100 values with 90 booleans. The same goes for your method (a for loop): you can't iterate over the subarray [10:,1] and expect to get indices consistent with the full array.

So, if for some reason you just want to ignore the first 10 rows (for example, because you know that in your file they are just rubbish), then:

load=data_array[10:,1]
np.where(load>0.1)[0]+10

Here I add 10 to account for the fact that the indices are relative to the subarray, which starts at row 10.
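A sketch of why the offset matters, using a hypothetical 15-value column where the first 10 rows are skipped: indices found on the subarray are relative to it, and adding the start offset maps them back to rows of the full array.

```python
import numpy as np

# Hypothetical full column; rows 0-9 are to be skipped
full = np.array([0.5, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
                 0.0, 0.3, 0.0, 0.15, 0.05])

start = 10
sub = full[start:]
idx_sub = np.where(sub > 0.1)[0]  # positions relative to sub
idx_full = idx_sub + start        # shifted back to positions in full

print(idx_sub.tolist())         # [1, 3]
print(idx_full.tolist())        # [11, 13]
print(full[idx_full].tolist())  # [0.3, 0.15]
```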

With pandas directly

data.index[data.load>0.1]

Explanation: data.load is the load column. data.load>0.1 is a Series of booleans (so here, 100 booleans, since there are 100 rows, with the same index), True iff the load field of the corresponding row is >0.1. And data.index[data.load>0.1] is the index restricted to the rows for which data.load>0.1 is True.
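A small self-contained sketch of the pandas version. The DataFrame and its column names "time" and "load" are assumptions, since the real headers of C1.txt are not shown. It also illustrates that data.index[...] returns index labels, which only coincide with positions when the index is the default 0..n-1 range.

```python
import pandas as pd

# Hypothetical DataFrame; the column names are assumed, not taken from C1.txt
data = pd.DataFrame({
    "time": [0.063, 0.094, 0.125, 0.156, 0.188],
    "load": [0.05, 0.2, -0.01, 0.15, 0.3],
})

mask = data["load"] > 0.1  # boolean Series aligned on data.index (data.load is equivalent)
print(mask.tolist())                # [False, True, False, True, True]
print(data.index[mask].tolist())    # [1, 3, 4]

# With a non-default index, the labels (not positions) are returned
shifted = data.copy()
shifted.index = data.index + 10
print(shifted.index[shifted["load"] > 0.1].tolist())  # [11, 13, 14]
```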

In pure Python

So, with your method, once corrected:

for i in range(len(load)):
    if load[i]>0.1: res.append(i)

If you really insist on avoiding iteration over a range:

for i,v in enumerate(load):
    if v>0.1: res.append(i)

Or, using a list comprehension:

res=[i for i,v in enumerate(load) if v>0.1]
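A quick sanity check, on hypothetical values, that the three pure Python variants and the NumPy one-liner return the same indices:

```python
import numpy as np

load = np.array([0.05, 0.2, -0.01, 0.15, 0.3])  # hypothetical values

# 1. Loop over positions with range
res_range = []
for i in range(len(load)):
    if load[i] > 0.1:
        res_range.append(i)

# 2. enumerate yields (index, value) pairs
res_enum = []
for i, v in enumerate(load):
    if v > 0.1:
        res_enum.append(i)

# 3. List comprehension
res_comp = [i for i, v in enumerate(load) if v > 0.1]

# Vectorized NumPy version for comparison
res_np = np.where(load > 0.1)[0].tolist()
print(res_range == res_enum == res_comp == res_np)  # True
```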

But that is not a good idea. The whole point of NumPy/pandas is to avoid pure Python for loops and let NumPy/pandas do the loop (in C) for you. Of course, under the hood, my NumPy and pandas solutions do more or less the same for loop I do here. But they do it in C, so they can be 1000 or more times faster.
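A rough way to check the speed difference yourself, as a sketch on synthetic random data (not the real file). The measured ratio depends on the machine, NumPy version, and array size, so no particular number is claimed here.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
load = rng.uniform(-0.5, 0.5, size=200_000)  # synthetic data, not C1.txt

def with_loop(arr):
    # Pure Python: one interpreter-level iteration per element
    return [i for i, v in enumerate(arr) if v > 0.1]

def with_numpy(arr):
    # Vectorized: comparison and index search run in compiled code
    return np.where(arr > 0.1)[0]

# Both approaches return the same indices
same = with_loop(load) == with_numpy(load).tolist()

t_loop = timeit.timeit(lambda: with_loop(load), number=3)
t_numpy = timeit.timeit(lambda: with_numpy(load), number=3)
print(f"same result: {same}")
print(f"loop: {t_loop:.3f} s, numpy: {t_numpy:.3f} s, ratio: {t_loop / t_numpy:.0f}x")
```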


6 Comments

Thank you very much for this explanation; it helps me learn a lot...
But, for example, how do you find out that there even is a function for this, like "where"? Surely you don't go through the whole documentation.
You don't know it on your first day, sure. But .where is quite a classic one, so it's hard to avoid seeing it after reading for a while. For example, I just typed "numpy find index of boolean array" into Google. The first result is this one, whose first answer does mention np.where. Sure, in a different context. But from there, you know you can read the documentation for np.where.
Thank you again for the answer; I wish everyone would explain things as well as you do. Have a nice day.
@IgorMoravčík As the note at the top of the np.where documentation says, when only the condition is given (as in this case), the function is shorthand for np.nonzero. Looking at that documentation shows that it returns a tuple.
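A small sketch of that equivalence on hypothetical values. np.flatnonzero is included as an alternative that returns the index array directly for a 1-D mask, so there is no tuple to unpack.

```python
import numpy as np

load = np.array([0.05, 0.2, -0.01, 0.15, 0.3])  # hypothetical values
mask = load > 0.1

a = np.where(mask)        # single-argument form: shorthand for np.nonzero
b = np.nonzero(mask)      # tuple with one index array per dimension
c = np.flatnonzero(mask)  # indices into the flattened array, no tuple

print(np.array_equal(a[0], b[0]), np.array_equal(a[0], c))  # True True
print(c.tolist())  # [1, 3, 4]
```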

Edited:

Here, idx is not the index but the item at that index. You can do the following to get the index:

for idx,item in enumerate(load):
    if item > 0.1:
        res.append(idx)

2 Comments

Note that they do want the index, not the value (I made the same misreading initially). So that solution works, in the sense that it doesn't crash, but it returns a list of float values, when they wanted a list of the indices of those floats in the initial array.
As you pointed out, the code you mentioned does not work for me, since I need indices rather than just the values over 0.1. I want to use indices because I want to get rid of the first data points that are below 0.1; after that, however, I want all the data.
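For that goal (drop the leading rows until load first exceeds 0.1, then keep everything), only the first matching index is needed. Below is a minimal sketch; the helper name first_index_above and the two-column layout (time, load) are hypothetical. It relies on np.argmax returning the first True in a boolean array, and checks mask.any() first because argmax returns 0 when no element is True.

```python
import numpy as np

def first_index_above(values, threshold):
    """Hypothetical helper: index of the first value > threshold, or None."""
    mask = values > threshold
    if not mask.any():
        return None  # argmax would misleadingly return 0 here
    return int(np.argmax(mask))  # first True in a boolean array

# Hypothetical array: column 0 = time, column 1 = load (assumed layout)
data_array = np.array([
    [0.063, 0.01],
    [0.094, 0.05],
    [0.125, 0.12],
    [0.156, 0.08],  # dips below 0.1 again, but is kept
    [0.188, 0.30],
])

start = first_index_above(data_array[:, 1], 0.1)
trimmed = data_array[start:] if start is not None else data_array[:0]
print(start)                   # 2
print(trimmed[:, 1].tolist())  # [0.12, 0.08, 0.3]
```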
