How to select multiple rows that have certain values in numpy?

Question

I have a numpy array that looks like this:

array([(1596207300,   1), (1596207300,  35), (1596207300,  36),
       (1596207300,  41), (1596207300,  42), (1596207300,  44),
       (1596207300,  49), (1596207300,  50), (1596207300,  51),
       (1596207300,  60), (1596207300,  68), (1596207300,  69),
       (1596207300,  81), (1596207300,  88), (1596207300,  96),
       (1596207300, 115), (1596207300, 118), (1596207300, 123),
       (1596207300, 125), (1596207300, 127), (1596207300, 128),
       (1596207300, 129), (1596207300, 147), (1596207300, 150),
       (1596207300, 156), (1596207300, 158), (1596207300, 162),
       (1596207300, 164), (1596207300, 165), (1596207300, 170),
       (1596207300, 171), (1596207300, 172), (1596207300, 173),
       (1596207300, 188), (1596207300, 189), (1596207300, 202),
       (1596207300, 241), (1596207300, 255), (1596207300, 257),
       (1596207300, 258), (1596207300, 260), (1596207300, 275),
       (1596207300, 276), (1596207300, 277), (1596207300, 278),
       (1596207300, 279), (1596207300, 280), (1596207300, 283),
       (1596207300, 285), (1596207300, 287), (1596207300, 296),
       (1596207300, 301), (1596207300, 302), (1596207300, 303),
       (1596207300, 313), (1596207300, 315), (1596207300, 316),
       (1596208200, 321), (1596208200, 322), (1596208200, 323),
       (1596208200, 348), (1596208200, 350), (1596208200, 352),
       (1596208200, 360), (1596208200, 370), (1596208200, 371),
       (1596208200, 373), (1596208200, 379), (1596208200, 380),
       (1596212220, 389), (1596212220, 391), (1596212220, 392)],
      dtype={'names':['time','value'], 'formats':['<u4','<u4'], 'offsets':[0,16], 'itemsize':20})

The time column holds timestamps (at minute resolution). For each time, I want to extract the row with the largest value.

With [ arr[ arr['time'] == uTime ]['value'].max() for uTime in np.unique( arr['time'] ) ], I can get the largest value for each time, which gives [316, 380, 392], but I don't know how to extract the entire rows that contain those values.

The result I want to get:

array([(1596207300, 316), (1596208200, 380), (1596212220, 392)], dtype={'names':['time','value'], 'formats':['<u4','<u4'], 'offsets':[0,16], 'itemsize':20})

You're using NumPy for the wrong thing; I think that Pandas would be better suited to what you want to do. NumPy is optimized for linear algebra, and it looks like you need a library that performs relational algebra.
– J. Nolan Faught, Commented Nov 19, 2020 at 4:33
@Nolan Faught Thank you for the comment. Using Pandas will be easier. The thing is that there are many arrays to process and I want to use Numba and Numpy together for that.
– maynull, Commented Nov 19, 2020 at 4:49

Muslimbek Abduganiev · Accepted Answer · 2020-11-19 05:32:58Z

2

You almost got what you want. Just add uTime to the array construction:

[ [uTime, arr[ arr['time'] == uTime ]['value'].max()] for uTime in np.unique( arr['time'] ) ]

Update
If you want the entire row to be in the result, I would suggest iterating manually. The following code works if rows with the same timestamp are adjacent, as they are in your example.

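time_ = None
res = []
mx_row = None
for row in arr:
    if time_ is None or row['time'] != time_:
        # New timestamp: store the previous group's best row and start over.
        if mx_row is not None:
            res.append(mx_row)
        time_ = row['time']
        mx_row = row
    else:
        # Same timestamp: keep whichever row has the larger value.
        mx_row = max(mx_row, row, key=lambda x: x['value'])

# The last group is never followed by a new timestamp, so add it here.
if mx_row is not None:
    res.append(mx_row)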

If the data is not sorted, you might want to sort it according to the timestamp.
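One possible way to do that sort, shown here as a minimal sketch on made-up sample data (the small array and the names `order` and `arr_sorted` are illustrative, not from the question), is a stable argsort on the time field:

```python
import numpy as np

# Hypothetical unsorted sample using the question's field names.
arr = np.array(
    [(20, 5), (10, 1), (20, 9), (10, 7)],
    dtype=[("time", "<u4"), ("value", "<u4")],
)

# A stable sort keeps rows that share a timestamp in their original relative order.
order = np.argsort(arr["time"], kind="stable")
arr_sorted = arr[order]
```

After this, rows with equal timestamps are adjacent, which is the assumption the loop above relies on.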

edited Nov 19, 2020 at 5:32

answered Nov 19, 2020 at 4:40

Muslimbek Abduganiev

maynull Over a year ago

Thank you for the answer. My bad. There are other columns but I excluded them to simplify the question. Do you happen to know how to mask the original array and get the result?

sharathnatraj · Accepted Answer · 2020-11-19 07:39:28Z

1

Here is one way to do this:

n = np.unique( arr['time'] )
l = [ arr[ arr['time'] == uTime ]['value'].max() for uTime in n ]
arr[(np.in1d(arr['time'], n)) & ((np.in1d(arr['value'], l)))]

Prints:

array([(1596207300, 316), (1596208200, 380), (1596212220, 392)],
      dtype={'names':['time','value'], 'formats':['<u4','<u4'], 'offsets':[0,16], 'itemsize':20})

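The first two lines are the same as what you did; I just used that code to build two 1-D sequences, the unique times and their corresponding maximum values. I then used np.in1d to mask the original array, as you asked. Note that the mask tests the two columns independently, so it is only correct when one group's maximum never occurs as a value under a different timestamp.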
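If the same value can appear under more than one timestamp, matching the time and value columns independently can select extra rows. A sketch that avoids this, assuming a non-empty array, sorts by time and then by value and keeps the last row of each time group. The sample data and the names `order`, `is_last`, `row_idx`, and `mask` are illustrative and not part of the original answer:

```python
import numpy as np

# Hypothetical sample: the value 7 is the maximum for time 10 and a non-maximum for time 20.
arr = np.array(
    [(10, 1), (10, 7), (20, 7), (20, 9), (10, 3), (30, 2)],
    dtype=[("time", "<u4"), ("value", "<u4")],
)

# np.lexsort treats the LAST key as primary: sort by time, then by value within each time.
order = np.lexsort((arr["value"], arr["time"]))
sorted_time = arr["time"][order]

# The last row in every run of equal timestamps holds that group's maximum value.
is_last = np.append(sorted_time[1:] != sorted_time[:-1], True)
row_idx = order[is_last]

# Boolean mask over the original array, and the selected rows in original order.
mask = np.zeros(arr.shape[0], dtype=bool)
mask[row_idx] = True
result = arr[mask]
```

Because the mask indexes the original array, any additional columns come along with the selected rows. When a group's maximum occurs more than once, this keeps only one of the tied rows.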

edited Nov 19, 2020 at 7:39

answered Nov 19, 2020 at 7:26

sharathnatraj
