I have a numpy array that looks like this:

array([(1596207300,   1), (1596207300,  35), (1596207300,  36),
       (1596207300,  41), (1596207300,  42), (1596207300,  44),
       (1596207300,  49), (1596207300,  50), (1596207300,  51),
       (1596207300,  60), (1596207300,  68), (1596207300,  69),
       (1596207300,  81), (1596207300,  88), (1596207300,  96),
       (1596207300, 115), (1596207300, 118), (1596207300, 123),
       (1596207300, 125), (1596207300, 127), (1596207300, 128),
       (1596207300, 129), (1596207300, 147), (1596207300, 150),
       (1596207300, 156), (1596207300, 158), (1596207300, 162),
       (1596207300, 164), (1596207300, 165), (1596207300, 170),
       (1596207300, 171), (1596207300, 172), (1596207300, 173),
       (1596207300, 188), (1596207300, 189), (1596207300, 202),
       (1596207300, 241), (1596207300, 255), (1596207300, 257),
       (1596207300, 258), (1596207300, 260), (1596207300, 275),
       (1596207300, 276), (1596207300, 277), (1596207300, 278),
       (1596207300, 279), (1596207300, 280), (1596207300, 283),
       (1596207300, 285), (1596207300, 287), (1596207300, 296),
       (1596207300, 301), (1596207300, 302), (1596207300, 303),
       (1596207300, 313), (1596207300, 315), (1596207300, 316),
       (1596208200, 321), (1596208200, 322), (1596208200, 323),
       (1596208200, 348), (1596208200, 350), (1596208200, 352),
       (1596208200, 360), (1596208200, 370), (1596208200, 371),
       (1596208200, 373), (1596208200, 379), (1596208200, 380),
       (1596212220, 389), (1596212220, 391), (1596212220, 392)],
      dtype={'names':['time','value'], 'formats':['<u4','<u4'], 'offsets':[0,16], 'itemsize':20})

The time column consists of timestamps (to the minute). For each distinct time, I want to extract the row with the largest value.

With [ arr[ arr['time'] == uTime ]['value'].max() for uTime in np.unique( arr['time'] ) ], I can get the largest value for each time, which gives [316, 380, 392], but I don't know a simple way to extract the entire rows that contain those values.

The result I want to get:

array([(1596207300, 316), (1596208200, 380), (1596212220, 392)], dtype={'names':['time','value'], 'formats':['<u4','<u4'], 'offsets':[0,16], 'itemsize':20})
  • 2
    You're using NumPy for the wrong thing; I think that Pandas would be better suited to what you want to do. NumPy is optimized for linear algebra, and it looks like you need a library that performs relational algebra. Commented Nov 19, 2020 at 4:33
  • 1
    @Nolan Faught Thank you for the comment. Using Pandas will be easier. The thing is that there are many arrays to process and I want to use Numba and Numpy together for that. Commented Nov 19, 2020 at 4:49

2 Answers

You almost got what you want. Just add uTime to the array construction:

[ [uTime, arr[ arr['time'] == uTime ]['value'].max()] for uTime in np.unique( arr['time'] ) ]
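The same (time, max value) pairs can also be computed without a Python-level loop. The following is only a minimal sketch, assuming arr is already sorted by 'time' so that equal timestamps are contiguous; the helper name max_per_time is hypothetical and not part of the question.

```python
import numpy as np

def max_per_time(arr):
    """Return (unique times, max value per time).

    Assumption: arr is sorted by 'time', so each time forms one contiguous run.
    """
    # starts[i] is the index of the first row of the i-th unique time.
    times, starts = np.unique(arr["time"], return_index=True)
    # reduceat takes the maximum over each slice [starts[i], starts[i+1]).
    return times, np.maximum.reduceat(arr["value"], starts)
```

Like the list comprehension, this returns only the times and the maxima, not the full rows.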

Update
If you want the entire row in the result, I would suggest iterating manually. The following code works if rows with the same timestamp are contiguous.

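# Fields are accessed by name, so any additional columns stay with the row.
time_ = None    # timestamp of the group currently being scanned
mx_row = None   # row with the largest value seen so far in that group
res = []
for row in arr:

    # Timestamp changed: store the best row of the finished group.
    if time_ is not None and row["time"] != time_:
        res.append(mx_row)
        mx_row = None

    time_ = row["time"]

    if mx_row is None or row["value"] > mx_row["value"]:
        mx_row = row

# Store the best row of the last group.
if mx_row is not None:
    res.append(mx_row)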

If the data is not sorted, you might want to sort it according to the timestamp.
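The sort-then-scan idea can also be expressed with NumPy indexing alone, which returns whole rows (including any extra columns) and does not require the input to be sorted beforehand. This is a sketch rather than a definitive implementation; the function name max_rows_per_time is hypothetical, and the result comes back ordered by time with exactly one row per time.

```python
import numpy as np

def max_rows_per_time(arr):
    """Return the row with the largest 'value' for each distinct 'time'."""
    if len(arr) == 0:
        return arr
    # lexsort uses the LAST key as the primary one: sort by time, then by value.
    order = np.lexsort((arr["value"], arr["time"]))
    s = arr[order]
    # A row is the last of its group if the next row has a different time;
    # the final row always closes a group.
    is_last = np.append(s["time"][1:] != s["time"][:-1], True)
    # Within each group values are ascending, so the last row holds the maximum.
    return s[is_last]
```

If several rows share the maximum within one time, only one of them is returned.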

1 Comment

Thank you for the answer. My bad. There are other columns but I excluded them to simplify the question. Do you happen to know how to mask the original array and get the result?

Here is one way to do this:

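n, inv = np.unique( arr['time'], return_inverse=True )
l = np.array([ arr[ arr['time'] == uTime ]['value'].max() for uTime in n ])
arr[ arr['value'] == l[inv] ]

Prints:

array([(1596207300, 316), (1596208200, 380), (1596212220, 392)],
      dtype={'names':['time','value'], 'formats':['<u4','<u4'], 'offsets':[0,16], 'itemsize':20})

The first two lines are essentially what you already did: they build the unique times and the maximum value for each of them. return_inverse additionally gives, for every row, the index of its time in n, so l[inv] is the group maximum aligned with each row. Comparing arr['value'] against it masks the original array as you require. Matching each row only against the maximum of its own time avoids selecting a row whose value happens to equal the maximum of a different time. If several rows share the maximum within one time, all of them are kept.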
