How to select the elements based on a list in numpy array?

Question

I have a dataframe, held as a NumPy object array, like this:

array([[1374495, 3, 'prior', ..., 16.0, 'soy lactosefree', 'dairy eggs'],
       [3002854, 3, 'prior', ..., 16.0, 'soy lactosefree', 'dairy eggs'],
       [2710558, 3, 'prior', ..., 16.0, 'soy lactosefree', 'dairy eggs'],
       ...,
       [1355976, 206200, 'prior', ..., 16.0, 'soy lactosefree',
        'dairy eggs'],
       [1909878, 206200, 'prior', ..., 16.0, 'soy lactosefree',
        'dairy eggs'],
       [943915, 206200, 'train', ..., 16.0, 'soy lactosefree', 'dairy eggs']], dtype=object)

The first number in every row is the orderid, e.g. 1374495, 3002854, 2710558... I also have a list of orderids that should be used to pick rows from the array. For example, if the list is [1355976, 1909878, 943915], I want to select the rows whose orderid is in [1355976, 1909878, 943915]. How can I do this efficiently?

Divakar · Accepted Answer · 2017-06-29 07:04:37Z

Approach #1

Here's one approach based on np.searchsorted -

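import numpy as np

def filter_rows(a, idx):
    # a is the input dataframe as a 2D array; column 0 holds the order ids
    # idx is the list of order ids whose rows should be selected

    a_idx = a[:,0]
    idx_arr = np.sort(idx)
    # Position at which each order id would be inserted into the sorted list
    pos_idx = np.searchsorted(idx_arr, a_idx)
    # Ids larger than every listed id fall off the end; point them at 0
    pos_idx[pos_idx == idx_arr.size] = 0
    # A row matches only if the listed id at that position equals its own id
    mask = idx_arr[pos_idx] == a_idx
    out = a[mask]
    return out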
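
To see what each step of the searchsorted approach produces, here is a small trace on the order ids from the question. The array names mirror the ones used in filter_rows; the inputs are typed in by hand for illustration, so treat it as a sketch rather than a benchmark.

```python
import numpy as np

# Order ids as they appear in column 0, and the ids we want to keep.
a_idx = np.array([1374495, 3002854, 2710558, 1355976, 1909878, 943915])
idx = [1355976, 1909878, 943915]

idx_arr = np.sort(idx)                     # [943915, 1355976, 1909878]

# Insertion point of every order id in the sorted list of wanted ids.
pos_idx = np.searchsorted(idx_arr, a_idx)  # [2, 3, 3, 1, 2, 0]

# A position equal to idx_arr.size would index out of bounds, so send it to 0.
# Such ids are larger than every wanted id and cannot match anyway.
pos_idx[pos_idx == idx_arr.size] = 0       # [2, 0, 0, 1, 2, 0]

# Keep a row only when the wanted id found at that position equals its own id.
mask = idx_arr[pos_idx] == a_idx
print(mask)
```

Only the last three entries of mask are True, which matches the three rows returned in the sample run.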

Approach #2

Here's another with np.in1d (in newer NumPy versions, np.isin is the recommended equivalent) -

a[np.in1d(a[:,0], idx)]
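
A minimal sketch of the same membership test written with np.isin, on a hand-made three-row array. Casting the id column to int64 is my assumption (it only works if every id is an integer); it keeps the comparison off the object dtype.

```python
import numpy as np

# Small stand-in for the object array from the question.
a = np.array([[1374495, 3, 'prior'],
              [1355976, 206200, 'prior'],
              [943915, 206200, 'train']], dtype=object)
idx = [943915, 1355976]

# Assumption: every order id is an integer, so the column can be cast.
ids = a[:, 0].astype(np.int64)

mask = np.isin(ids, idx)   # True where the row's id is in the list
selected = a[mask]
print(selected)
```

Note that boolean masking returns the rows in the order they appear in the array, not in the order of the list.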

Sample runs -

In [83]: a
Out[83]: 
array([[1374495, 3, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [3002854, 3, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [2710558, 3, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [1355976, 206200, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [1909878, 206200, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [943915, 206200, 'train', 16.0, 'soy lactosefree', 'dairy eggs']])

In [84]: idx
Out[84]: [1355976, 1909878, 943915]

In [85]: filter_rows(a, idx)
Out[85]: 
array([[1355976, 206200, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [1909878, 206200, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [943915, 206200, 'train', 16.0, 'soy lactosefree', 'dairy eggs']])

In [88]: a[np.in1d(a[:,0], idx)]
Out[88]: 
array([[1355976, 206200, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [1909878, 206200, 'prior', 16.0, 'soy lactosefree', 'dairy eggs'],
       [943915, 206200, 'train', 16.0, 'soy lactosefree', 'dairy eggs']])

Eelco Hoogendoorn · Accepted Answer · 2017-06-29 07:27:17Z

The numpy_indexed package (disclaimer: I am its author) contains efficient functionality for this type of operation:

import numpy_indexed as npi
row_idx = npi.indices(id_column, ids_to_get_index_of)

It should have the same performance as the solution offered by Divakar, but comes with some extra bells and whistles, such as kwargs to select how missing values are handled.
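
For readers without the package, the idea of looking up the row position of each requested id can be sketched in plain NumPy. This is not how numpy_indexed is implemented; it is a rough equivalent that assumes every requested id occurs exactly once in the id column. The variable names follow the snippet above and the values are taken from the question.

```python
import numpy as np

id_column = np.array([1374495, 3002854, 2710558, 1355976, 1909878, 943915])
ids_to_get_index_of = np.array([1909878, 943915, 1355976])

# Positions that would sort id_column.
order = np.argsort(id_column)

# Locate each requested id within the sorted view of id_column.
pos = np.searchsorted(id_column, ids_to_get_index_of, sorter=order)

# Map back to positions in the original, unsorted column.
# Assumption: every requested id is present; a missing id gives a wrong
# or out-of-range position here instead of raising an error.
row_idx = order[pos]
print(row_idx)
```

Unlike a boolean mask, row_idx follows the order of the requested ids, so indexing the array with it returns the rows in list order.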
