Numpy Vectorize, Finding Indices of subarray with Element in it

Question

Problem:

I have a list of lists, e.g. [[1,2,3],[1,4,5],[2,7,6,4]], and I want to find the indices of the inner lists that contain a given element. If the element I'm interested in is 4, then 4 is in both [1,4,5] and [2,7,6,4], so I expect to obtain the indices of these lists in the main list, that is, [1,2]. (I'm most interested in a Python solution.)

Attempt: Inspired by "Find indices in numpy arrays consisting of lists where element is in list", I intended to use numpy.vectorize as follows:

import numpy as np
list = [[1,2,3],[1,4,5],[2,7,6,4]]
# the lambda needs a parameter: x is one inner list
part_of = np.vectorize(lambda x: 4 in x)
np.where(part_of(list))

The above works just fine, but when the inner lists all have the same length, for instance,

list = [[1,2,3],[1,4,5],[2,7,4]]

it breaks down and I get an error saying that int is not iterable, so it seems to be iterating on a deeper level than I intended. I can imagine an easy workaround, if the problem is indeed the broadcasting rules and having lists of different sizes is indeed sufficient for it to work: adding a list of a different size that does not contain the desired element. But I wonder if there is a neater way or if I'm missing something simple. In my actual application there will be many lists inside the main list, so considerably more efficient solutions are very welcome as well.

Thank you!

But do you have numpy arrays or lists? Are all the lists the same size? This is not clear from your question. If the lists are all the same size, it is as simple as doing lst == 4, assuming lst is a numpy array.
– Dani Mesejo, Commented Oct 16, 2021 at 20:20
Look at np.array(list) for the two cases. Do you see a difference? In shape, dtype. Before using np.vectorize read its docs carefully, paying attention to what it passes to your function, and its speed disclaimer.
– hpaulj, Commented Oct 16, 2021 at 20:27
As long as you start with nested lists, and there's a good chance that they differ in length, I bet a pure list solution will be best. numpy only has a chance of being better if np.array(list) already exists and produces a 2d array. numpy is not magic that improves every problem.
– hpaulj, Commented Oct 16, 2021 at 20:28

hpaulj · Accepted Answer · 2021-10-16 22:23:20Z

Compare these times, and tell me why you are trying so hard to use np.vectorize.

In [405]: %%timeit
     ...: list = [[1,2,3],[1,4,5],[2,7,6,4]]
     ...: [i for i,row in enumerate(list) if 4 in row]
     ...: 
     ...: 
936 ns ± 5.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [406]: %%timeit
     ...: list = [[1,2,3],[1,4,5],[2,7,6,4]]
     ...: part_of = numpy.vectorize(lambda x: 4 in x)
     ...: np.where(part_of(list))
     ...: 
     ...: 
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:2195: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  for a in args]
36.4 µs ± 96.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As for why vectorize works in one case and not the other, compare:

In [407]: np.array( [[1,2,3],[1,4,5],[2,7,6,4]])
<ipython-input-407-4be5d0a36d88>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  np.array( [[1,2,3],[1,4,5],[2,7,6,4]])

Out[407]: array([list([1, 2, 3]), 
                 list([1, 4, 5]), 
                 list([2, 7, 6, 4])], 
          dtype=object)

In [408]: np.array( [[1,2,3],[1,4,5],[2,7,6]])
Out[408]: 
array([[1, 2, 3],
       [1, 4, 5],
       [2, 7, 6]])

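vectorize works with a numpy array, not the "raw" list of lists: it converts its argument to an array first. Those are very different arrays. The first is a 1d object-dtype array of shape (3,) whose elements are lists; the second is a 2d integer array of shape (3,3) whose elements are individual numbers.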
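
The difference can be checked directly by inspecting shape and dtype. The following is a minimal sketch, not part of the original session: the names ragged, rect, same_len and forced are made up for illustration. It passes dtype=object explicitly, because recent NumPy releases raise an error instead of the deprecation warning shown above. It also shows one way to force a 1d object array from equal-length lists, so that np.vectorize hands whole rows to the function.

```python
import numpy as np

ragged = [[1, 2, 3], [1, 4, 5], [2, 7, 6, 4]]
rect = [[1, 2, 3], [1, 4, 5], [2, 7, 6]]

# Ragged input: a 1d array whose three elements are Python lists.
ragged_arr = np.array(ragged, dtype=object)
# Rectangular input: an ordinary 2d numeric array.
rect_arr = np.array(rect)

print(ragged_arr.shape, ragged_arr.dtype)  # shape is (3,)
print(rect_arr.shape, rect_arr.dtype)      # shape is (3, 3)

# Equal-length lists can still be stored as a 1d object array
# by allocating it first and filling it one row at a time.
same_len = [[1, 2, 3], [1, 4, 5], [2, 7, 4]]
forced = np.empty(len(same_len), dtype=object)
for i, row in enumerate(same_len):
    forced[i] = row

# Now each call of the lambda receives a list, not an int.
part_of = np.vectorize(lambda x: 4 in x, otypes=[bool])
hits = np.where(part_of(forced))[0]
print(hits)
```

This only makes the original approach behave consistently; it does nothing for its speed.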

Look at what a simple print function sees when given these 2 different arrays:

In [409]: f=np.vectorize(print)(_407)
[1, 2, 3]
[1, 2, 3]
[1, 4, 5]
[2, 7, 6, 4]
In [410]: f=np.vectorize(print)(_408)
1
1
2
3
1
4
5
2
7
6
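
Note that the first element is printed twice in both runs. When no otypes argument is given, np.vectorize makes one extra trial call on the first element to determine the output type. A small sketch to confirm this by counting calls (the helper record and the list calls are hypothetical names, not from the session above):

```python
import numpy as np

calls = []

def record(x):
    """Remember every argument the vectorized wrapper passes in."""
    calls.append(x)
    return 0

arr = np.array([[1, 2, 3], [1, 4, 5]])  # 6 elements

# Without otypes: one trial call plus one call per element.
np.vectorize(record)(arr)
n_without = len(calls)

# With otypes: the trial call is skipped.
calls.clear()
np.vectorize(record, otypes=[int])(arr)
n_with = len(calls)

print(n_without, n_with)
```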

Beginners often try np.vectorize thinking it will give them "numpy vectorizing" speed, but as you see it does not do that (despite the name). And it isn't nearly as easy to use as a superficial scan of its docs suggests.
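
For reference, here is a hedged sketch of the two approaches that avoid np.vectorize altogether. The function names rows_containing and rows_containing_2d are made up for illustration. The first is the plain list comprehension timed above and works for inner lists of any length; the second assumes the data really is rectangular and uses the elementwise comparison suggested in the comments.

```python
import numpy as np

def rows_containing(rows, target):
    """Indices of inner lists that contain target (lengths may differ)."""
    return [i for i, row in enumerate(rows) if target in row]

def rows_containing_2d(arr, target):
    """Same result for a rectangular 2d array, using a boolean mask."""
    arr = np.asarray(arr)
    # (arr == target) is a 2d boolean array; any(axis=1) reduces each row.
    return np.flatnonzero((arr == target).any(axis=1))

print(rows_containing([[1, 2, 3], [1, 4, 5], [2, 7, 6, 4]], 4))
print(rows_containing_2d([[1, 2, 3], [1, 4, 5], [2, 7, 4]], 4))
```

Which one is faster depends on whether the 2d array already exists; converting nested lists to an array has its own cost.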
