I have a 2D NumPy array whose elements are tuples with two elements each: an int and a str. Some of the tuples are empty.

An example of how the 2D array may look:

matrix = np.array(
[[(1, 'foo'), (), (4, 'bar')],
 [(),(),()],
 [(1, 'foo'), (), (3, 'foobar')],
 [(),(),()]], 
dtype=object)

I'm looking to remove the rows that contain only empty tuples.

I tried the following code:

matrix = matrix[~np.all(matrix == (), axis=1)]

but it gave me the following error:

numpy.AxisError: axis 1 is out of bounds for array of dimension 0

The same code works for a 2D array that contains only integers when the condition inside np.all is matrix == 0: it correctly removes every row that contains only zeros. Is there a way to do the same thing, but removing rows that contain only empty tuples instead of rows that contain only zeros?
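
For reference, the integer case that works can be sketched like this (the array int_matrix is made up purely for illustration):

```python
import numpy as np

# Hypothetical integer array, used only for illustration
int_matrix = np.array([[1, 0, 4],
                       [0, 0, 0],
                       [1, 0, 3],
                       [0, 0, 0]])

# int_matrix == 0 is an elementwise comparison that yields a (4, 3) boolean
# array, so np.all(..., axis=1) marks the rows that contain only zeros
int_matrix = int_matrix[~np.all(int_matrix == 0, axis=1)]
print(int_matrix)
```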

2
  • 1
    Should you really use numpy for this? you won't benefit from vectorization. A python list would be more appropriate IMO… Commented Jun 29, 2022 at 8:30
  • 1
    Test all pieces of that expression to determine exactly what is producing that error. Commented Jun 29, 2022 at 8:43

3 Answers

2

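The problem is that tuples are sequence types. When you write matrix == (), NumPy does not compare each element with the empty tuple. It first converts () to an empty array and then tries to compare the two arrays. Their shapes are incompatible, so (in the NumPy version used here) matrix == () returns a single False.

This explains the error axis 1 is out of bounds for array of dimension 0: a single False has dimension 0, so it has no axis 1.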
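
As a small sketch of what is going on: converting the empty tuple to an array gives an empty 1-D array rather than a single object, which is why no elementwise comparison takes place. If a true elementwise comparison against () is wanted, one possible approach is to wrap the comparison with np.frompyfunc (the name is_empty below is only illustrative, not part of NumPy):

```python
import numpy as np

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)

# NumPy interprets () as an empty sequence: shape (0,), not a scalar object
print(np.asarray(()).shape)

# Illustrative helper: compare each element with () in plain Python
is_empty = np.frompyfunc(lambda t: t == (), 1, 1)
empty_mask = is_empty(matrix).astype(bool)  # boolean array of shape (4, 3)

# True for the rows that contain only empty tuples
print(empty_mask.all(axis=1))
```

Like np.vectorize, np.frompyfunc still calls the Python function once per element, so this is a convenience rather than a speed-up.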

A workaround is to test differently if a tuple is empty, for example by vectorizing the len function:

>>> vect_len = np.vectorize(len)

Then, we can do:

>>> matrix = matrix[~np.all(vect_len(matrix) == 0, axis=1)]
>>> print(matrix)
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]

Or, even simpler, since a length of 0 is falsy:

>>> matrix = matrix[np.any(vect_len(matrix), axis=1)]
>>> print(matrix)
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]
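
Put together as a self-contained script, the approach might look like the sketch below. Passing otypes=[int] is an addition not in the snippets above; it fixes the output dtype so that np.vectorize does not have to infer it from the first element. The names lengths and filtered are only illustrative.

```python
import numpy as np

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)

# Apply len() to every element; otypes=[int] sets the output dtype explicitly
vect_len = np.vectorize(len, otypes=[int])
lengths = vect_len(matrix)  # integer array of shape (4, 3)

# Keep the rows in which at least one tuple has a non-zero length
filtered = matrix[lengths.any(axis=1)]
print(filtered)
```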

0

You can traverse the array with a for loop and use the all() function to check whether a row consists only of empty tuples:

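import numpy as np

# dtype=object is needed so that NumPy stores the tuples as objects
matrix = np.array([[(1, 'foo'), (), (4, 'bar')], [(), (), ()], [(1, 'foo'), (), (3, 'foobar')], [(), (), ()]], dtype=object)

# Collect the indices of the rows to keep; deleting rows inside the loop
# would shift the remaining indices and skip rows or run out of bounds
keep = []
for i in range(len(matrix)):
    if not all(x == () for x in matrix[i]):
        keep.append(i)
matrix = matrix[keep]
print(matrix)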

Output:

[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]

0

As suggested in the comments, do not use NumPy here. NumPy is for numbers, and you don't have numbers. NumPy arrays can hold arbitrary objects, but there's no benefit here, and you run into problems, as you've seen.

You can just use a list comprehension and the all() function to filter your data.

lines = [
 [(1, 'foo'), (), (4, 'bar')],
 [(),(),()],
 [(1, 'foo'), (), (3, 'foobar')],
 [(),(),()]]

lines = [line for line in lines if not all(elem == () for elem in line)]
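
As a variation on the same idea, here is a minimal sketch that relies on an empty tuple being falsy. It assumes every element of a row is a tuple, and the function name drop_empty_rows is only illustrative:

```python
lines = [
    [(1, 'foo'), (), (4, 'bar')],
    [(), (), ()],
    [(1, 'foo'), (), (3, 'foobar')],
    [(), (), ()]]


def drop_empty_rows(rows):
    """Return only the rows that contain at least one non-empty tuple."""
    # An empty tuple is falsy, so any(row) is True as soon as one tuple has content
    return [row for row in rows if any(row)]


filtered = drop_empty_rows(lines)
print(filtered)
# [[(1, 'foo'), (), (4, 'bar')], [(1, 'foo'), (), (3, 'foobar')]]
```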

4 Comments

Maybe OP is forced to use NumPy; its use seems unusual here.
We'll let OP explain.
OK, thank you for your answer. I'm a beginner in Python, and I thought there would always be more reasons to use NumPy than a vanilla Python list, since each time I looked for an operation to perform on an array, I often got a NumPy answer. EDIT: Just saw the other answers. No, I'm not forced to use NumPy. As I said, I'm a beginner in Python, so I thought that for anything involving arrays it was better to use a NumPy array than a vanilla Python list.
Yeah, don't worry. NumPy is powerful, but it can't speed up anything with this kind of data (anything that isn't purely numeric). Sticking arbitrary objects into a NumPy array can be convenient for the indexing and slicing though, if a problem needs that.
