2D numpy array: Remove lines that contain only empty tuples

Question

I have a 2D NumPy array whose elements are tuples. Each tuple is either empty or holds two elements: an int and a str.

An example of how the 2D array may look:

matrix = np.array(
[[(1, 'foo'), (), (4, 'bar')],
 [(),(),()],
 [(1, 'foo'), (), (3, 'foobar')],
 [(),(),()]], 
dtype=object)

I'm looking to remove the lines that contain only empty tuples.

I tried the following code:

matrix = matrix[~np.all(matrix == (), axis=1)]

but it gave me the following error:

numpy.AxisError: axis 1 is out of bounds for array of dimension 0

The same code works for a 2D array that contains only integers when the condition inside np.all is matrix == 0: it correctly removes every line that contains only zeros. Is there a way to do the same thing, but removing lines that contain only empty tuples instead of lines that contain only zeros?
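For reference, the integer case described above can be sketched as follows. The array values and the names int_matrix, all_zero and filtered are made up for illustration; only the masking expression comes from the question.

```python
import numpy as np

# Hypothetical integer data with the same layout as the tuple example
int_matrix = np.array([[1, 0, 4],
                       [0, 0, 0],
                       [1, 0, 3],
                       [0, 0, 0]])

# int_matrix == 0 is an element-wise boolean array of shape (4, 3);
# np.all(..., axis=1) collapses it to one boolean per row
all_zero = np.all(int_matrix == 0, axis=1)

# Keep only the rows that are not entirely zero
filtered = int_matrix[~all_zero]
print(filtered)
```

The key point is that the comparison yields a 2D boolean array here, which is what makes axis=1 valid.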

Should you really use numpy for this? You won't benefit from vectorization. A python list would be more appropriate IMO…
– mozway, Commented Jun 29, 2022 at 8:30
Test all pieces of that expression to determine exactly what is producing that error.
– hpaulj, Commented Jun 29, 2022 at 8:43

Alexandre Novius · Accepted Answer · 2022-06-29 09:14:16Z

2

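The problem here is that tuples are sequence types. When you evaluate matrix == (), NumPy does not compare each element with the empty tuple. It treats () as an empty array-like of shape (0,), which cannot be broadcast against the matrix, so in your setup the whole comparison returns a single False instead of a boolean array.

This explains the error "axis 1 is out of bounds for array of dimension 0": that single False has dimension 0, so np.all has no axis 1 to reduce over.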
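A minimal sketch of that shape mismatch, assuming a NumPy version that provides np.broadcast_shapes. It deliberately avoids evaluating the comparison with () itself, since how NumPy reports a failed comparison can differ between versions. The names empty and broadcast_ok are illustrative.

```python
import numpy as np

# An empty tuple converts to an array with zero elements
empty = np.asarray(())
print(empty.shape)  # (0,)

# Shapes (4, 3) and (0,) are not broadcast-compatible:
# the trailing dimensions are 3 and 0, and neither is 1
try:
    np.broadcast_shapes((4, 3), empty.shape)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)  # False
```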

A workaround is to test differently if a tuple is empty, for example by vectorizing the len function:

>>> vect_len = np.vectorize(len)

Then, we can do:

>>> matrix = matrix[~np.all(vect_len(matrix) == 0, axis=1)]
>>> print(matrix)
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]

Or, even more simply, since a length of 0 is falsy:

>>> matrix = matrix[np.any(vect_len(matrix), axis=1)]
>>> print(matrix)
[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]
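Put together as a self-contained script, the approach might look like the sketch below. The helper name drop_empty_rows is hypothetical, and passing otypes=[int] to np.vectorize is an added assumption so the output dtype is fixed up front rather than inferred from the first element.

```python
import numpy as np

def drop_empty_rows(matrix):
    """Return the rows of a 2D object array that hold at least one non-empty tuple."""
    # np.vectorize runs a Python-level loop; it is a convenience, not a speed-up
    vect_len = np.vectorize(len, otypes=[int])
    lengths = vect_len(matrix)      # integer array, same shape as matrix
    keep = lengths.any(axis=1)      # True where some tuple in the row has length > 0
    return matrix[keep]

matrix = np.array(
    [[(1, 'foo'), (), (4, 'bar')],
     [(), (), ()],
     [(1, 'foo'), (), (3, 'foobar')],
     [(), (), ()]],
    dtype=object)

result = drop_empty_rows(matrix)
print(result)
```

This relies on every cell supporting len(), which holds for the tuples in the question but not for arbitrary objects.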

answered Jun 29, 2022 at 9:14


Cardstdani · Accepted Answer · 2022-06-29 09:01:25Z

0

You can traverse the array with a for loop and use the all() function to check whether a row consists only of empty tuples:

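import numpy as np

# dtype=object keeps each tuple as a single element, giving a (4, 3) array
matrix = np.array([[(1, 'foo'), (), (4, 'bar')], [(), (), ()], [(1, 'foo'), (), (3, 'foobar')], [(), (), ()]], dtype=object)

# Iterate backwards so that deleting a row does not shift the rows still to be checked
for i in range(len(matrix) - 1, -1, -1):
    if all(x == () for x in matrix[i]):
        matrix = np.delete(matrix, i, axis=0)
print(matrix)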

Output:

[[(1, 'foo') () (4, 'bar')]
 [(1, 'foo') () (3, 'foobar')]]

edited Jun 29, 2022 at 9:01

answered Jun 29, 2022 at 8:44


Christoph Rackwitz · Accepted Answer · 2022-06-29 09:22:33Z

0

As suggested in the comments, do not use NumPy here. NumPy is for numbers, and you don't have numbers. NumPy arrays can hold arbitrary objects, but there's no benefit here, and you run into problems, as you've seen.

You can just use a list comprehension and the all() function to filter your data.

lines = [
 [(1, 'foo'), (), (4, 'bar')],
 [(),(),()],
 [(1, 'foo'), (), (3, 'foobar')],
 [(),(),()]]

lines = [ line for line in lines if not all(elem == () for elem in line) ]
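A slightly shorter variant is sketched below. It assumes every cell is a tuple, so that truthiness is the same as being non-empty; the name kept is illustrative.

```python
lines = [
    [(1, 'foo'), (), (4, 'bar')],
    [(), (), ()],
    [(1, 'foo'), (), (3, 'foobar')],
    [(), (), ()],
]

# An empty tuple is falsy, so any(line) is True as soon as one tuple is non-empty
kept = [line for line in lines if any(line)]
print(kept)
# [[(1, 'foo'), (), (4, 'bar')], [(1, 'foo'), (), (3, 'foobar')]]
```

If a cell could hold another falsy value such as 0 or None, the explicit elem == () test is the safer choice.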

edited Jun 29, 2022 at 9:22

answered Jun 29, 2022 at 9:05


4 Comments

Cardstdani Over a year ago

Maybe OP is forced to use NumPy, although its use seems unusual here.

Christoph Rackwitz Over a year ago

We'll let OP explain.

Valus_Paulus Over a year ago

OK, thank you for your answer. I'm a beginner in Python and I thought there would always be more reasons to use numpy than a vanilla Python list, since each time I looked for an operation to do on an array, I would often get a numpy answer. EDIT: Just saw the other answers. No, I'm not forced to use numpy. Like I said, I'm a beginner in Python, so I thought that for anything involving arrays it was better to use a numpy array instead of a vanilla Python list.

Christoph Rackwitz Over a year ago

yeah don't worry, numpy is powerful, but it can't speed up anything with this kind of data (anything but only-numbers). sticking arbitrary objects into a numpy array can be convenient for the indexing and slicing though, if a problem needs that.
