How to efficiently create conditional column arrays using NumPy?

Question

The objective is to create an array of rows [x, y, z] that satisfy the conditions (x >= y) and (y >= z).

One naive approach that does the job is a nested for loop:

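import numpy as np

tot_length = 200
steps = 0.1
start_val = 0.0
list_no = np.arange(start_val, tot_length, steps)

# Start from an empty (0, 3) array so no placeholder row of zeros ends up in the result.
a = np.empty((0, 3))
for x in list_no:
    for y in list_no:
        for z in list_no:
            if x >= y >= z:
                # np.append returns a new copy of the whole array on every call
                a = np.append(a, [[x, y, z]], axis=0)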

No memory error occurred, but the execution time is very long.
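Part of the slowness is the triple loop itself (with tot_length=200 and steps=0.1, list_no has about 2000 values, so the loop body runs about 2000³ = 8×10⁹ times). The other part is np.append: it never grows an array in place but returns a newly allocated array with all existing rows copied, so appending N rows one at a time copies on the order of N²/2 rows in total. A small check of that copying behaviour:

```python
import numpy as np

a = np.zeros((1, 3))
b = np.append(a, [[1.0, 2.0, 3.0]], axis=0)

# `a` is left untouched; `b` is a freshly allocated array holding a copy of a's rows.
print(a.shape, b.shape)        # (1, 3) (2, 3)
print(np.shares_memory(a, b))  # False
```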

Another approach is the code below. However, it only works reliably while tot_length is less than 100; beyond that, it runs into memory errors, as reported here.

import numpy as np

tot_length = 200
steps = 0.1
start_val = 0.0
list_no = np.arange(start_val, tot_length, steps)

# Dense grid: every (x, y, z) combination is materialised, n**3 rows in total.
arr = np.meshgrid(*[list_no for _ in range(3)])
a = np.array(list(map(np.ravel, arr))).transpose()
num_rows, num_cols = a.shape

# Columns grouped in threes; with num_cols == 3 this is the single group [0, 1, 2].
a_list = np.arange(num_cols).reshape((-1, 3))
for x in range(len(a_list)):
    a = a[(a[:, a_list[x, 0]] >= a[:, a_list[x, 1]])
          & (a[:, a_list[x, 1]] >= a[:, a_list[x, 2]])]
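The memory error comes from the dense grid: all n³ combinations are materialised (three meshgrid arrays, then the stacked array a) before the filter runs. A rough estimate for a alone, assuming float64 values and n = len(list_no), which is about 1000 for tot_length=100 and about 2000 for tot_length=200 with steps=0.1 (dense_grid_bytes is a hypothetical helper):

```python
def dense_grid_bytes(n, itemsize=8):
    """Bytes needed for the stacked (n**3, 3) array `a` before filtering."""
    return n ** 3 * 3 * itemsize

for n in (1000, 2000):
    print(n, dense_grid_bytes(n))
# 1000 24000000000
# 2000 192000000000
```

The three meshgrid outputs need roughly the same amount again, and because the cost is cubic in n, doubling tot_length multiplies the footprint by eight.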

Any suggestion that balances execution time and memory usage would be appreciated. I would also welcome a Pandas-based solution if that makes things work.
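One way to trade the single huge array for bounded memory is to generate the rows one x value at a time and process (or write out) each chunk before producing the next. The sketch below is not taken from the answers in this thread; it assumes list_no is sorted ascending (as np.arange output is), and triples_by_x is a hypothetical helper name. It precomputes the (y, z) index pairs with y >= z once via np.tril_indices, whose row-by-row ordering means the pairs with y-index <= ix are exactly the first (ix+1)(ix+2)/2 entries.

```python
import numpy as np

def triples_by_x(values):
    """Yield one (k, 3) array per x value, holding every row [x, y, z] with x >= y >= z.

    Assumes `values` is sorted in ascending order (np.arange output is).
    """
    values = np.asarray(values)
    n = len(values)
    # Lower-triangle index pairs (iy >= iz), listed row by row:
    # (0,0), (1,0), (1,1), (2,0), (2,1), (2,2), ...
    iy, iz = np.tril_indices(n)
    for ix in range(n):
        # Pairs with iy <= ix are the first (ix+1)(ix+2)/2 entries.
        k = (ix + 1) * (ix + 2) // 2
        chunk = np.empty((k, 3), dtype=values.dtype)
        chunk[:, 0] = values[ix]
        chunk[:, 1] = values[iy[:k]]
        chunk[:, 2] = values[iz[:k]]
        yield chunk


# Usage with the small check parameters (tot_length=3, steps=1, start_val=1):
list_no = np.arange(1, 3, 1)
small = np.concatenate(list(triples_by_x(list_no)))
print(small.tolist())  # [[1, 1, 1], [2, 1, 1], [2, 2, 1], [2, 2, 2]]
```

For the full problem (about 2000 values) each chunk has at most about 2 million rows (roughly 48 MB in float64) and the precomputed index pairs take about 32 MB with 64-bit indices. The total output is still about 1.3×10⁹ rows, so the chunks should be consumed one by one (aggregated, filtered further, or saved to disk) rather than concatenated.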

To check whether a proposed solution produces the intended output, use the following parameters:

tot_length=3
steps=1
start_val=1

These should produce the output:
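With these parameters list_no is np.arange(1, 3, 1), i.e. [1, 2], so the qualifying rows can be listed by hand. A minimal brute-force reference that faster solutions can be compared against (reference_triples is a hypothetical helper, only meant for small inputs):

```python
import numpy as np

def reference_triples(values):
    """Brute force: every [x, y, z] with x >= y >= z, in nested-loop order. Small inputs only."""
    rows = [[x, y, z] for x in values for y in values for z in values if x >= y >= z]
    return np.array(rows).reshape(-1, 3)

list_no = np.arange(1, 3, 1)           # tot_length=3, steps=1, start_val=1 -> [1, 2]
expected = reference_triples(list_no)
print(expected.tolist())               # [[1, 1, 1], [2, 1, 1], [2, 2, 1], [2, 2, 2]]
```

Row order differs between approaches, so comparing sorted rows (or sets of tuples) is safer than comparing the arrays directly.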

For tot_length=200, you are looking at about 30GB memory allocation for a, which is not small.
– Quang Hoang, Commented Oct 20, 2020 at 17:27
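The estimate can be checked by counting rows: choosing x >= y >= z from n distinct values is the same as choosing a multiset of size 3, so there are C(n+2, 3) rows. Assuming n = 2000 values and three float64 columns (filtered_rows is a hypothetical helper):

```python
from math import comb

def filtered_rows(n):
    # Number of non-increasing triples (x >= y >= z) drawn from n distinct values.
    return comb(n + 2, 3)

n = 2000
rows = filtered_rows(n)
print(rows, rows * 3 * 8)  # 1335334000 32048016000
```

That is about 32 GB (about 29.8 GiB), in line with the comment. For the small check parameters, filtered_rows(2) is 4, matching the four expected rows.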

Kate Melnykova · Accepted Answer · 2020-10-20 17:31:45Z

2

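import numpy as np

tot_length = 200
steps = 0.1
list_no = np.arange(0.0, tot_length, steps)

# Collect rows in a Python list and convert once at the end (no repeated np.append).
a = list()
for x in list_no:
    for y in list_no:
        # list_no is ascending, so once y > x no later y can satisfy x >= y.
        if y > x:
            break

        for z in list_no:
            # Likewise, stop as soon as z exceeds y.
            if z > y:
                break

            a.append([x, y, z])

a = np.array(a)
# if needed, a.transpose()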
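The early break statements are what make this faster than the original triple loop: they rely on list_no being ascending, so the inner loops only visit combinations that can satisfy the condition. The same enumeration can be delegated to itertools.combinations_with_replacement, which yields tuples whose positions are non-decreasing; feeding it the values in descending order gives exactly the rows with x >= y >= z. This is an alternative sketch, not part of the answer (triples_cwr is a hypothetical name), and its row order differs from the loop above.

```python
import itertools
import numpy as np

def triples_cwr(values):
    """All rows [x, y, z] with x >= y >= z, built with itertools (values assumed distinct)."""
    desc = np.sort(np.asarray(values))[::-1].tolist()   # descending order
    # Each tuple (v[i], v[j], v[k]) has i <= j <= k, hence x >= y >= z.
    rows = list(itertools.combinations_with_replacement(desc, 3))
    return np.array(rows).reshape(-1, 3)

print(triples_cwr(np.arange(1, 3, 1)).tolist())
# [[2, 2, 2], [2, 2, 1], [2, 1, 1], [1, 1, 1]]
```

It still builds the full result in memory, so it does not solve the memory problem for tot_length=200; the gain is that the iteration runs inside itertools rather than in Python-level nested loops.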

edited Oct 20, 2020 at 17:31

answered Oct 20, 2020 at 17:30

Kate Melnykova


2 Comments

Quang Hoang Over a year ago

I don't see how this is different from OP's solution.

Eric Over a year ago

This avoids calling np.append, which is slower than appending to a list and converting it to an array at the end.

Eric · Accepted Answer · 2020-10-20 17:31:05Z

1

Does something like this work?

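import numpy as np

tot_length = 200
steps = 0.1
list_no = np.arange(0.0, tot_length, steps)
# sparse=True returns broadcastable grids instead of three dense n**3 arrays;
# indexing="ij" keeps the axes in (x, y, z) order (the default "xy" swaps x and y).
x, y, z = np.meshgrid(*[list_no for _ in range(3)], sparse=True, indexing="ij")
# nonzero() returns a tuple of integer index arrays (one per axis), not the values.
a = ((x >= y) & (y >= z)).nonzero()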

This still needs about 8GB of memory for the intermediate boolean array (2000³ elements at one byte each), but it avoids the repeated, slow calls to np.append.
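The 8GB figure follows from broadcasting: with sparse=True each grid has length n along only one axis, and the comparison broadcasts them to a full n×n×n boolean array. A small sketch (n=4 here) also shows that with the default indexing="xy" the first output varies along axis 1 and the second along axis 0, so the index arrays from nonzero() come back in (y, x, z) order unless indexing="ij" is passed:

```python
import numpy as np

values = np.arange(4.0)
x, y, z = np.meshgrid(values, values, values, sparse=True)  # default indexing="xy"
print(x.shape, y.shape, z.shape)     # (1, 4, 1) (4, 1, 1) (1, 1, 4)

mask = (x >= y) & (y >= z)
print(mask.shape, mask.nbytes)       # (4, 4, 4) 64  -> n**3 bytes; 2000**3 bytes is 8 GB

xi, yi, zi = np.meshgrid(values, values, values, sparse=True, indexing="ij")
print(xi.shape, yi.shape, zi.shape)  # (4, 1, 1) (1, 4, 1) (1, 1, 4)
```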

answered Oct 20, 2020 at 17:31

Eric


1 Comment

rpb Over a year ago

Hi Eric, I just noticed that all the values in a are rounded. Is this expected? Is it because nonzero() returns a non-floating type? For example, if I set tot_length=0.3, steps=0.1, start_val=0.1, your proposed solution returns rounded values instead of decimal values.
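The rounding comes from nonzero(), which returns integer positions into the grid rather than the grid values; indexing list_no with those positions recovers the values. A minimal sketch (triples_via_mask is a hypothetical helper) that also passes indexing="ij" so the three index arrays line up with x, y and z. It still builds the n³ boolean mask, so the memory caveat from the answer applies unchanged.

```python
import numpy as np

def triples_via_mask(values):
    """Rows [x, y, z] with x >= y >= z, via a broadcast boolean mask (n**3 memory)."""
    values = np.asarray(values)
    # indexing="ij" keeps axis order (x, y, z); the default "xy" swaps the first two axes.
    x, y, z = np.meshgrid(values, values, values, sparse=True, indexing="ij")
    ix, iy, iz = ((x >= y) & (y >= z)).nonzero()   # integer positions, not values
    # Map positions back to the actual (floating-point) values.
    return np.column_stack((values[ix], values[iy], values[iz]))

print(triples_via_mask(np.array([0.1, 0.2])).tolist())
# [[0.1, 0.1, 0.1], [0.2, 0.1, 0.1], [0.2, 0.2, 0.1], [0.2, 0.2, 0.2]]
```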
