How to delete rows based on multiple columns condition from Numpy array?

Question

The objective is to delete rows based on a condition over multiple columns.

Say the array has shape Nx3: drop every row that does not satisfy Column0 >= Column1 >= Column2. For an array of shape Nx6, drop every row that does not satisfy both Column0 >= Column1 >= Column2 and Column3 >= Column4 >= Column5. The same rule applies to an array of shape NxM, where M is a multiple of 3.
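To pin the rule down, here is a plain-Python restatement for a single row. `row_is_kept` is a hypothetical helper, not part of the question's code; it only makes the condition unambiguous.

```python
def row_is_kept(row):
    """Return True if every consecutive group of three values is non-increasing.

    Hypothetical helper that restates the rule; assumes len(row) is a multiple of 3.
    """
    if len(row) % 3 != 0:
        raise ValueError("row length must be a multiple of 3")
    # Groups are columns (0, 1, 2), (3, 4, 5), ...; every group must pass.
    return all(row[i] >= row[i + 1] >= row[i + 2] for i in range(0, len(row), 3))


print(row_is_kept([3, 2, 1, 10, 3, 3]))  # True: 3 >= 2 >= 1 and 10 >= 3 >= 3
print(row_is_kept([3, 2, 1, 1, 2, 3]))   # False: the second group increases
```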

The following code achieves the above requirement:

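import numpy as np
import pandas as pd

# Every combination of the values over 12 columns, one row per combination
arr = np.meshgrid(*[[1, 2, 3, 10] for _ in range(12)])

df = pd.DataFrame(list(map(np.ravel, arr))).transpose()
df_len = len(df.columns)
# Column indices grouped in threes: [[0, 1, 2], [3, 4, 5], ...]
a_list = np.arange(df_len).reshape((-1, 3))

for x in range(len(a_list)):
    # Keep rows where this group is non-increasing; drop the rest
    mask = (df[a_list[x, 0]] >= df[a_list[x, 1]]) & (df[a_list[x, 1]] >= df[a_list[x, 2]])
    df.drop(df.loc[~mask].index, inplace=True)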
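The `np.meshgrid` / `np.ravel` / transpose idiom above enumerates every combination of the values, one per row, so the frame holds len(values) ** columns rows before any filtering: 4 ** 12 = 16,777,216 rows in the question's setup. A small sketch with 3 columns, checking the rows against `itertools.product` (used here only for comparison):

```python
import itertools

import numpy as np

values = [1, 2, 3, 10]
k = 3  # number of columns; kept small so the grid stays tiny

# The question's idiom: k-dimensional meshgrid, flattened, one column per axis.
grid = np.meshgrid(*[values for _ in range(k)])
rows = np.array([g.ravel() for g in grid]).T

print(rows.shape)  # (64, 3): len(values) ** k rows

# Same set of rows as itertools.product; only the order differs, because
# meshgrid's default indexing="xy" swaps the first two axes.
assert set(map(tuple, rows.tolist())) == set(itertools.product(values, repeat=k))
```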

However, this code becomes slow as the number of columns and the length of the value list ([1, 2, 3, 10]) grow.

May I know how to improve the code above?

Do not use a for loop; use NumPy vectorization.

– Kevin Choi, Commented Oct 20, 2020 at 7:21

rpb · Accepted Answer · 2020-10-20 13:44:17Z

0

Working directly with the NumPy array instead of a DataFrame significantly reduces the overall computation time.

import numpy as np

dimension = 9
list_no = [1, 2, 3, 10]
# Every combination of list_no over `dimension` columns, one row per combination
arr = np.meshgrid(*[list_no for _ in range(dimension)])
a = np.array(list(map(np.ravel, arr))).transpose()
num_rows, num_cols = a.shape

# Column indices grouped in threes; filter the array once per group
a_list = np.arange(num_cols).reshape((-1, 3))
for x in range(len(a_list)):
    a = a[(a[:, a_list[x, 0]] >= a[:, a_list[x, 1]]) & (a[:, a_list[x, 1]] >= a[:, a_list[x, 2]])]
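Kevin Choi's comment asks for a loop-free version. One way to do that, sketched here under the assumption of a 2-D array whose column count is a multiple of 3 (`keep_non_increasing_groups` is a hypothetical name): reshape each row into groups of three, compare neighbouring columns within every group at once, and keep rows where all groups pass.

```python
import numpy as np


def keep_non_increasing_groups(a):
    """Keep rows of `a` where col0 >= col1 >= col2, col3 >= col4 >= col5, and so on.

    Hypothetical helper; assumes a 2-D array with a multiple of 3 columns.
    """
    n_rows, n_cols = a.shape
    if n_cols % 3 != 0:
        raise ValueError("number of columns must be a multiple of 3")
    # Shape (n_rows, n_groups, 3); reshape copies if `a` is not C-contiguous.
    groups = a.reshape(n_rows, n_cols // 3, 3)
    # Explicit comparisons (rather than np.diff) stay safe for unsigned dtypes.
    ok = (groups[:, :, 0] >= groups[:, :, 1]) & (groups[:, :, 1] >= groups[:, :, 2])
    return a[ok.all(axis=1)]


dimension = 9
list_no = [1, 2, 3, 10]
arr = np.meshgrid(*[list_no for _ in range(dimension)])
a = np.array([g.ravel() for g in arr]).T
result = keep_non_increasing_groups(a)
print(result.shape)  # (8000, 9)
```

Unlike the loop, which shrinks `a` after each group, this evaluates every group on every row, so it trades some redundant comparisons for a single NumPy pass; which one is faster at a given size is worth measuring.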

answered Oct 20, 2020 at 13:44

rpb

BrunoO · Accepted Answer · 2020-10-20 13:04:39Z

-1

Here is my proposal for the NxM problem:

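import numpy as np
import pandas as pd

n = 10000
m = 9
df = pd.DataFrame(np.random.randint(0, n, size=(n, m)))

def condition(col):
    # True if any group of three columns in this row is NOT non-increasing
    res = ((col[i] >= col[i+1]) & (col[i+1] >= col[i+2]) for i in (j*3 for j in range(m//3)))
    return not all(res)

df['D'] = df.apply(condition, axis=1)

# drop() returns a new frame, so assign the result; also remove the helper column
df = df.drop(df[df.D].index).drop(columns='D')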
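The `df.apply(..., axis=1)` call runs a Python function once per row, which is typically the slow part. A minimal sketch of the same filter done with one boolean mask over the underlying array; the seeded `np.random.default_rng(0)` generator is an assumption added only for reproducibility.

```python
import numpy as np
import pandas as pd

n = 10000
m = 9
rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
df = pd.DataFrame(rng.integers(0, n, size=(n, m)))

# Evaluate all groups of three columns at once instead of once per row.
g = df.to_numpy().reshape(len(df), m // 3, 3)
keep = ((g[:, :, 0] >= g[:, :, 1]) & (g[:, :, 1] >= g[:, :, 2])).all(axis=1)

# Rows kept here are exactly those for which condition(...) returns False.
filtered = df[keep]
```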

edited Oct 20, 2020 at 13:04

answered Oct 20, 2020 at 8:09

BrunoO

3 Comments

rpb Over a year ago

Hi, thanks for dropping by. The solution you provided is essentially the same as the one already proposed in the original question.

rpb Over a year ago

Hi Bruno, I guess it is the same code as mine with a different coating? The main issue here is execution time. I would consider accepting your solution if it is significantly faster than what I initially proposed.

BrunoO Over a year ago

Yes, you're right; I'm not sure my solution is much faster than yours. There may be some speed gain from applying all() to the generator in the condition, since it stops at the first failing group. Anyway, thanks for the interesting question.
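Since the discussion centres on execution time, one alternative is worth noting. It is a different technique from filtering and was not proposed in the thread. When the input is the full grid of combinations, as in the question, the surviving rows are exactly the Cartesian product of non-increasing triples, so they can be generated directly. With 4 distinct values there are 20 such triples (multisets of size 3), so 12 columns give 20 ** 4 = 160,000 rows instead of filtering 4 ** 12 = 16,777,216. A minimal sketch, assuming distinct values; `valid_rows` is a hypothetical name, and the row order differs from the filtered meshgrid result.

```python
import itertools

import numpy as np


def valid_rows(values, n_groups):
    """Return only the rows whose groups of three columns are non-increasing.

    Hypothetical helper; assumes `values` contains distinct numbers.
    """
    desc = sorted(values, reverse=True)
    # combinations_with_replacement follows the input order, so with `desc`
    # sorted descending every triple (a, b, c) satisfies a >= b >= c.
    triples = np.array(list(itertools.combinations_with_replacement(desc, 3)))
    # One row per choice of triple for each group: shape (len(triples) ** n_groups, n_groups).
    idx = np.array(list(itertools.product(range(len(triples)), repeat=n_groups)))
    # triples[idx] has shape (rows, n_groups, 3); flatten the groups into columns.
    return triples[idx].reshape(len(idx), 3 * n_groups)


rows = valid_rows([1, 2, 3, 10], n_groups=4)  # same values and 12 columns as the question
print(rows.shape)  # (160000, 12)
```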
