0

The objective is to delete rows based on multiple columns.

For example, if the array has size Nx3, drop every row that does not satisfy Column0 >= Column1 >= Column2. For an array of size Nx6, drop every row that does not satisfy both Column0 >= Column1 >= Column2 and Column3 >= Column4 >= Column5. The same rule applies to an array of size NxM, where M is a multiple of 3.
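
To make the rule concrete, here is a minimal sketch of the Nx3 case. The array values are made up purely for illustration and are not taken from the real data.

```python
import numpy as np

# Hypothetical 4x3 array, chosen only to illustrate the rule.
a = np.array([
    [3, 2, 1],    # 3 >= 2 >= 1 holds          -> keep
    [1, 2, 3],    # 1 >= 2 fails               -> drop
    [2, 2, 2],    # ties satisfy >=            -> keep
    [10, 3, 10],  # 10 >= 3 holds, 3 >= 10 fails -> drop
])

# Boolean mask: True for rows where Column0 >= Column1 >= Column2.
keep = (a[:, 0] >= a[:, 1]) & (a[:, 1] >= a[:, 2])
filtered = a[keep]
```

For Nx6 and wider arrays, the same check is repeated for each consecutive group of three columns, and a row is kept only if every group passes.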

The following code achieves the above requirement:

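import numpy as np
import pandas as pd

# Build every combination of the 4 values across 12 columns (4**12 rows).
arr = np.meshgrid(*[[1, 2, 3, 10] for _ in range(12)])

df = pd.DataFrame(list(map(np.ravel, arr))).transpose()
df_len = len(df.columns)
# Column indices grouped in threes: [[0, 1, 2], [3, 4, 5], ...]
a_list = np.arange(df_len).reshape((-1, 3))

for x in range(len(a_list)):
    # True where the three columns of this group are non-increasing.
    mask = (df[a_list[x, 0]] >= df[a_list[x, 1]]) & (df[a_list[x, 1]] >= df[a_list[x, 2]])
    df.drop(df.loc[~mask].index, inplace=True)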

However, the above code is slow for higher dimensions and for a longer list of values (list_no).

How can I improve the above code?

1
  • Do not use a for loop; use NumPy vectorization. Commented Oct 20, 2020 at 7:21

2 Answers

0

Working directly with a NumPy array significantly reduces the overall computation time.

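import numpy as np

dimension = 9
list_no = [1, 2, 3, 10]
arr = np.meshgrid(*[list_no for _ in range(dimension)])
# One row per combination, one column per dimension.
a = np.array(list(map(np.ravel, arr))).transpose()
num_rows, num_cols = a.shape

# Column indices grouped in threes: [[0, 1, 2], [3, 4, 5], ...]
a_list = np.arange(num_cols).reshape((-1, 3))
for x in range(len(a_list)):
    # Keep only rows where this group of three columns is non-increasing.
    a = a[(a[:, a_list[x, 0]] >= a[:, a_list[x, 1]]) & (a[:, a_list[x, 1]] >= a[:, a_list[x, 2]])]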
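
The remaining loop over column groups can also be avoided by reshaping the array so that each group of three columns becomes its own axis. The following is a sketch of that idea, not a measured improvement; the function name `filter_sorted_triples` is hypothetical, and a smaller `dimension` is used here only to keep the example quick to run.

```python
import numpy as np

def filter_sorted_triples(a):
    """Keep rows in which every consecutive group of 3 columns is non-increasing.

    Hypothetical helper for illustration; assumes the column count is a multiple of 3.
    """
    n_rows, n_cols = a.shape
    if n_cols % 3 != 0:
        raise ValueError("number of columns must be a multiple of 3")
    # View each row as (n_cols // 3) groups of 3 values: shape (N, M // 3, 3).
    groups = a.reshape(n_rows, -1, 3)
    # Per-group check: first >= second and second >= third.
    ok = (groups[:, :, 0] >= groups[:, :, 1]) & (groups[:, :, 1] >= groups[:, :, 2])
    # A row survives only if all of its groups pass.
    return a[ok.all(axis=1)]

list_no = [1, 2, 3, 10]
dimension = 6  # smaller than in the answer above, to keep the sketch fast
arr = np.meshgrid(*[list_no for _ in range(dimension)])
a = np.array([g.ravel() for g in arr]).T
result = filter_sorted_triples(a)
```

This builds a single boolean mask and indexes the array once, instead of re-slicing it for every group. Whether that is actually faster than the loop depends on the array size and should be checked with a timing run.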

-1

Here is my proposal for the NxM problem:

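import numpy as np
import pandas as pd

n = 10000
m = 9
df = pd.DataFrame(np.random.randint(0, n, size=(n, m)))

def condition(col):
    # One check per group of three columns: col[i] >= col[i+1] >= col[i+2].
    res = ((col[i] >= col[i+1]) & (col[i+1] >= col[i+2]) for i in (j*3 for j in range(m//3)))
    # True when at least one group fails, i.e. the row must be dropped.
    return not all(res)

df['D'] = df.apply(condition, axis=1)

# drop returns a new DataFrame, so assign the result back.
df = df.drop(df[df.D].index)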

3 Comments

Hi, thanks for dropping by. The solution you provided is the same as the one already proposed in the original question.
Hi Bruno, it is the same code as mine but with a different coating, I guess? The main issue here is execution time. I can consider accepting your solution if its timing is significantly better than what I initially proposed.
Yes, you're right, I'm not sure that my solution is much faster than yours. There may be some speed gain in the use of all applied to the generator in the condition. Anyway, thanks for the interesting question.
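
Since the discussion comes down to execution time, one way to settle it is to time both approaches on the same data. The sketch below is an assumed test harness, not part of either answer: the function names `with_apply` and `with_column_masks`, the reduced size `n = 2000`, and the fixed random seed are all illustrative choices. No timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np
import pandas as pd

n, m = 2000, 9  # smaller n than in the answer, so the sketch runs quickly
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, n, size=(n, m)))

def with_apply(frame):
    # Row-wise check through DataFrame.apply, as in the answer above.
    def bad_row(row):
        return not all(
            (row[i] >= row[i + 1]) and (row[i + 1] >= row[i + 2])
            for i in range(0, m, 3)
        )
    return frame[~frame.apply(bad_row, axis=1)]

def with_column_masks(frame):
    # Column-wise boolean masks, as in the question's approach.
    mask = np.ones(len(frame), dtype=bool)
    for i in range(0, m, 3):
        mask &= ((frame[i] >= frame[i + 1]) & (frame[i + 1] >= frame[i + 2])).to_numpy()
    return frame[mask]

t_apply = timeit.timeit(lambda: with_apply(df), number=3)
t_masks = timeit.timeit(lambda: with_column_masks(df), number=3)
print(f"apply: {t_apply:.3f}s, column masks: {t_masks:.3f}s")
```

Both functions should return the same rows, so any difference in the printed times reflects only how the check is evaluated.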
