I have a 2D NumPy array from which I would like to remove certain columns, but I don't know how many columns will be removed until the code runs.

I want to loop over the columns in the original array, and if the sum of the entries in a column is above a certain value, I want to remove the whole column.

I started to do this the following way:

for i in range(original_number_of_columns):
    if sum(original_array[:,i]) < certain_value:
        new_array[:,new_index] = original_array[:,i]
        new_index+=1

But then I realised that I was going to have to define new_array first, and tell Python what size it is. But I don't know what size it is going to be beforehand.

I have got around this before by first looping over the columns to count how many I will lose, then defining new_array, and finally running the loop above, but surely there are more efficient ways to do this!

Thank you.

    You might be able to just collapse the original array, but you probably need to work backwards, removing the farthest columns first. Commented Jul 22, 2013 at 16:27
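
One way to read the comment above is to delete columns in place, visiting them from last to first so that a deletion never shifts a column that has not been checked yet. The following is only a minimal sketch with plain nested lists; `drop_columns_in_place` and the threshold value are hypothetical names and numbers chosen for illustration.

```python
def drop_columns_in_place(table, threshold):
    """Remove every column whose sum is >= threshold, modifying table in place."""
    n_cols = len(table[0]) if table else 0
    # Walk right to left so that deleting column i never shifts a column
    # that has not been inspected yet.
    for i in range(n_cols - 1, -1, -1):
        if sum(row[i] for row in table) >= threshold:
            for row in table:
                del row[i]
    return table


table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
drop_columns_in_place(table, 15)  # column sums are 12, 15, 18
print(table)
```

Only the first column has a sum below 15 here, so it is the only one that survives.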

3 Answers

You can use the following:

import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Column sums are [12, 15, 18]; keep only the columns whose sum exceeds 15
print(a.compress(a.sum(0) > 15, axis=1))

[[3]
 [6]
 [9]]

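Without NumPy:

my_2d_table = [[...],[...],...]
# Transpose with zip(*...) to iterate over columns, keeping those whose sum is below the threshold
only_cols_that_sum_lt_x = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
# Transpose back to rows; list(...) is needed on Python 3, where map returns an iterator
new_table = list(map(list, zip(*only_cols_that_sum_lt_x)))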

With NumPy:

import numpy as np

a = np.array(my_2d_table)
# A boolean mask over the column sums selects the columns to keep
a[:, np.sum(a, axis=0) < some_threshold]
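
To make the two variants above concrete, here is a small sketch that runs both on the same data and compares the results. The table contents and the threshold of 16 are assumptions made up for illustration; they are not from the question.

```python
import numpy as np

my_2d_table = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
some_threshold = 16  # hypothetical value; column sums are 12, 15, 18

# Pure Python: transpose, filter the columns, transpose back.
kept_cols = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
new_table = [list(row) for row in zip(*kept_cols)]

# NumPy: build a boolean mask over the columns and index with it.
a = np.array(my_2d_table)
mask = a.sum(axis=0) < some_threshold
filtered = a[:, mask]

print(new_table)
print(filtered.tolist())
```

Both approaches keep the first two columns. One difference worth knowing: if no column passes the test, the pure-Python version yields an empty list, while the NumPy version keeps the row dimension and returns an array of shape `(3, 0)`.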

2 Comments

The question was tagged with numpy, so there's no need for a non-numpy solution. Also, I believe a.sum(0) looks nicer than np.sum(a,0), but that's just me. Regardless, nice usage of advanced indexing, I forgot you could use boolean arrays for that too.
Meh ... I like np.sum because it's more explicit ... I would probably actually use np.sum(a, axis=0)

I suggest using numpy.compress.

>>> import numpy as np
>>> a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])
>>> a
array([[ 1,  2,  3],
       [ 1, -3,  2],
       [ 4,  5,  7]])
>>> a.sum(axis=0)  # sums each column
array([ 6,  4, 12])
>>> a.sum(0) < 5
array([ False, True,  False], dtype=bool)
>>> a.compress(a.sum(0) < 5, axis=1)  # keeps only the columns whose entry in the condition array is True
array([[ 2],
       [-3],
       [ 5]])
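
As a sketch of how this addresses the original problem, the snippet below wraps the call in a small helper and checks that `compress` and boolean-mask indexing agree. The helper name `keep_columns_below` is hypothetical; the point is that the result's shape is determined at run time, so nothing has to be preallocated.

```python
import numpy as np


def keep_columns_below(arr, limit):
    """Return a new array holding only the columns of arr whose sum is < limit."""
    keep = arr.sum(axis=0) < limit        # one boolean per column
    return arr.compress(keep, axis=1)     # result width = number of True entries


a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])
result = keep_columns_below(a, 5)

# Boolean-mask indexing selects the same columns.
same = a[:, a.sum(axis=0) < 5]

print(result.shape)
print(np.array_equal(result, same))
```

The original array `a` is left unchanged; both forms return a new array.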
