
I have a very large two-dimensional array in Python, using the numpy library. I want to walk through each column efficiently and count, for every column, the number of elements that are different from 0.

Suppose I have the following matrix.

M = array([[1,2], [3,4]])

The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):

for row_idx, row in enumerate(M):
    print "row_idx", row_idx, "row", row 
    for col_idx, element in enumerate(row):
        print "col_idx", col_idx, "element", element
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2

However, in my case I want to walk through each column efficiently, since I have a very big matrix.

I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:

curr_col, curr_row = 0, 0
while (curr_col < numb_colonnes):
    result = 0
    while (curr_row < numb_rows):
        # If different from 0
        if (M[curr_row][curr_col] != 0):
            result += 1
        curr_row += 1
    .... using result value ...
    curr_col += 1
    curr_row = 0

Thanks in advance!

  • You need to clean up the indentation. Commented Feb 17, 2015 at 22:26
  • Re your first code block: it can be replaced by the single statement M = M*M. Commented Feb 17, 2015 at 22:32
  • Indentation corrected, thanks. Commented Feb 17, 2015 at 22:44

1 Answer

In the code you showed us, you treat numpy arrays as lists and, from what you can see, it works! But arrays are not lists, and if you only ever treat them as lists there is little point in using arrays, or even numpy.

To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,

M = M*M

when you want to square the elements of an array and using the rich set of numpy functions to operate directly on arrays.
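
To make the contrast concrete, here is a minimal sketch that squares the question's 2x2 matrix both ways and checks that the results agree. The names `looped` and `vectorized` are only illustrative, and the code uses single-argument `print(...)` so that it runs on Python 3 as well.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# Element-by-element loop, as in the question (slow for large arrays).
looped = M.copy()
for row_idx, row in enumerate(looped):
    for col_idx, element in enumerate(row):
        looped[row_idx, col_idx] = element ** 2

# Vectorized form: a single expression, the looping happens inside numpy.
vectorized = M * M

print(vectorized)
print((looped == vectorized).all())
```

On a small array the two are indistinguishable; the difference in speed only shows up on large arrays, where the Python-level loop dominates the run time.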

That said, I'll try to get a bit closer to your problem... If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.

Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.

import numpy as np
a = np.array(((3,4),(5,6)))
print np.sum(a) # 18
print np.sum(a, axis=0) # [ 8 10]
print np.sum(a, axis=1) # [ 7 11]

Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... but

  1. if you write a logical test on an array, you obtain an array of booleans; e.g., to test which elements of a are even:

    print a%2==0
    # [[False  True]
    #  [False  True]]
    
  2. False is zero and True is one, at least when we sum it...

    print np.sum(a%2==0) # 2
    

    or, if you want one count per column, sum along axis 0, i.e., the index that changes is the 0-th (the row index)

    print np.sum(a%2==0, axis=0) # [0 2]
    

    or, for one count per row, sum along axis 1

    print np.sum(a%2==0, axis=1) # [1 1]
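
The same two steps work for a "different from zero" test. The sketch below uses a made-up array `Z` (it is not from the question, and `mask`, `as_int` and `col_counts` are illustrative names) to show the intermediate boolean array and what summing it along an axis gives.

```python
import numpy as np

# Z is a made-up example array, chosen so that it contains some zeros.
Z = np.array([[0, 2, 0],
              [3, 0, 0],
              [5, 6, 0]])

mask = (Z != 0)                 # boolean array, True where the element is non-zero
as_int = mask.astype(int)       # the same mask written out explicitly as 0/1
col_counts = mask.sum(axis=0)   # number of True values down each column
row_counts = mask.sum(axis=1)   # number of True values along each row

print(mask.dtype)
print(col_counts)
print(row_counts)
```

Summing `mask` directly and summing `as_int` give the same counts; the explicit conversion is shown only to make the True-is-one behaviour visible.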
    

To summarize, for your particular use case

by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...

# if you need the grand total, use sum again
total = np.sum(by_col)
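
If you want to convince yourself that this matches your nested loops, a sketch along the following lines may help. The function names and the `example` array are hypothetical, made up for the check. It also shows `np.count_nonzero` with an `axis` argument as an alternative spelling, which assumes NumPy 1.12 or later.

```python
import numpy as np

def nonzero_per_column(M):
    """Count the non-zero entries in each column of a 2-D array."""
    return np.sum(M != 0, axis=0)

def nonzero_per_column_loop(M):
    """Reference version with explicit loops, for checking only."""
    n_rows, n_cols = M.shape
    counts = []
    for col in range(n_cols):
        result = 0
        for row in range(n_rows):
            if M[row, col] != 0:
                result += 1
        counts.append(result)
    return counts

# Made-up test data, not taken from the question.
example = np.array([[1, 0, 3],
                    [0, 0, 6],
                    [7, 0, 9]])

by_col = nonzero_per_column(example)
alt = np.count_nonzero(example, axis=0)  # axis argument assumes NumPy >= 1.12

print(by_col)
print(nonzero_per_column_loop(example))
```

Both vectorized versions return an array with one count per column, so you can index it or pass it on to other numpy functions without any Python-level loop.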