Walk through each column in a numpy matrix efficiently in Python

Question

I have a very large two-dimensional array in Python, built with the numpy library. I want to walk through each column efficiently and count how many elements in each column are different from 0.

Suppose I have the following matrix.

M = array([[1,2], [3,4]])

The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):

for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2

However, in my case I want to walk through each column efficiently, since I have a very big matrix.

I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:

curr_col, curr_row = 0, 0
while (curr_col < numb_colonnes):
    result = 0
    while (curr_row < numb_rows):
        # If different from 0
        if (M[curr_row][curr_col] != 0):
            result += 1
        curr_row += 1
    .... using result value ...
    curr_col += 1
    curr_row = 0

Thanks in advance!

Re your first code block, it can be replaced by the single statement M=M*M
– gboffi, Commented Feb 17, 2015 at 22:32

gboffi · Accepted Answer · 2015-02-17 23:04:34Z

In the code you showed us, you treat numpy arrays as lists and, as far as you can see, it works! But arrays are not lists, and while you can treat them as such, doing so defeats the purpose of using arrays, or even numpy.

To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,

M = M*M

when you want to square the elements of an array, and use the rich set of numpy functions that operate directly on arrays.
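As a rough illustration of the difference, the sketch below squares a small array both ways and checks that the results agree. The array `M` is the one from the question; the names `looped` and `vectorized` are assumed here purely for illustration.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# List-style approach: visit every element from Python code.
looped = M.copy()
for row_idx, row in enumerate(looped):
    for col_idx, element in enumerate(row):
        looped[row_idx, col_idx] = element ** 2

# Array-style approach: one expression, the loop runs inside numpy.
vectorized = M * M

print(np.array_equal(looped, vectorized))  # True
```

Both versions give the same values; the second one avoids the per-element Python overhead, which is what matters for a very large matrix.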

That said, I'll try to get a bit closer to your problem... If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.

Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.

import numpy as np
a = np.array(((3,4),(5,6)))
print(np.sum(a)) # 18
print(np.sum(a, axis=0)) # [ 8 10]
print(np.sum(a, axis=1)) # [ 7 11]
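With a square array it is easy to mix up the two axes, so here is a small sketch on a non-square array. The array `b` and the names `col_sums` and `row_sums` are made up for illustration. Summing along `axis=0` collapses the rows and leaves one value per column; summing along `axis=1` leaves one value per row.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2 rows, 3 columns

col_sums = np.sum(b, axis=0)  # one entry per column
row_sums = np.sum(b, axis=1)  # one entry per row

print(col_sums.shape)  # (3,)
print(row_sums.shape)  # (2,)
```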

Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... But if you write a logical test on an array, you obtain an array of booleans. For example, to test which elements of a are even:
```
print(a%2==0)
# [[False  True]
#  [False  True]]
```
False is zero and True is one, at least when we sum them...
```
print(np.sum(a%2==0)) # 2
```
or, if you want one count per column, i.e., the index that varies during the sum is the 0-th (the row index)
```
print(np.sum(a%2==0, axis=0)) # [0 2]
```
or one count per row
```
print(np.sum(a%2==0, axis=1)) # [1 1]
```

To summarize, for your particular use case

by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...

# if you need the grand total, use sum again
total = np.sum(by_col)
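
To check the idea end to end, the sketch below applies it to a small matrix that actually contains zeros and compares the result with an explicit loop over the columns. The sample values and the names `loop_counts` and `alt` are assumptions made for illustration. `np.count_nonzero` accepts an `axis` argument in NumPy 1.12 and later, so it is shown as a possible alternative rather than a requirement.

```python
import numpy as np

M = np.array([[1, 0, 3],
              [0, 0, 4],
              [5, 0, 6]])

# Vectorized count of non-zero entries in each column.
by_col = np.sum(M != 0, axis=0)

# Reference: explicit loop; M.T yields the columns of M one at a time.
loop_counts = [sum(1 for x in col if x != 0) for col in M.T]

# Possible alternative (needs NumPy >= 1.12 for the axis argument).
alt = np.count_nonzero(M, axis=0)

print(by_col.tolist())  # [2, 0, 3]
```

If each column needs more than a count, iterating over `M.T` as in the reference loop still avoids indexing element by element.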
