Vectorized Counting 2 Dimensional Numpy Array

Question

Can anyone explain how I can count origin-destination pairs from two arrays without any iteration (e.g. using numpy)?

Example: I have two numpy arrays, origin and destination. Origin and destination can have the same value. Let's say I have 6 items in each array:

origin = np.array(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])

dest = np.array(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])

The first item is from LA-SF, second SF-NY, third NY-NY, and so on.

The result that I want is

array([[1, 0, 1],
       [0, 2, 1],
       [1, 0, 0]])

where the row refers to origin, first being NY, second being LA, and third being SF, and the column refers to the destination with the same order.

Thank you!

Daniel F · Accepted Answer · 2017-08-02 11:15:48Z


You can use np.unique(..., return_inverse=True) and np.add.at to do that:

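import numpy as np

def comm_mtx(origin, dest, keys=None):  # keys -> np.array of strings, in the desired row/column order
    if keys is not None:
        o_lbl = d_lbl = keys
        k_sort = np.argsort(keys)
        # searchsorted gives positions in the sorted keys; k_sort maps them back to positions in keys
        # (assumes every value in origin and dest is present in keys)
        o_idx = k_sort[np.searchsorted(keys, origin, sorter=k_sort)]
        d_idx = k_sort[np.searchsorted(keys, dest, sorter=k_sort)]
    else:
        # labels come back sorted, and are computed separately for rows and columns
        o_lbl, o_idx = np.unique(origin, return_inverse=True)
        d_lbl, d_idx = np.unique(dest, return_inverse=True)
    out = np.zeros((o_lbl.size, d_lbl.size), dtype=int)
    np.add.at(out, (o_idx, d_idx), 1)  # unbuffered add, so repeated (row, col) pairs are all counted
    if keys is not None:
        return out
    else:
        return o_lbl, d_lbl, out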
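
As a rough illustration of why np.add.at is used here rather than a plain fancy-indexed +=, the sketch below runs both on the question's data. The variable names buffered and unbuffered are only for this example. The labels come out in sorted order (LA, NY, SF), not the order asked for in the question.

```python
import numpy as np

origin = np.array(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.array(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])

# Sorted unique labels plus, for each element, the index of its label.
o_lbl, o_idx = np.unique(origin, return_inverse=True)
d_lbl, d_idx = np.unique(dest, return_inverse=True)

# Fancy-indexed += is buffered: a repeated (row, col) pair is only added once.
buffered = np.zeros((o_lbl.size, d_lbl.size), dtype=int)
buffered[o_idx, d_idx] += 1

# np.add.at is unbuffered: every occurrence of a pair is accumulated.
unbuffered = np.zeros_like(buffered)
np.add.at(unbuffered, (o_idx, d_idx), 1)

print(buffered[0, 0], unbuffered[0, 0])  # LA -> LA occurs twice in the data
print(unbuffered)
```

With the += version the two LA-LA trips collapse into a single count, which is the case np.add.at avoids.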

Depending on the sparsity of out, you may want to use a scipy.sparse.coo_matrix instead:

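import numpy as np
from scipy.sparse import coo_matrix as coo

def comm_mtx(origin, dest):
    o_lbl, o_idx = np.unique(origin, return_inverse=True)
    d_lbl, d_idx = np.unique(dest, return_inverse=True)
    # duplicate (row, col) entries are summed when the COO matrix is converted, e.g. with .toarray() or .tocsr()
    data = np.ones(origin.shape, dtype=int)
    return o_lbl, d_lbl, coo((data, (o_idx, d_idx)), shape=(o_lbl.size, d_lbl.size))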

edited Aug 2, 2017 at 11:15

answered Aug 2, 2017 at 10:16

Daniel F


7 Comments

Tom Wyllie Over a year ago

This answer is wrong because OP said "where the row refers to origin, first being NY, second being LA, and third being SF, and the column refers to the destination with the same order", and np.unique does not give you this order.

Tom Wyllie Over a year ago

Although if OP changes his mind and decides he actually doesn't need this then this answer is correct and better than mine :)

Daniel F Over a year ago

Quite right, let me see if I can come up with something better than yours :P

Tom Wyllie Over a year ago

Please do, I'm sure there's a more num-pythonic way than using a dictionary to map keys. Also, good shout with the sparse matrix!

Jack Over a year ago

The sparse matrix idea is really really cool. I decided to accept this answer because the order does not really matter to me. Thank you!


Tom Wyllie · Answer · 2017-08-02 10:30:29Z

To achieve what you've asked, which is to have the output matrix with the rows corresponding to the keys in a specific order, you could use a dictionary to map each unique element to a row index.

import numpy as np

origin = np.asarray(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.asarray(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])

matrix_map = {'NY': 0, 'LA': 1, 'SF': 2}
stacked_inputs = np.vstack((origin, dest))
remapped_inputs = np.vectorize(matrix_map.get)(stacked_inputs)

output_matrix = np.zeros((len(matrix_map), len(matrix_map)), dtype=np.int16)
np.add.at(output_matrix, (remapped_inputs[0], remapped_inputs[1]), 1)
print(output_matrix)

Which outputs:

[[1 0 1]
 [0 2 1]
 [1 0 0]]

as desired.
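
If np.add.at turns out to be slow on large inputs, one possible alternative for the counting step is np.bincount on a flattened (row, col) index. This is only a sketch on the same data and the same hard-coded matrix_map; the names rows, cols, flat and counts are made up for the example.

```python
import numpy as np

origin = np.asarray(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.asarray(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])
matrix_map = {'NY': 0, 'LA': 1, 'SF': 2}

# Same label -> index mapping step as above.
rows = np.vectorize(matrix_map.get)(origin)
cols = np.vectorize(matrix_map.get)(dest)

n = len(matrix_map)
# Turn each (row, col) pair into a single flat index: row * n + col.
flat = np.ravel_multi_index((rows, cols), (n, n))
# Count how often each flat index occurs, then fold back into an n x n matrix.
counts = np.bincount(flat, minlength=n * n).reshape(n, n)
print(counts)
```

This should print the same 3 x 3 matrix as the np.add.at version.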

Alternatively, if you do not wish to hard code matrix_map beforehand, you could build it programmatically as follows:

stacked_inputs = np.vstack((origin, dest))

matrix_map = {}
for element in stacked_inputs.flatten():
    matrix_map.setdefault(element, len(matrix_map))
print(matrix_map)

remapped_inputs = np.vectorize(matrix_map.get)(stacked_inputs)

This would not give you the desired order, but would allow you to use the dictionary to easily map which row / column relates to which token.
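
For what it's worth, the same first-appearance numbering can probably be had without the Python loop, using np.unique with return_index. This is a sketch, not a drop-in replacement: the names labels_in_order, rank and codes are invented here, and it yields an array of labels instead of a dictionary.

```python
import numpy as np

origin = np.asarray(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.asarray(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])
stacked_inputs = np.vstack((origin, dest))

# Sorted unique labels, the position where each first appears, and the
# sorted-label index of every element of the flattened input.
labels, first_idx, inverse = np.unique(
    stacked_inputs.ravel(), return_index=True, return_inverse=True)

# Reorder the sorted labels by where they first appear in the input.
order = np.argsort(first_idx)
labels_in_order = labels[order]

# rank[i] is the first-appearance index of sorted label i.
rank = np.empty_like(order)
rank[order] = np.arange(order.size)

# Integer codes with the same shape as stacked_inputs.
codes = rank[inverse].reshape(stacked_inputs.shape)
print(labels_in_order)
print(codes)
```

Here codes plays the role of remapped_inputs, and labels_in_order[i] tells you which token row / column i refers to.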
