Can anyone explain how I can count origin/destination pairs from two arrays without any iteration (e.g. using numpy)?

Example: I have two numpy arrays, origin and destination. An origin and its destination can have the same value. Let's say I have 6 items in each array:

origin = np.array(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])

dest = np.array(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])

The first item is a trip from LA to SF, the second from SF to NY, the third from NY to NY, and so on.

The result that I want is

array([[1, 0, 1],
       [0, 2, 1],
       [1, 0, 0]])

where rows correspond to the origin (first NY, then LA, then SF) and columns correspond to the destination in the same order.

Thank you!

2 Answers


You can use np.unique(..., return_inverse=True) and np.add.at to do that (pass keys if you need a specific row/column order):

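import numpy as np

def comm_mtx(origin, dest, keys=None):  # keys -> optional np.array of strings giving the row/column order
    if keys is not None:
        keys = np.asarray(keys)
        o_lbl = d_lbl = keys
        # searchsorted needs sorted input; k_sort sorts keys without reordering them
        k_sort = np.argsort(keys)
        # positions in the sorted keys, mapped back to positions in the original keys
        # (assumes every value in origin/dest appears in keys)
        o_idx = k_sort[np.searchsorted(keys, origin, sorter=k_sort)]
        d_idx = k_sort[np.searchsorted(keys, dest, sorter=k_sort)]
    else:
        # labels come out sorted; idx maps each element to its label's position
        o_lbl, o_idx = np.unique(origin, return_inverse=True)
        d_lbl, d_idx = np.unique(dest, return_inverse=True)
    out = np.zeros((o_lbl.size, d_lbl.size), dtype=int)
    # unbuffered add, so repeated (origin, dest) pairs are all counted
    np.add.at(out, (o_idx, d_idx), 1)
    if keys is not None:
        return out
    return o_lbl, d_lbl, out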
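The reason for np.add.at rather than plain fancy indexing is that `out[o_idx, d_idx] += 1` is buffered, so a repeated pair (like the two LA -> LA trips) is only counted once. A small sketch on the question's data showing the difference (the names `buffered` and `unbuffered` are just illustrative):

```python
import numpy as np

# Question's data, encoded with np.unique (labels come out sorted: LA, NY, SF)
origin = np.array(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.array(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])
labels, o_idx = np.unique(origin, return_inverse=True)
_, d_idx = np.unique(dest, return_inverse=True)

# Buffered fancy-index update: the repeated (LA, LA) index is incremented once
buffered = np.zeros((3, 3), dtype=int)
buffered[o_idx, d_idx] += 1

# Unbuffered np.add.at: every occurrence of an index is added
unbuffered = np.zeros((3, 3), dtype=int)
np.add.at(unbuffered, (o_idx, d_idx), 1)

print(buffered[0, 0], unbuffered[0, 0])  # LA -> LA: 1 2
```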

Depending on how sparse out is, you may want to use a scipy.sparse.coo_matrix instead:

import numpy as np
from scipy.sparse import coo_matrix as coo

def comm_mtx(origin, dest):
    o_lbl, o_idx = np.unique(origin, return_inverse=True)
    d_lbl, d_idx = np.unique(dest, return_inverse=True)
    # duplicate (row, col) entries are kept in COO form and summed on conversion (e.g. .tocsr() or .toarray())
    return o_lbl, d_lbl, coo((np.ones(origin.shape), (o_idx, d_idx)), shape=(o_lbl.size, d_lbl.size))

Comments

This answer is wrong because OP said "where the row refers to origin, first being NY, second being LA, and third being SF, and the column refers to the destination with the same order", and np.unique does not give you this order.
Although if OP changes his mind and decides he doesn't actually need this, then this answer is correct and better than mine :)
Quite right, let me see if I can come up with something better than yours :P
Please do, I'm sure there's a more num-pythonic way than using a dictionary to map keys. Also, good shout with the sparse matrix!
The sparse matrix idea is really really cool. I decided to accept this answer because the order does not really matter to me. Thank you!

To achieve what you've asked, which is to have the rows and columns of the output matrix follow a specific key order, you could use a dictionary to map each unique element to a row/column index.

import numpy as np

origin = np.asarray(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.asarray(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])

matrix_map = {'NY': 0, 'LA': 1, 'SF': 2}
stacked_inputs = np.vstack((origin, dest))
remapped_inputs = np.vectorize(matrix_map.get)(stacked_inputs)

output_matrix = np.zeros((len(matrix_map), len(matrix_map)), dtype=np.int16)
np.add.at(output_matrix, (remapped_inputs[0], remapped_inputs[1]), 1)
print(output_matrix)

Which outputs:

[[1 0 1]
 [0 2 1]
 [1 0 0]]

as desired.
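If you would rather not use np.add.at, a commonly used alternative is np.bincount on a flattened cell index (origin_index * n + dest_index), reshaped back to n x n. A sketch assuming the same hard-coded matrix_map (o_idx, d_idx, flat and n are illustrative names):

```python
import numpy as np

origin = np.asarray(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.asarray(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])
matrix_map = {'NY': 0, 'LA': 1, 'SF': 2}
n = len(matrix_map)

# Map tokens to row/column indices via the dictionary
o_idx = np.vectorize(matrix_map.get)(origin)
d_idx = np.vectorize(matrix_map.get)(dest)

# Each (origin, dest) pair becomes one cell number in a flattened n*n grid
flat = o_idx * n + d_idx
output_matrix = np.bincount(flat, minlength=n * n).reshape(n, n)
print(output_matrix)
```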


Alternatively, if you do not wish to hard-code matrix_map beforehand, you could build it programmatically as follows:

stacked_inputs = np.vstack((origin, dest))

matrix_map = {}
for element in stacked_inputs.flatten():
    matrix_map.setdefault(element, len(matrix_map))
print(matrix_map)

remapped_inputs = np.vectorize(matrix_map.get)(stacked_inputs)

This would not give you the desired order (tokens are numbered by first appearance, here LA, SF, NY), but the dictionary still tells you which row/column corresponds to which token.
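If you also want to avoid the Python loop, one possible vectorized way to get the same first-appearance numbering is np.unique with return_index=True and return_inverse=True, then reordering the sorted labels by where each first appears. A sketch on the question's data (rank, codes, tokens and counts are illustrative names, not from the original):

```python
import numpy as np

origin = np.asarray(['LA', 'SF', 'NY', 'NY', 'LA', 'LA'])
dest = np.asarray(['SF', 'NY', 'NY', 'SF', 'LA', 'LA'])

# One flat array, origins first, matching the order of stacked_inputs.flatten()
stacked = np.concatenate((origin, dest))
labels, first_pos, inverse = np.unique(stacked, return_index=True, return_inverse=True)

# Reorder the sorted unique labels by their first position in the data
order = np.argsort(first_pos)
tokens = labels[order]

# rank[i] = first-appearance position of sorted label i
rank = np.empty_like(order)
rank[order] = np.arange(order.size)
codes = rank[inverse]
o_idx, d_idx = codes[:origin.size], codes[origin.size:]

# Count pairs with the same unbuffered add as above
counts = np.zeros((tokens.size, tokens.size), dtype=int)
np.add.at(counts, (o_idx, d_idx), 1)
print(dict(zip(tokens.tolist(), range(tokens.size))))
print(counts)
```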
