0

I have a numpy 2D array of arrays:

samples = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])

I need to count how many times each inner array occurs in the array above, like:

counts = [[1,2,3]:2, [2,3,4]:3, [4,5,6]:1]

I'm not sure how to count these, or how to list them as shown above so that each array stays connected to its count. Any help is appreciated. Thank you!

3 Answers

2

Everything you need is directly in numpy:

import numpy as np

a = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])

print(np.unique(a, axis=0, return_counts=True))

Result:

(array([[1, 2, 3],
       [2, 3, 4],
       [4, 5, 6]]), array([2, 3, 1], dtype=int64))

The result is a tuple of an array with the unique rows, and an array with the counts of those rows.

If you need to go through them pairwise:

unique_rows, counts = np.unique(a, axis=0, return_counts=True)

for row, c in zip(unique_rows, counts):
    print(row, c)

Result:

[1 2 3] 2
[2 3 4] 3
[4 5 6] 1
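
If a single row-to-count mapping like the one sketched in the question is wanted, the two arrays can be zipped into a dictionary. This is only a minimal sketch: the name row_counts is made up for illustration, and rows are converted to tuples because numpy arrays cannot be used as dictionary keys.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [1, 2, 3], [2, 3, 4], [2, 3, 4]])

unique_rows, counts = np.unique(a, axis=0, return_counts=True)

# Arrays are not hashable, so each row becomes a tuple of plain Python ints
# before it is used as a key. row_counts is a hypothetical name.
row_counts = {tuple(row.tolist()): int(c) for row, c in zip(unique_rows, counts)}

print(row_counts)  # {(1, 2, 3): 2, (2, 3, 4): 3, (4, 5, 6): 1}
```

A lookup such as row_counts[(2, 3, 4)] then returns the count for that row.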

0

Here's a way to do it without relying much on the numpy library:

import numpy as np
samples = np.array([[1,2,3], [2,3,4], [4,5,6], [1,2,3], [2,3,4], [2,3,4]])

# Maps a string index ("0", "1", ...) to a [row, count] pair
result = {}

for row in samples:
    in_dictionary = False
    for entry in result.values():
        # entry[0] is a row seen earlier, entry[1] is its count so far
        if np.array_equal(entry[0], row):
            entry[1] += 1
            in_dictionary = True
            break
    if not in_dictionary:
        # First time this row is seen: store it under the next index
        result[str(len(result))] = [row, 1]


print("------------------")
print(result)

This method builds a dictionary called result, loops over the rows of samples, and checks whether each row is already stored. If it is, that row's count is increased by 1; otherwise a new entry is created for it. The keys are string indices ("0", "1", ...), so result["index"][0] gives the stored array and result["index"][1] gives the number of times it appeared.
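
A possible variation on the same loop, shown only as a sketch: if each row is converted to a tuple, the row itself can serve as the dictionary key, so no inner scan over the existing entries is needed. The name counts_by_row is an assumption for illustration, not part of the answer above.

```python
import numpy as np

samples = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [1, 2, 3], [2, 3, 4], [2, 3, 4]])

# counts_by_row (hypothetical name) maps each row, as a tuple, to its count
counts_by_row = {}
for row in samples:
    key = tuple(row.tolist())  # tuples are hashable, numpy arrays are not
    counts_by_row[key] = counts_by_row.get(key, 0) + 1

print(counts_by_row)  # {(1, 2, 3): 2, (2, 3, 4): 3, (4, 5, 6): 1}
```

The trade-off is that the keys are tuples rather than arrays, so np.array(key) is needed to get an array back.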

2 Comments

Given that OP is already using numpy, wouldn't you agree that using the strengths of a very fast library is actually better than writing a more Python-based solution?
Yeah, that's true. I just wanted to offer an option that doesn't rely on it.
0

Here is a pure-Python approach that is relatively fast compared with other Python (no numpy) solutions:

>>> from collections import Counter
>>> Counter(map(tuple, samples.tolist()))  # convert to dict if you need it
Counter({(2, 3, 4): 3, (1, 2, 3): 2, (4, 5, 6): 1})

Python also does this quite fast, because tuple operations are well optimised.
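
As a rough sketch of what can be done with the Counter afterwards (the variable names here are illustrative, not from the answer): it can be turned into a plain dict, ranked by frequency with most_common(), or split back into numpy arrays.

```python
from collections import Counter

import numpy as np

samples = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6], [1, 2, 3], [2, 3, 4], [2, 3, 4]])

row_counter = Counter(map(tuple, samples.tolist()))

# Plain dict, in first-seen order
as_dict = dict(row_counter)
print(as_dict)  # {(1, 2, 3): 2, (2, 3, 4): 3, (4, 5, 6): 1}

# List of (row, count) pairs, most frequent first
ranked = row_counter.most_common()
print(ranked)  # [((2, 3, 4), 3), ((1, 2, 3), 2), ((4, 5, 6), 1)]

# Back to numpy arrays, if array output is preferred
rows = np.array(list(row_counter.keys()))
counts = np.array(list(row_counter.values()))
```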

from collections import Counter

import numpy as np
import benchit
%matplotlib inline
# (the line above is an IPython/Jupyter magic; drop it in a plain script)
benchit.setparams(rep=3)

sizes = [3, 10, 30, 100, 300, 900, 3000, 9000, 30000, 90000, 300000, 900000, 3000000]
arr = np.random.randint(0,10, size=(sizes[-1], 3)).astype(int)


def count_python(samples):
    return Counter(map(tuple, samples.tolist()))

def count_numpy(samples):
    return np.unique(samples, axis=0, return_counts=True)

fns = [count_python, count_numpy]
# Each input is the first s rows of the same random array
in_ = {s: (arr[:s],) for s in sizes}
t = benchit.timings(fns, in_, multivar=True, input_name='Number of items')
t.plot(logx=True, figsize=(12, 6), fontsize=14)

Note that arr.tolist() alone accounts for about 0.8 s of the pure-Python time at 3M rows.
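
To see how the time splits between the list conversion and the counting on a given machine, the two steps can be timed separately. This is a minimal sketch that assumes a smaller array of 100,000 rows and uses time.perf_counter instead of benchit; the numbers it prints will vary from machine to machine.

```python
import time
from collections import Counter

import numpy as np

# Assumption: a smaller array than in the benchmark, to keep the run short
rng = np.random.default_rng(0)
arr = rng.integers(0, 10, size=(100_000, 3))

t0 = time.perf_counter()
as_list = arr.tolist()                      # numpy array -> nested Python lists
t1 = time.perf_counter()
row_counts = Counter(map(tuple, as_list))   # hash and count each row
t2 = time.perf_counter()

print(f"tolist:   {t1 - t0:.4f} s")
print(f"counting: {t2 - t1:.4f} s")
```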

[Image: benchit plot of count_python and count_numpy timings against number of items]
