Convert numpy array with values into array with frequency for each observation in each row

Question

I have a numpy array as follows:

array = np.random.randint(6, size=(50, 400))

Each entry of this array is the cluster that the value belongs to, with each row representing a sample and each column representing a feature. I would like to create an array with 5 columns holding the frequency of each cluster within each sample (that is, within each row of this matrix).

However, in the frequency calculation, I want to ignore 0, meaning that the frequency of all values except 0 (1-5) should add to 1.

Essentially, what I want is an array in which each column is a cluster (1-5 in this case) and each row is still a single sample.

How can this be done?

Edit:

small input:

input = np.random.randint(6, size=(2, 5))

array([[0, 4, 2, 3, 0],
       [5, 5, 2, 5, 3]])

output:

1    2    3    4    5

0   .33  .33  .33   0
0   .2   .2    0   .6

Where 1-5 are the column labels, and the two rows below them are the desired output as a numpy array.

When you say "5 dimensional array" do you mean an array of shape (5,)? – chthonicdaemon, Commented May 28, 2018 at 13:41
I just added an example input and output. I hope that helps. – Jack Arnestad, Commented May 28, 2018 at 13:46

chthonicdaemon · Accepted Answer · 2018-05-28 15:47:17Z


This is a simple application of bincount. Does this do what you want?

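import numpy as np

def freqs(x):
    # Count occurrences of 0..5 in one row, then drop the count for 0
    counts = np.bincount(x, minlength=6)[1:]
    # Normalise so the frequencies of clusters 1-5 sum to 1
    return counts/counts.sum()

frequencies = np.apply_along_axis(freqs, axis=1, arr=array)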
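As a quick check, here is a minimal sketch that runs the same function on the small input from the question. The name `small` is only an illustrative choice and is not part of the original answer.

```python
import numpy as np

def freqs(x):
    # Count occurrences of 0..5 in one row, then drop the count for 0
    counts = np.bincount(x, minlength=6)[1:]
    # Normalise so the frequencies of clusters 1-5 sum to 1
    return counts / counts.sum()

# Small input copied from the question (name `small` is hypothetical)
small = np.array([[0, 4, 2, 3, 0],
                  [5, 5, 2, 5, 3]])

frequencies = np.apply_along_axis(freqs, axis=1, arr=small)

# Row 0 is roughly [0, 1/3, 1/3, 1/3, 0]; row 1 is [0, 0.2, 0.2, 0, 0.6]
print(frequencies.round(2))
```

Note that a row consisting only of zeros would make `counts.sum()` equal to 0, so the division would produce NaN values for that row. If such rows can occur in your data, they need separate handling.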

If you were wondering about the speed implications of apply_along_axis, this method, which uses broadcasting instead of a per-row function call, is marginally slower in my tests:

values = np.arange(1, 6)  # cluster labels 1-5; 0 is ignored
counts = (array[:, :, None] == values[None, None, :]).sum(axis=1)
frequencies2 = counts/counts.sum(axis=1)[:, None]
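To confirm that the two approaches agree, the following sketch runs both on a random array of the shape used in the question. It assumes `values` holds the cluster labels 1-5 (the answer does not show its definition), and it uses a seeded generator only to make the run reproducible.

```python
import numpy as np

# Seeded generator for reproducibility (an assumption, not from the question)
rng = np.random.default_rng(0)
array = rng.integers(6, size=(50, 400))

# Assumed definition: the cluster labels to count, excluding 0
values = np.arange(1, 6)

def freqs(x):
    # Per-row counts of 1..5, normalised to sum to 1
    counts = np.bincount(x, minlength=6)[1:]
    return counts / counts.sum()

# Approach 1: bincount applied row by row
frequencies = np.apply_along_axis(freqs, axis=1, arr=array)

# Approach 2: broadcasting (50, 400, 1) against (1, 1, 5), summed over features
counts = (array[:, :, None] == values[None, None, :]).sum(axis=1)
frequencies2 = counts / counts.sum(axis=1)[:, None]

print(np.allclose(frequencies, frequencies2))
```

The broadcasting version builds an intermediate boolean array of shape (samples, features, clusters), so its memory use grows with the number of clusters, whereas the bincount version only ever holds one row of counts at a time.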

edited May 28, 2018 at 15:47

answered May 28, 2018 at 13:57

chthonicdaemon


2 Comments

filippo Over a year ago

shouldn't it be axis=1 and no .T?

chthonicdaemon Over a year ago

@filippo indeed. Thanks.
