Generating a boolean mask indexing one array into another array

Question

It's hard to explain what I'm trying to do with words so here's an example.

Let's say we have the following inputs:

In [76]: x
Out[76]: 
0    a
1    a
2    c
3    a
4    b

In [77]: z
Out[77]: ['a', 'b', 'c', 'd', 'e']

I want to get:

In [78]: ii
Out[78]: 
array([[1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0]])

ii is an array of boolean masks which can be applied to z to get back the original x.

My current solution is a function that converts z to a list, uses the index method to find the position of an element in z, and generates a row of zeroes with a one at that position. I apply this function to each row of x to get the desired result.
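For reference, the approach described above might look roughly like the sketch below. The helper name one_hot_row is hypothetical, and a plain list stands in for the Series x; the original function was not posted.

```python
import numpy as np

def one_hot_row(value, choices):
    """Return a row of zeros with a single 1 at the position of value in choices."""
    row = [0] * len(choices)
    # list.index raises ValueError if value is not present in choices.
    row[list(choices).index(value)] = 1
    return row

x = ["a", "a", "c", "a", "b"]  # stands in for the Series shown in the question
z = ["a", "b", "c", "d", "e"]

# Apply the helper to each element of x and stack the rows into a 2D array.
ii = np.array([one_hot_row(value, z) for value in x])
print(ii)
```

Each call to index scans z from the start, so this is a simple baseline and not a fast one.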

And so, what's your question? Writing this function? Surely, something like np.choose(["abcde".index(i) for i in x], "abcde") doesn't work for you?
– Pierre GM, Commented Sep 12, 2012 at 6:29
Well, I get array(['a', 'a', 'c', 'a', 'b'], dtype='|S1') as a result when I run your line. What I want is the masks (lists of 5 boolean elements) for ['a', 'a', 'c', 'a', 'b']. Does this make it clearer?
– Daniel, Commented Sep 12, 2012 at 7:17
Are you looking for a faster way or just something shorter to type, like np.array([[j == i for j in z] for i in x], dtype=int)?
– Wouter Overmeire, Commented Sep 12, 2012 at 7:57

Community · Accepted Answer · 2017-05-23 12:08:02Z

1

A first possibility:

>>> choices = np.diag([1]*5)
>>> choices[[z.index(i) for i in x]]

As noted elsewhere, if z is sorted you can replace the list comprehension [z.index(i) for i in x] with np.searchsorted(z, x):

>>> choices[np.searchsorted(z, x)]

Note that, as suggested in a comment by @seberg, you should use np.eye(len(z)) instead of np.diag([1]*len(z)). The np.eye function directly gives you a 2D array with 1 on the diagonal and 0 elsewhere (as floats by default; pass dtype=int if you want integers).
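Putting the pieces of this answer together, a self-contained sketch could look like this. It assumes x is a NumPy array of strings (the question uses a pandas Series) and that every element of x occurs in z; the names identity, row_indices and masks are only illustrative.

```python
import numpy as np

x = np.array(["a", "a", "c", "a", "b"])
z = ["a", "b", "c", "d", "e"]

# One row per possible choice: row k is the mask that selects z[k].
identity = np.eye(len(z), dtype=int)

# Position of each element of x in z (raises ValueError if an element is missing).
row_indices = [z.index(value) for value in x]

# Fancy indexing picks the matching identity row for every element of x.
masks = identity[row_indices]
print(masks)
```

The identity matrix is sized by len(z), so the result has len(x) rows and len(z) columns even when the two lengths differ.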

edited May 23, 2017 at 12:08

CommunityBot


answered Sep 12, 2012 at 8:15

Pierre GM


2 Comments

Daniel Over a year ago

The first one is what I was looking for.

seberg Over a year ago

Instead of np.diag([1]*5), rather use np.eye(5) I think.

seberg · Accepted Answer · 2012-09-12 08:47:14Z

1

This is a NumPy method for the case where z is sorted, which you did not specify. If pandas needs something different, I don't know:

# Assuming z is sorted.
indices = np.searchsorted(z, x)

Now I really don't know why you want a boolean mask; these indices can already be applied to z to give back x, and they are more compact.

np.asarray(z)[indices] == x  # all True if z includes every element of x (z must be an array to be indexed this way)
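As a quick check of the round trip described above, here is a minimal sketch. It assumes z is a sorted NumPy array and x is a NumPy array whose elements all occur in z; the names recovered and all_found are illustrative.

```python
import numpy as np

x = np.array(["a", "a", "c", "a", "b"])
z = np.array(["a", "b", "c", "d", "e"])  # searchsorted requires z to be sorted

# For each element of x, the position where it would be inserted into z.
# When the element is present in z, this is its index.
indices = np.searchsorted(z, x)

# Indexing z with those positions gives back x.
# Caveat: a value larger than every element of z yields len(z), which would
# make z[indices] raise IndexError, so absent values need separate handling.
recovered = z[indices]
all_found = bool(np.array_equal(recovered, x))
print(indices, all_found)
```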

answered Sep 12, 2012 at 8:47

seberg


3 Comments

Daniel Over a year ago

Unfortunately they are not sorted, my example is misleading. Also I need the masks because I multiply them by some probability matrix after that: kaggle.com/c/predict-closed-questions-on-stack-overflow/forums/…

seberg Over a year ago

If they are unique, sort them yourself first, if you care about speed. As to creation of the boolean array, I would suggest something like a = np.zeros((...,...), dtype=bool); a[np.arange(...), indices] = 1 maybe. But doesn't matter much.

Daniel Over a year ago

If I sort them then I'll have to sort the columns of all the other arrays and matrices that I have to match. Not sure if it's worth it.
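One way to avoid reordering anything, offered as a sketch and not as what either commenter had in mind: np.searchsorted accepts a sorter argument, so z can stay in its original order and the resulting positions can be mapped back through the argsort. The boolean array is then filled with paired index arrays. The example assumes z has unique elements and contains every element of x; the names order, positions and masks are illustrative.

```python
import numpy as np

x = np.array(["a", "a", "c", "a", "b"])
z = np.array(["c", "e", "a", "d", "b"])  # deliberately unsorted, left as is

# order[k] is the position in z of the k-th smallest element.
order = np.argsort(z)

# searchsorted works on the sorted view of z; mapping through order
# converts its result back to positions in the original, unsorted z.
positions = order[np.searchsorted(z, x, sorter=order)]

# Start with all False and set one True per row at (row, positions[row]).
masks = np.zeros((len(x), len(z)), dtype=bool)
masks[np.arange(len(x)), positions] = True
print(masks.astype(int))
```

Because the columns of masks follow the original order of z, other arrays aligned with z do not need to be touched.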

Wes McKinney · Accepted Answer · 2012-10-24 16:23:56Z

1

Surprised no one mentioned the outer method of numpy.equal:

In [51]: np.equal.outer(s, z)
Out[51]: 
array([[ True, False, False, False, False],
       [ True, False, False, False, False],
       [False, False,  True, False, False],
       [ True, False, False, False, False],
       [False,  True, False, False, False]], dtype=bool)

In [52]: np.equal.outer(s, z).astype(int)
Out[52]: 
array([[1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0]])
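The same all-pairs comparison can also be written with broadcasting, which is a different spelling of the idea and not the call used above. The sketch below assumes x and z are both NumPy string arrays; it needs neither a sorted z nor every element of x to be present in z.

```python
import numpy as np

x = np.array(["a", "a", "c", "a", "b"])
z = np.array(["a", "b", "c", "d", "e"])

# x[:, None] has shape (5, 1) and z[None, :] has shape (1, 5), so the
# comparison broadcasts to a (5, 5) boolean array of all pairwise matches.
masks = x[:, None] == z[None, :]

# Convert to 0/1 integers if the masks are going to be multiplied later.
int_masks = masks.astype(int)
print(int_masks)
```

An element of x that does not occur in z simply produces a row of all False instead of raising an error.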

answered Oct 24, 2012 at 16:23

Wes McKinney

