It's hard to explain what I'm trying to do with words so here's an example.

Let's say we have the following inputs:

In [76]: x
Out[76]: 
0    a
1    a
2    c
3    a
4    b

In [77]: z
Out[77]: ['a', 'b', 'c', 'd', 'e']

I want to get:

In [78]: ii
Out[78]: 
array([[1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0]])

ii is an array of boolean masks which can be applied to z to get back the original x.
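To make "applied to z to get back the original x" concrete, here is a small sketch. It assumes z has been converted to a NumPy array (the question shows it as a plain list), and the name recovered is only illustrative.

```python
import numpy as np

z = np.array(["a", "b", "c", "d", "e"])
ii = np.array([[1, 0, 0, 0, 0],
               [1, 0, 0, 0, 0],
               [0, 0, 1, 0, 0],
               [1, 0, 0, 0, 0],
               [0, 1, 0, 0, 0]])

# Each row, viewed as a boolean mask, selects exactly one element of z.
recovered = [str(z[row.astype(bool)][0]) for row in ii]

# Equivalent shortcut: the column of the single 1 in each row indexes z directly.
recovered_fast = z[ii.argmax(axis=1)]
```

Both forms rely on every row containing exactly one 1.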

My current solution is to write a function which converts z to a list and uses the index method to get the index of the element in z and then generate a row of zeroes except for the index where there is a one. This function gets applied to each row of x to get the desired result.
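A minimal sketch of the approach described above, for reference. The original function is not shown in the question, so the name one_hot_row and its exact shape are assumptions, and x is written as a plain list rather than a pandas Series.

```python
import numpy as np

x = ["a", "a", "c", "a", "b"]   # values of the Series in the question
z = ["a", "b", "c", "d", "e"]


def one_hot_row(value, categories):
    """Return a row of zeros with a single one at value's position in categories."""
    row = [0] * len(categories)
    # list.index raises ValueError if value is not in categories.
    row[list(categories).index(value)] = 1
    return row


# Apply the function to every element of x and stack the rows.
ii = np.array([one_hot_row(v, z) for v in x])
```

Each call scans z from the start, so this does roughly len(x) * len(z) work in pure Python.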

  • And so, what's your question? Writing this function? Surely, something like np.choose(["abcde".index(i) for i in x], "abcde") doesn't work for you? Commented Sep 12, 2012 at 6:29
  • Well, I get array(['a', 'a', 'c', 'a', 'b'], dtype='|S1') as a result when I run your line. What I want is the masks (lists of 5 boolean elements) for ['a', 'a', 'c', 'a', 'b']. Does this make it clearer? Commented Sep 12, 2012 at 7:17
  • Are you looking for a faster way, or just something shorter to type, like np.array([[j == i for j in z] for i in x], dtype=int)? Commented Sep 12, 2012 at 7:57
  • @WouterOvermeire looking for both ideally Commented Sep 12, 2012 at 17:10

3 Answers

A first possibility:

>>> choices = np.diag([1]*5)
>>> choices[[z.index(i) for i in x]]

As noted elsewhere, if z is sorted you can replace the list comprehension [z.index(i) for i in x] with np.searchsorted(z, x):

>>> choices[np.searchsorted(z, x)]

Note that, as suggested in a comment by @seberg, you should use np.eye(len(z)) instead of np.diag([1]*len(z)). The np.eye function directly gives you a 2D array with ones on the diagonal and zeros elsewhere. It returns floats by default, so pass dtype=int if you want integers.
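Putting the pieces of this answer together, a self-contained sketch might look like the following. The dictionary lookup (position) is my own substitution for z.index, not part of the original answer. It avoids rescanning z for every element and does not need z to be sorted.

```python
import numpy as np

x = ["a", "a", "c", "a", "b"]
z = ["a", "b", "c", "d", "e"]   # need not be sorted for this approach

# Map each category to its column once, instead of calling z.index per element.
position = {value: i for i, value in enumerate(z)}
indices = np.array([position[v] for v in x])   # KeyError if v is not in z

# Row k of the identity matrix is the mask for category k.
ii = np.eye(len(z), dtype=int)[indices]
```

The identity matrix has len(z) rows, so the result has one row per element of x and one column per element of z.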

2 Comments

The first one is what I was looking for.
Instead of np.diag([1]*5), rather use np.eye(5) I think.

This is a NumPy method for the case where z is sorted. You did not specify that... If pandas needs something different, I don't know:

# Assuming z is sorted.
indices = np.searchsorted(z, x)

Now, I really don't know why you want a boolean mask; these indices can already be applied to z to give back x, and they are more compact.

np.asarray(z)[indices] == x  # if z includes all values of x (a plain list cannot be indexed by an array)

3 Comments

Unfortunately they are not sorted, my example is misleading. Also I need the masks because I multiply them by some probability matrix after that: kaggle.com/c/predict-closed-questions-on-stack-overflow/forums/…
If they are unique, sort them yourself first if you care about speed. As for creating the boolean array, I would suggest something like a = np.zeros((...,...), dtype=bool); a[np.arange(...), indices] = 1 maybe. But it doesn't matter much.
If I sort them then I'll have to sort the columns of all the other arrays and matrices that I have to match. Not sure if it's worth it.
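For the unsorted case raised in these comments, one option is to leave z alone and pass np.searchsorted a sorting permutation through its sorter argument. The sketch below assumes the values in z are unique and that every value of x occurs in z. The reordered z and the names order, pos_in_sorted and masks are made up for illustration.

```python
import numpy as np

x = np.array(["a", "a", "c", "a", "b"])
z = np.array(["c", "a", "e", "b", "d"])   # deliberately unsorted, values unique

order = np.argsort(z)                                 # permutation that sorts z
pos_in_sorted = np.searchsorted(z, x, sorter=order)   # positions within z[order]
indices = order[pos_in_sorted]                        # positions in the original z

# Fill a boolean array directly instead of indexing an identity matrix.
masks = np.zeros((len(x), len(z)), dtype=bool)
masks[np.arange(len(x)), indices] = True

# Sanity check: searchsorted does not fail on missing values, so verify the lookup.
assert (z[indices] == x).all()
```

Since z itself is never reordered, the columns of masks still line up with any other arrays that use z's original order.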

Surprised no one mentioned the outer method of numpy.equal:

In [51]: np.equal.outer(s, z)
Out[51]: 
array([[ True, False, False, False, False],
       [ True, False, False, False, False],
       [False, False,  True, False, False],
       [ True, False, False, False, False],
       [False,  True, False, False, False]], dtype=bool)

In [52]: np.equal.outer(s, z).astype(int)
Out[52]: 
array([[1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0]])
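A self-contained version of the same idea, as a sketch. The session above uses s, which I take to be the Series the question calls x. Here both inputs are plain object-dtype arrays (an assumption made so the snippet runs without pandas), and z is shuffled to show that no sorting is needed. The broadcasting comparison on the first line gives the same result as np.equal.outer.

```python
import numpy as np

x = np.array(["a", "a", "c", "a", "b"], dtype=object)
z = np.array(["c", "a", "e", "b", "d"], dtype=object)   # order does not matter here

# Compare every element of x with every element of z:
# masks[i, j] is True exactly when x[i] == z[j].
masks = x[:, None] == z[None, :]

# The ufunc method from the answer produces the same array.
same = np.equal.outer(x, z)

# Integer form, as requested in the question.
ii = masks.astype(int)
```

This does len(x) * len(z) comparisons, which is fine when z is small but grows quickly for long category lists.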
