Convert array of indices to one-hot encoded array in NumPy

Question

Given a 1D array of indices:

a = array([1, 0, 3])

I want to one-hot encode this as a 2D array:

b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Is there a quick way to do this? Quicker than just looping over a to set elements of b, that is.

Mateen Ulhaq · Accepted Answer · 2022-08-06 21:24:33Z

548

Create a zeroed array b with enough columns, i.e. a.max() + 1.
Then, for each row i, set the a[i]th column to 1.

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max() + 1))
>>> b[np.arange(a.size), a] = 1

>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
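Regarding the comment below about why b[:, a] does not work: when two integer arrays index a 2-D array, NumPy pairs them element-wise, so b[np.arange(a.size), a] touches only the positions (i, a[i]). A slice in the first position instead selects every row for each listed column. A small sketch contrasting the two (variable names are illustrative):

```python
import numpy as np

a = np.array([1, 0, 3])

# Paired integer-array indexing: exactly one element per row, at (i, a[i]).
paired = np.zeros((a.size, a.max() + 1), dtype=int)
paired[np.arange(a.size), a] = 1

# Slice + index array: ':' selects ALL rows, so whole columns 1, 0 and 3 are set.
sliced = np.zeros((a.size, a.max() + 1), dtype=int)
sliced[:, a] = 1

print(paired)
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 0 1]]
print(sliced)
# [[1 1 0 1]
#  [1 1 0 1]
#  [1 1 0 1]]
```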

edited Aug 6, 2022 at 21:24

Mateen Ulhaq


answered Apr 23, 2015 at 18:30

YXD


6 Comments

Mohammad Moghimi Over a year ago

@JamesAtwood it depends on the application but I'd make the max a parameter and not calculate it from the data.

A.D Over a year ago

what if 'a' was 2d? and you want a 3-d one-hot matrix?

N. McA. Over a year ago

Can anyone point to an explanation of why this works, but the slice with [:, a] does not?

cgnorthcutt Over a year ago

@A.D. Solution for the 2d -> 3d case: stackoverflow.com/questions/36960320/…

safetyduck Over a year ago

You can also use scipy.sparse.


K3---rnc · Answer · 2025-10-11 13:11:44Z

281

Simply index the rows of the corresponding identity matrix with your indices:

>>> indexes = [1, 0, 3]
>>> n_values = np.max(indexes) + 1
>>> np.eye(n_values)[indexes]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

edited Oct 11 at 13:11

answered May 19, 2016 at 12:35

K3---rnc


6 Comments

Isaías Over a year ago

This solution is the only one useful for an input N-D matrix to one-hot N+1D matrix. Example: input_matrix=np.asarray([[0,1,1] , [1,1,2]]) ; np.eye(3)[input_matrix] # output 3D tensor

Alex Over a year ago

+1 because this should be preferred over the accepted solution. For a more general solution though, values should be a Numpy array rather than a Python list, then it works in all dimensions, not only in 1D.

NightElfik Over a year ago

Note that taking np.max(values) + 1 as number of buckets might not be desirable if your data set is say randomly sampled and just by chance it may not contain max value. Number of buckets should be rather a parameter and assertion/check can be in place to check that each value is within 0 (incl) and buckets count (excl).

cecconeurale Over a year ago

To me this solution is the best and can be easily generalized to any tensor: def one_hot(x, depth=10): return np.eye(depth)[x]. Note that giving the tensor x as index returns a tensor of x.shape eye rows.

avivr Over a year ago

Easy way to "understand" this solution and why it works for N-dims (without reading numpy docs): at each location in the original matrix (values), we have an integer k, and we "put" the 1-hot vector eye(n)[k] in that location. This adds a dimension because we're "putting" a vector in the location of a scalar in the original matrix.
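Several comments here suggest passing the number of classes explicitly, validating the labels, and relying on the fact that indexing np.eye with an N-D integer array yields an N+1-D array. A minimal sketch combining those ideas; the function name one_hot and the ValueError check are my own illustrative choices, not a NumPy API:

```python
import numpy as np

def one_hot(x, depth):
    """Return a one-hot array of shape x.shape + (depth,)."""
    x = np.asarray(x)
    # Reject out-of-range labels; negative ones would otherwise wrap around.
    if x.size and (x.min() < 0 or x.max() >= depth):
        raise ValueError(f"labels must lie in [0, {depth})")
    # Row k of the identity matrix is the one-hot vector for class k;
    # fancy indexing places that row at every position of x.
    return np.eye(depth, dtype=int)[x]

print(one_hot([1, 0, 3], depth=4))
print(one_hot(np.array([[0, 1, 1], [1, 1, 2]]), depth=3).shape)  # (2, 3, 3)
```

The negative-value check matters because NumPy would otherwise silently map an index like -1 to the last class.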


Berriel · Answer · 2018-06-15 15:57:10Z

59

If you are using Keras, there is a built-in utility for that:

import numpy as np
from keras.utils import to_categorical  # older standalone Keras: keras.utils.np_utils

int_labels = np.array([1, 0, 3])
categorical_labels = to_categorical(int_labels, num_classes=4)

It does pretty much the same thing as @YXD's answer (see the source code).

edited Jun 15, 2018 at 15:57

Berriel


answered Nov 27, 2017 at 11:13

Jodo


Augustin · Answer · 2018-08-20 11:10:49Z

56

Here is what I find useful:

import numpy as np

def one_hot(a, num_classes):
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

Here num_classes is the number of classes C, so a vector of shape (10000,) is transformed into an array of shape (10000, C). Note that the labels are assumed to be zero-based, i.e. one_hot(np.array([0, 1]), 2) gives [[1, 0], [0, 1]].

I believe this is exactly what you wanted.

PS: the source is the deeplearning.ai Sequence Models course.

edited Aug 20, 2018 at 11:10

Augustin


answered Mar 11, 2018 at 7:41

D.Samchuk


1 Comment

Anu Over a year ago

Also, what's the reason for np.squeeze()? np.eye(num_classes)[a.reshape(-1)] already gives one one-hot vector per element of a: np.eye creates an identity matrix (1 on the diagonal, 0 elsewhere), and indexing it with a.reshape(-1) picks the row for each class index. I don't see the need for np.squeeze, since it only removes dimensions of size 1, and the output's shape will always be (a_flattened_size, num_classes).
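On the np.squeeze question: for a label vector with more than one element (and num_classes > 1) the squeeze is indeed a no-op. It only matters when a dimension of size 1 appears, most notably for a single label, where it turns the (1, C) result into a flat (C,) vector. A small check:

```python
import numpy as np

def one_hot(a, num_classes):
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

single = np.array([2])
print(np.eye(4)[single.reshape(-1)].shape)    # (1, 4) without squeeze
print(one_hot(single, 4).shape)               # (4,) after squeeze
print(one_hot(np.array([1, 0, 3]), 4).shape)  # (3, 4): squeeze changes nothing
```

Whether that flattening is desirable depends on the caller; dropping np.squeeze keeps the output 2-D in every case.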

Rishabh Agrahari · Answer · 2019-07-05 13:52:33Z

51

You can also use NumPy's eye function:

numpy.eye(num_classes)[labels]

where num_classes is the number of classes and labels is the vector containing the labels.

edited Jul 5, 2019 at 13:52

Rishabh Agrahari


answered Apr 12, 2018 at 7:14

Karma


4 Comments

Oliver Over a year ago

For more clarity using np.identity(num_classes)[indices] might be better. Nice answer!

Maksym Ganenko Over a year ago

That's the only absolutely pythonic answer in all its brevity.

questionto42 Over a year ago

This has repeated the answer of K3---rnc two years later, and nobody seems to see it.

Péter Szilvási Over a year ago

Also consider reshaping the label vector first: numpy.eye(num_class)[labels.reshape(-1)]. For example, if labels has shape (x, 1), this produces shape (x, num_class) instead of (x, 1, num_class).
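To illustrate the shape issue from the last comment: indexing np.eye with an (x, 1) column of labels keeps the trailing axis of length 1, so the result is (x, 1, num_class); flattening first gives the usual (x, num_class). A sketch with made-up labels:

```python
import numpy as np

num_class = 4
labels = np.array([[1], [0], [3]])  # column vector, shape (3, 1)

print(np.eye(num_class)[labels].shape)              # (3, 1, 4)
print(np.eye(num_class)[labels.reshape(-1)].shape)  # (3, 4)
```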

Franck Dernoncourt · Answer · 2017-02-16 02:15:32Z

33

You can use sklearn.preprocessing.LabelBinarizer:

Example:

import sklearn.preprocessing
a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))

output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]

Among other things, you can pass sparse_output=True to sklearn.preprocessing.LabelBinarizer() so that transform returns a sparse matrix.

answered Feb 16, 2017 at 2:15

Franck Dernoncourt


1 Comment

hnwoh Oct 11 at 12:29

Actually probably OneHotEncoder is more appropriate for this purpose.

Shubham Mishra · Answer · 2020-04-10 23:42:09Z

8

For one-hot encoding:

   one_hot_encode = pandas.get_dummies(array)

For an example, see the comments below.

edited Apr 10, 2020 at 23:42

answered Apr 10, 2020 at 23:27

Shubham Mishra


5 Comments

Clarus Over a year ago

Thanks for the comment, but a brief description of what the code is doing would be very helpful!

Shubham Mishra Over a year ago

Please refer to the example.

Deepak Over a year ago

@Clarus Check out the example below. You can access the one-hot encoding of each value in your np array with one_hot_encode[value].

>>> import numpy as np
>>> import pandas
>>> a = np.array([1,0,3])
>>> one_hot_encode=pandas.get_dummies(a)
>>> print(one_hot_encode)
   0  1  3
0  0  1  0
1  1  0  0
2  0  0  1
>>> print(one_hot_encode[1])
0    1
1    0
2    0
Name: 1, dtype: uint8
>>> print(one_hot_encode[0])
0    0
1    1
2    0
Name: 0, dtype: uint8
>>> print(one_hot_encode[3])
0    0
1    0
2    1
Name: 3, dtype: uint8

PigSpider Over a year ago

Not the ideal tool

Hugh Perkins Over a year ago

welcome to stackoverflow. Generally it's preferred to make the answers self-contained, i.e. copy the example into your answer, rather than just linking to it.

stackoverflowuser2010 · Answer · 2016-09-14 00:02:01Z

7

Here is a function that converts a 1-D vector to a 2-D one-hot array.

#!/usr/bin/env python
import numpy as np

def convertToOneHot(vector, num_classes=None):
    """
    Converts an input 1-D vector of integers into an output
    2-D array of one-hot vectors, where an i'th input value
    of j will set a '1' in the i'th row, j'th column of the
    output array.

    Example:
        v = np.array((1, 0, 4))
        one_hot_v = convertToOneHot(v)
        print(one_hot_v)

        [[0 1 0 0 0]
         [1 0 0 0 0]
         [0 0 0 0 1]]
    """

    assert isinstance(vector, np.ndarray)
    assert len(vector) > 0

    if num_classes is None:
        num_classes = np.max(vector)+1
    else:
        assert num_classes > 0
        assert num_classes > np.max(vector)

    result = np.zeros(shape=(len(vector), num_classes))
    result[np.arange(len(vector)), vector] = 1
    return result.astype(int)

Below is some example usage:

>>> a = np.array([1, 0, 3])

>>> convertToOneHot(a)
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])

>>> convertToOneHot(a, num_classes=10)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

answered Sep 14, 2016 at 0:02

stackoverflowuser2010


2 Comments

johndodo Over a year ago

Note that this only works on vectors (and there is no assert to check vector shape ;) ).

fnunnari Over a year ago

+1 for the generalized approach and parameters check. However, as a common practice, I suggest to NOT use asserts to perform checks on inputs. Use asserts only to verify internal intermediate conditions. Rather, convert all assert ___ into if not ___ raise Exception(<Reason>).
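Folding in both comments (exceptions instead of assert for input validation, plus an explicit 1-D check), a variant of convertToOneHot might look like this; the snake_case name, exception types, and messages are my own choices:

```python
import numpy as np

def convert_to_one_hot(vector, num_classes=None):
    """Convert a 1-D integer array into a 2-D one-hot array of ints."""
    vector = np.asarray(vector)
    if vector.ndim != 1:
        raise ValueError("vector must be 1-D")
    if vector.size == 0:
        raise ValueError("vector must not be empty")
    if not np.issubdtype(vector.dtype, np.integer):
        raise TypeError("vector must contain integers")
    if vector.min() < 0:
        raise ValueError("labels must be non-negative")

    if num_classes is None:
        num_classes = int(vector.max()) + 1
    elif num_classes <= vector.max():
        # Every label j needs a column j, so num_classes must exceed the max.
        raise ValueError("num_classes must be greater than the largest label")

    result = np.zeros((vector.size, num_classes), dtype=int)
    result[np.arange(vector.size), vector] = 1
    return result

print(convert_to_one_hot(np.array([1, 0, 3])))
print(convert_to_one_hot(np.array([1, 0, 3]), num_classes=10).shape)  # (3, 10)
```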

Inaam Ilahi · Answer · 2019-05-26 15:29:27Z

5

You can use the following code for converting into a one-hot vector:

Let x be a 1-D array of integer class labels running from 0 up to some maximum:

import numpy as np
np.eye(x.max()+1)[x]

If 0 is not a class (labels start at 1), don't just remove the +1; subtract 1 from the labels instead: np.eye(x.max())[x - 1].
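If the classes are numbered from 1 rather than 0, dropping the +1 alone leaves no row for the largest label, so np.eye(x.max())[x] raises an IndexError. Shifting the labels down by one avoids that; a minimal sketch with made-up labels:

```python
import numpy as np

x = np.array([1, 3, 2, 3])  # hypothetical labels in 1..3 (no class 0)

# np.eye(x.max())[x] would fail here: row index 3 is out of range for eye(3).
one_hot = np.eye(x.max())[x - 1]  # label k maps to row k - 1
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```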

answered May 26, 2019 at 15:29

Inaam Ilahi


1 Comment

questionto42 Over a year ago

This repeats the answer of K3---rnc three years later.

Jon Deaton · Answer · 2022-02-03 00:12:18Z

3

I find the easiest solution combines np.take and np.eye:

def one_hot(x, depth: int):
  return np.take(np.eye(depth), x, axis=0)

It works for x of any shape.
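For instance, a 2-D label array gains a trailing axis of length depth, and the result matches plain fancy indexing of the identity rows. A quick check with made-up labels:

```python
import numpy as np

def one_hot(x, depth: int):
    return np.take(np.eye(depth), x, axis=0)

x = np.array([[0, 2], [1, 0]])
out = one_hot(x, 3)
print(out.shape)                          # (2, 2, 3)
print(np.array_equal(out, np.eye(3)[x]))  # True
```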

edited Feb 3, 2022 at 0:12

answered Feb 3, 2022 at 0:05

Jon Deaton


David Nemeskey · Answer · 2016-10-11 22:26:38Z

2

I think the short answer is no. For a more generic case in n dimensions, I came up with this:

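import numpy as np

# For 2-dimensional data, 4 values
a = np.array([[0, 1, 2], [3, 2, 1]])
z = np.zeros(list(a.shape) + [4])
# Index with a tuple of index arrays; recent NumPy no longer treats a list as a multi-dimensional index
z[tuple(np.indices(z.shape[:-1])) + (a,)] = 1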

I am wondering if there is a better solution -- I don't like that I have to create those lists in the last two lines. Anyway, I did some measurements with timeit and it seems that the numpy-based (indices/arange) and the iterative versions perform about the same.

answered Oct 11, 2016 at 22:26

David Nemeskey


Emil Melnikov · Answer · 2018-01-17 14:08:48Z

Just to elaborate on the excellent answer from K3---rnc, here is a more generic version:

def onehottify(x, n=None, dtype=float):
    """1-hot encode x with the max value n (computed from data if n is None)."""
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

Also, here is a quick-and-dirty benchmark of this method and a method from the currently accepted answer by YXD (slightly changed, so that they offer the same API except that the latter works only with 1D ndarrays):

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

The latter method is ~35% faster (13-inch MacBook Pro, 2015), but the former is more general:

>>> import numpy as np
>>> np.random.seed(42)
>>> a = np.random.randint(0, 9, size=(10_000,))
>>> a
array([6, 3, 7, ..., 5, 8, 6])
>>> %timeit onehottify(a, 10)
188 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit onehottify_only_1d(a, 10)
139 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

MiFi · Answer · 2018-11-03 10:17:49Z

2

Suppose p is a 2-D ndarray, and for each row we want a 1 at the position of the row's largest value and 0 everywhere else.

A clean and easy solution:

import numpy as np

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)
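An equivalent formulation indexes the identity matrix with the row-wise argmax. Like the answer above, it marks only the first maximum when a row has ties, since that is what np.argmax returns. A sketch with made-up scores, checking that both versions agree:

```python
import numpy as np

p = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.2, 0.6]])  # made-up scores

# put_along_axis version from the answer above
max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)

# Identity-matrix version: row k of eye(n) is the one-hot vector for class k
one_hot_eye = np.eye(p.shape[1])[np.argmax(p, axis=1)]

print(np.array_equal(one_hot, one_hot_eye))  # True
```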

answered Nov 3, 2018 at 10:17

MiFi


Alexandre Huat · Answer · 2020-10-20 11:11:19Z

2

If you are using TensorFlow, there is tf.one_hot():

import tensorflow as tf
import numpy as np

a = np.array([1, 0, 3])
depth = 4
b = tf.one_hot(a, depth)
# <tf.Tensor: shape=(3, 4), dtype=float32, numpy=
# array([[0., 1., 0., 0.],
#        [1., 0., 0., 0.],
#        [0., 0., 0., 1.]], dtype=float32)>

answered Oct 20, 2020 at 11:11

Alexandre Huat


TeeTracker · Answer · 2021-05-09 21:13:13Z

2

import numpy as np

def one_hot(n, class_num, col_wise=True):
    a = np.eye(class_num)[n.reshape(-1)]
    return a.T if col_wise else a

# Each column is a one-hot vector (shape: class_num x len(n))
print(one_hot(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 8, 7]), 10))
# Each row is a one-hot vector (shape: len(n) x class_num)
print(one_hot(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 8, 7]), 10, col_wise=False))

edited May 9, 2021 at 21:13

answered May 9, 2021 at 20:53

TeeTracker


Hans T · Answer · 2018-01-25 13:10:05Z

I recently ran into the same kind of problem and found that the solutions above are only satisfactory if your numbers form a contiguous range starting at 0. For example, if you want to one-hot encode the following list:

all_good_list = [0,1,2,3,4]

go ahead: the solutions posted above already handle it. But consider this data:

problematic_list = [0,23,12,89,10]

If you use the methods above, you will end up with 90 one-hot columns, because all of them compute something like n = np.max(a)+1. I found a more generic solution that worked for me and wanted to share it:

import numpy as np
import sklearn.preprocessing
sklb = sklearn.preprocessing.LabelBinarizer()
a = np.asarray([1,2,44,3,2])
n = np.unique(a)
sklb.fit(n)
b = sklb.transform(a)

I hope this comes in handy for anyone who runs into the same limitation of the solutions above.
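If you would rather avoid the scikit-learn dependency, np.unique(..., return_inverse=True) returns the sorted distinct values together with each element's position among them, and those positions can index an identity matrix. A sketch on the same input:

```python
import numpy as np

a = np.asarray([1, 2, 44, 3, 2])

# classes: sorted distinct values; codes: index of each element of a in classes
classes, codes = np.unique(a, return_inverse=True)
b = np.eye(classes.size, dtype=int)[codes]

print(classes)  # [ 1  2  3 44]
print(b)
# [[1 0 0 0]
#  [0 1 0 0]
#  [0 0 0 1]
#  [0 0 1 0]
#  [0 1 0 0]]
```

Column j corresponds to classes[j], so only as many columns as there are distinct values are created.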

eqzx · Answer · 2018-07-30 01:42:42Z

Here's a dimensionality-independent standalone solution.

This will convert any N-dimensional array arr of nonnegative integers to a one-hot N+1-dimensional array one_hot, where one_hot[i_1,...,i_N,c] = 1 means arr[i_1,...,i_N] = c. You can recover the input via np.argmax(one_hot, -1).

import numpy as np

def expand_integer_grid(arr, n_classes):
    """

    :param arr: N dim array of size i_1, ..., i_N
    :param n_classes: C
    :returns: one-hot N+1 dim array of size i_1, ..., i_N, C
    :rtype: ndarray

    """
    one_hot = np.zeros(arr.shape + (n_classes,))
    axes_ranges = [range(arr.shape[i]) for i in range(arr.ndim)]
    flat_grids = [_.ravel() for _ in np.meshgrid(*axes_ranges, indexing='ij')]
    one_hot[tuple(flat_grids) + (arr.ravel(),)] = 1  # index with a tuple, not a list
    assert((one_hot.sum(-1) == 1).all())
    assert(np.allclose(np.argmax(one_hot, -1), arr))
    return one_hot

Sudeep K Rana · Answer · 2018-08-30 06:36:17Z

1

This can be done with plain NumPy. If you have a NumPy array like this:

a = np.array([1,0,3])

then there is a very simple way to convert it to a one-hot encoding:

out = (np.arange(4) == a[:,None]).astype(np.float32)

That's it.
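This works through broadcasting: a[:, None] has shape (3, 1) and np.arange(4) has shape (4,), so == produces a (3, 4) boolean array that is True exactly where the column number equals the row's label. A sketch that derives the width from the data instead of hard-coding 4, assuming 0-based labels:

```python
import numpy as np

a = np.array([1, 0, 3])
n_classes = a.max() + 1  # assumes labels are 0-based

mask = np.arange(n_classes) == a[:, None]  # (4,) vs (3, 1) -> (3, 4) booleans
out = mask.astype(np.float32)
print(mask.shape)  # (3, 4)
print(out)
```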

answered Aug 30, 2018 at 6:36

Sudeep K Rana


Aaron Lelevier · Answer · 2018-01-06 18:12:30Z

Here is an example function that I wrote to do this based upon the answers above and my own use case:

def label_vector_to_one_hot_vector(vector, one_hot_size=10):
    """
    Use to convert a column vector to a 'one-hot' matrix

    Example:
        vector: [[2], [0], [1]]
        one_hot_size: 3
        returns:
            [[ 0.,  0.,  1.],
             [ 1.,  0.,  0.],
             [ 0.,  1.,  0.]]

    Parameters:
        vector (np.array): of size (n, 1) to be converted
        one_hot_size (int) optional: size of 'one-hot' row vector

    Returns:
        np.array size (vector.size, one_hot_size): converted to a 'one-hot' matrix
    """
    squeezed_vector = np.squeeze(vector, axis=-1)

    one_hot = np.zeros((squeezed_vector.size, one_hot_size))

    one_hot[np.arange(squeezed_vector.size), squeezed_vector] = 1

    return one_hot

label_vector_to_one_hot_vector(vector=[[2], [0], [1]], one_hot_size=3)

Jordy Van Landeghem · Answer · 2018-06-05 13:50:11Z

0

For completeness, here is a simple function that uses only NumPy operations:

import numpy as np

def probs_to_onehot(output_probabilities):
    argmax_indices_array = np.argmax(output_probabilities, axis=1)
    # Use the number of columns (classes), not the number of distinct argmax values
    onehot_output_array = np.eye(output_probabilities.shape[1])[argmax_indices_array]
    return onehot_output_array

It takes as input a probability matrix: e.g.:

[[0.03038822 0.65810204 0.16549407 0.3797123 ] ... [0.02771272 0.2760752 0.3280924 0.33458805]]

And it will return

[[0. 1. 0. 0.] ... [0. 0. 0. 1.]]

edited Jun 5, 2018 at 13:50

answered Jun 5, 2018 at 10:04

Jordy Van Landeghem


Rango · Answer · 2024-06-22 22:22:46Z

0

For the more general case of an N-dimensional ndarray, you can use NumPy broadcasting:

import numpy as np

a = np.array([[[[1, 0, 3]]]])  # shape (1, 1, 1, 3)
b = (a[..., np.newaxis] == np.arange(np.max(a) + 1)).astype(np.int32)

which would give you:

array([[[[[0, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 1]]]]], dtype=int32)

The result shape is (1, 1, 1, 3, 4).

answered Jun 22, 2024 at 22:22

Rango


Jake Levi · Answer · 2024-09-05 17:40:33Z

Simple example using np.put_along_axis:

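import numpy as np

rng = np.random.default_rng()  # unseeded, so your values will differ from the output below
x = rng.integers(0, 10, 20)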
t = np.zeros([20, 10])
np.put_along_axis(t, indices=np.expand_dims(x, 1), values=1, axis=1)

print(x)
print(t)

[8 6 5 2 3 0 0 0 1 8 6 9 5 6 9 7 6 5 5 9]
[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

Inaam Ilahi · Answer · 2019-02-27 18:33:14Z

-1

Use the following code:

import numpy as np

def one_hot_encode(x, num_classes=10):
    """
    argument
        - x: a list of labels
        - num_classes: number of classes (defaults to 10)
    return
        - one hot encoding matrix (number of labels, number of classes)
    """
    encoded = np.zeros((len(x), num_classes))

    for idx, val in enumerate(x):
        encoded[idx][val] = 1

    return encoded

Found it here. P.S. You don't need to follow the link.

answered Feb 27, 2019 at 18:33

Inaam Ilahi


4 Comments

Kenan Over a year ago

You should avoid using loops with numpy

Alexandre Huat Over a year ago

It does not answer the question: "Is there a quick way to do this? Quicker than just looping over a to set elements of b, that is."

Inaam Ilahi Over a year ago

@AlexandreHuat You can use the numpy function np.eye()

Alexandre Huat Over a year ago

Then you should write an answer saying that one can use numpy.eye() (but that was already done by another user). Please make sure to read the question and the existing answers carefully in order to maintain the quality of Stack Overflow and the community.

Guillaume Chevalier · Answer · 2020-01-03 15:49:43Z

-1

Using a Neuraxle pipeline step:

Set up your example

import numpy as np
a = np.array([1,0,3])
b = np.array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Do the actual conversion

from neuraxle.steps.numpy import OneHotEncoder
encoder = OneHotEncoder(nb_columns=4)
b_pred = encoder.transform(a)

Assert it works

assert np.array_equal(b_pred, b)

Link to documentation: neuraxle.steps.numpy.OneHotEncoder

edited Jan 3, 2020 at 15:49

answered Dec 10, 2019 at 7:39

Guillaume Chevalier
