Pandas DataFrame convert to binary

Question

Given a pd.DataFrame with 0.0 < values < 1.0, I would like to convert it to binary values 0/1 according to a defined threshold eps = 0.5:

      0     1     2
0  0.35  0.20  0.81
1  0.41  0.75  0.59
2  0.62  0.40  0.94
3  0.17  0.51  0.29

Right now I only have this for loop, which takes quite a long time on a large dataset:

import numpy as np
import pandas as pd

data = np.array([[.35, .2, .81],[.41, .75, .59],
                [.62, .4, .94], [.17, .51, .29]])

df = pd.DataFrame(data, index=range(data.shape[0]), columns=range(data.shape[1]))
eps = .5
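# Pre-allocate the result, then compare every cell against the threshold one at a time
b = np.zeros((df.shape[0], df.shape[1]))
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        if df.loc[i,j] < eps:
            b[i,j] = 0
        else:
            b[i,j] = 1
# Wrap the result back into a DataFrame with the original labels
df_bin = pd.DataFrame(b, columns=df.columns, index=df.index)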

Does anybody know a more efficient way to convert to binary values? The expected output is:

     0    1    2
0  0.0  0.0  1.0
1  0.0  1.0  1.0
2  1.0  0.0  1.0
3  0.0  1.0  0.0

Thanks,

rafaelc · Accepted Answer · 2019-11-04 16:24:01Z

9

`df.round`

>>> df.round()

`np.round`

>>> np.round(df)

`astype`

>>> df.ge(0.5).astype(int)

All of which yield the values below (the `astype` version returns integers 0/1 rather than floats 0.0/1.0):

     0    1    2
0  0.0  0.0  1.0
1  0.0  1.0  1.0
2  1.0  0.0  1.0
3  0.0  1.0  0.0
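A quick sanity check, written as a self-contained sketch that rebuilds the question's sample frame (the variable names `by_df_round`, `by_np_round` and `by_ge` are just illustrative): it confirms the three approaches agree on this data and differ only in dtype.

```python
import numpy as np
import pandas as pd

# Sample data from the question
data = np.array([[.35, .2, .81], [.41, .75, .59],
                 [.62, .4, .94], [.17, .51, .29]])
df = pd.DataFrame(data)

by_df_round = df.round()         # float64 result: 0.0 / 1.0
by_np_round = np.round(df)       # dispatches to DataFrame.round, also float64
by_ge = df.ge(0.5).astype(int)   # integer result: 0 / 1

# Same values everywhere; only the dtype differs between round and ge
print(by_df_round.equals(by_np_round))                 # True
print((by_df_round.astype(int) == by_ge).all().all())  # True
```

Note that none of the sample values is exactly 0.5, which is why all three agree here.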

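Note: round works here because rounding to the nearest integer effectively thresholds at 0.5. Values exactly equal to 0.5, however, round to 0.0 (NumPy rounds half to even), whereas the loop in the question maps them to 1. For custom thresholds, or to match the loop exactly, use the third solution.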
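To make the tie case and the custom-threshold case concrete, here is a small sketch on a hypothetical one-column frame that includes an exact 0.5; the value `eps = 0.55` is an arbitrary example threshold.

```python
import pandas as pd

# Hypothetical values, including one exactly on the 0.5 boundary
df = pd.DataFrame({"x": [0.2, 0.5, 0.6]})

# round(): 0.5 rounds half to even, so it becomes 0.0
print(df.round()["x"].tolist())               # [0.0, 0.0, 1.0]

# ge(): >= 0.5 counts as 1, matching the loop in the question
print(df.ge(0.5).astype(int)["x"].tolist())   # [0, 1, 1]

# A threshold other than 0.5 cannot be expressed with round(), but ge() handles it
eps = 0.55
print(df.ge(eps).astype(int)["x"].tolist())   # [0, 0, 1]
```

If values equal to the threshold should map to 0 instead, `df.gt(eps)` gives a strict comparison.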

edited Nov 4, 2019 at 16:24

answered Nov 4, 2019 at 16:18


anky · Accepted Answer · 2019-11-04 16:24:44Z

8

Or you can use np.where() and assign the values to the underlying array:

df[:] = np.where(df < 0.5, 0, 1)
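If you want to keep the original values around, a variant that builds a new frame instead of overwriting `df` in place could look like the sketch below. `np.where` returns a plain ndarray, so the index and columns are passed back explicitly; `df_bin` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data from the question
data = np.array([[.35, .2, .81], [.41, .75, .59],
                 [.62, .4, .94], [.17, .51, .29]])
df = pd.DataFrame(data)
eps = 0.5

# Cells below eps become 0, everything else (including == eps) becomes 1
df_bin = pd.DataFrame(np.where(df < eps, 0, 1), index=df.index, columns=df.columns)
print(df_bin)
```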

answered Nov 4, 2019 at 16:24


Erfan · Accepted Answer · 2019-11-04 17:49:30Z

Since we have quite a few answers, each using a different method, I was curious how they compare in speed. Thought I'd share:

# create big test dataframe
dfbig = pd.concat([df]*200000, ignore_index=True)
print(dfbig.shape)

(800000, 3)

# pandas round()
%timeit dfbig.round()

101 ms ± 4.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# numpy round()
%timeit np.round(dfbig)

104 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# pandas .ge & .astype
%timeit dfbig.ge(0.5).astype(int)

9.32 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# numpy.where
%timeit np.where(dfbig<0.5, 0, 1)

21.5 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Conclusion (fastest to slowest):

pandas ge & astype
np.where
pandas round / np.round (essentially tied)
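To reproduce a comparison like this outside IPython, a rough sketch using the standard-library `timeit` module might look like the following. The repeat/number settings are arbitrary choices, and absolute timings will depend on your machine and pandas/NumPy versions, so treat the ranking rather than the numbers as the takeaway.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data from the question, tiled to 800000 x 3 as in the answer
data = np.array([[.35, .2, .81], [.41, .75, .59],
                 [.62, .4, .94], [.17, .51, .29]])
df = pd.DataFrame(data)
dfbig = pd.concat([df] * 200000, ignore_index=True)

candidates = {
    "pandas round": lambda: dfbig.round(),
    "np.round": lambda: np.round(dfbig),
    "pandas ge & astype": lambda: dfbig.ge(0.5).astype(int),
    "np.where": lambda: np.where(dfbig < 0.5, 0, 1),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, converted to milliseconds per call
    best = min(timeit.repeat(fn, repeat=3, number=5)) / 5
    results[name] = best * 1000

# Print from fastest to slowest
for name, ms in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {ms:8.2f} ms")
```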
