Combine two columns in pandas dataframe but in specific order

Question

For example, I have a dataframe where two of the columns are "Zeroes" and "Ones", which contain only zeroes and ones, respectively. If I combine them into one column, I get all the zeroes first, then all the ones.

I want to combine them so that the elements of the two columns alternate, rather than getting all elements of the first column followed by all elements of the second. So I don't want the result to be [0, 0, 0, 1, 1, 1]; I need it to be [0, 1, 0, 1, 0, 1].
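For illustration, here is one common way of "combining" two columns that produces the unwanted order. The question does not say which method was used, so pd.concat here is an assumption; the column names follow the answers below.

```python
import pandas as pd

df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1]})

# Concatenating the two columns end to end (assumed to be what "combine"
# meant) keeps each column's values together instead of alternating them.
combined = pd.concat([df["zeroes"], df["ones"]], ignore_index=True)
print(combined.tolist())  # [0, 0, 0, 1, 1, 1]
```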

I am processing 100K+ rows of data. What is the fastest or optimal way to achieve this? Thanks in advance!

Can you provide some code showing what you have already tried? – Adam J, Commented Nov 3, 2021 at 11:19
Well, it isn't hard to do it iteratively: loop through the rows, append the element from the first column, then append the element from the second column. But I guess there is a faster, more "pandas" way to do it. – codearts, Commented Nov 3, 2021 at 11:25
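For reference, the iterative approach described in that comment could look like this sketch; interleave_loop is a hypothetical helper name, not code from the thread. It is the baseline the vectorized answers aim to beat.

```python
import pandas as pd


def interleave_loop(first, second):
    """Hypothetical helper: append one value from each column per row."""
    out = []
    for a, b in zip(first, second):
        out.append(a)  # element from the first column
        out.append(b)  # then the element from the second column
    return out


df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1]})
print(interleave_loop(df["zeroes"], df["ones"]))  # [0, 1, 0, 1, 0, 1]
```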

Dani Mesejo · Accepted Answer · 2021-11-03 11:33:44Z

4

Try:

import pandas as pd

df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1], "some_other": list("abc")})
# Select the two columns, convert to a (rows x 2) array, then flatten it row by row
res = df[["zeroes", "ones"]].to_numpy().ravel(order="C")
print(res)

Output

[0 1 0 1 0 1]
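Why this works: to_numpy() on the two selected columns returns a 2-D array with one row per DataFrame row, and ravel(order="C") reads that array row by row. The sketch below prints the intermediate shape and, for contrast, uses order="F", which reads column by column and reproduces the unwanted grouped order. Wrapping the result in a new Series is an illustrative choice, since it has twice as many values as the DataFrame has rows.

```python
import pandas as pd

df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1]})
arr = df[["zeroes", "ones"]].to_numpy()
print(arr.shape)  # (3, 2): one row per DataFrame row, one column per column

# Row-major ("C") order walks each row left to right: interleaved.
print(arr.ravel(order="C").tolist())  # [0, 1, 0, 1, 0, 1]
# Column-major ("F") order walks each column top to bottom: grouped.
print(arr.ravel(order="F").tolist())  # [0, 0, 0, 1, 1, 1]

# The result has 2 * len(df) values, so it goes into a new Series
# rather than a column of the original DataFrame.
combined = pd.Series(arr.ravel(order="C"), name="combined")
```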

Micro-Benchmarks

import pandas as pd
from itertools import chain
df = pd.DataFrame({ "zeroes" : [0] * 10_000, "ones":  [1] * 10_000})
%timeit df[["zeroes", "ones"]].to_numpy().ravel(order="C").tolist()
672 µs ± 8.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [v for vs in zip(df["zeroes"], df["ones"]) for v in vs]
2.57 ms ± 54 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit list(chain.from_iterable(zip(df["zeroes"], df["ones"]))) 
2.11 ms ± 73 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
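%timeit is an IPython/Jupyter magic. Outside a notebook, a rough equivalent can be written with the standard-library timeit module. This sketch re-times the three approaches above after checking that they agree; the repeat and number values are arbitrary choices, and the printed timings depend on the machine and library versions, so no figures are claimed here.

```python
import timeit
from itertools import chain

import pandas as pd

df = pd.DataFrame({"zeroes": [0] * 10_000, "ones": [1] * 10_000})

candidates = {
    "ravel": lambda: df[["zeroes", "ones"]].to_numpy().ravel(order="C").tolist(),
    "nested comprehension": lambda: [v for vs in zip(df["zeroes"], df["ones"]) for v in vs],
    "chain.from_iterable": lambda: list(chain.from_iterable(zip(df["zeroes"], df["ones"]))),
}

# Sanity check: all approaches must produce the same list before timing them.
results = [fn() for fn in candidates.values()]
assert all(r == results[0] for r in results)

for name, fn in candidates.items():
    # Best of 3 repeats of 20 calls each, reported as time per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{name}: {best * 1e3:.3f} ms per call")
```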

edited Nov 3, 2021 at 11:33

answered Nov 3, 2021 at 11:23

Dani Mesejo

62.2k reputation, 6 gold badges, 56 silver badges, 86 bronze badges

1 Comment

jezrael Over a year ago

good solution ;)

Mahdi F. · Accepted Answer · 2021-11-03 12:16:01Z

1

As an alternative, you can use the NumPy array method flatten(), like below:

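import pandas as pd

df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1]})
# flatten() returns a row-major (C-order) copy by default: [0 1 0 1 0 1]
df[["zeroes", "ones"]].to_numpy().flatten()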

Benchmark (running on Colab):

df = pd.DataFrame({ "zeroes" : [0] * 10_000_000, "ones":  [1] * 10_000_000})

%timeit df[["zeroes", "ones"]].to_numpy().flatten().tolist()
1 loop, best of 5: 320 ms per loop

%timeit df[["zeroes", "ones"]].to_numpy().ravel(order="C").tolist()
1 loop, best of 5: 322 ms per loop
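The near-identical timings are plausible because the two methods differ mainly in copying behavior: ndarray.flatten() always returns a copy, while ndarray.ravel() returns a view when the memory layout allows it and a copy otherwise. This NumPy-only sketch uses small illustrative arrays, not the benchmark data. Whether df[[...]].to_numpy() comes back C- or F-ordered depends on pandas' internal storage, so ravel may well be copying in the benchmark too; checking arr.flags["C_CONTIGUOUS"] on the actual array shows which case applies.

```python
import numpy as np

a = np.array([[0, 1], [0, 1], [0, 1]])  # C-contiguous (row-major) by default

print(np.shares_memory(a, a.ravel()))    # True: ravel returned a view
print(np.shares_memory(a, a.flatten()))  # False: flatten always copies

# For a column-major (Fortran-ordered) array, reading in C order cannot be
# expressed as a strided view, so ravel has to copy as well.
f = np.asfortranarray(a)
print(np.shares_memory(f, f.ravel(order="C")))  # False
print(f.ravel(order="C").tolist())  # [0, 1, 0, 1, 0, 1]
```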

edited Nov 3, 2021 at 12:16

answered Nov 3, 2021 at 12:10

Mahdi F.

24.1k reputation, 5 gold badges, 25 silver badges, 32 bronze badges

Tobias P. G. · Accepted Answer · 2021-11-03 11:23:41Z

0

I don't know if this is the optimal solution, but it should solve your case.

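import pandas as pd

# Ten zeroes and ten ones; after the transpose the columns are labelled 0 and 1
df = pd.DataFrame([[0] * 10, [1] * 10]).T
pairs = [[x, y] for x, y in zip(df[0], df[1])]  # one [zero, one] pair per row
result = [x for pair in pairs for x in pair]     # flatten the pairs in order
print(result)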

answered Nov 3, 2021 at 11:23

Tobias P. G.

844 reputation, 8 silver badges, 19 bronze badges

jnic · Accepted Answer · 2021-11-03 11:25:48Z

0

This may help you: "Alternate elements of different columns using Pandas"

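import pandas as pd
df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1]})
df1, df2 = df[["zeroes"]], df[["ones"]]  # single-column frames to interleave
pd.concat(
    [df1, df2], axis=1
).stack().reset_index(1, drop=True).to_frame('C').rename(index='CC{}'.format)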
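Applied directly to this question's DataFrame, the same idea needs neither concat nor separate frames: DataFrame.stack() walks each row left to right, so selecting the two columns and stacking them interleaves the values. A minimal sketch, assuming the column names used in the other answers:

```python
import pandas as pd

df = pd.DataFrame({"zeroes": [0, 0, 0], "ones": [1, 1, 1]})

# stack() returns a Series indexed by (row label, column name),
# ordered row by row: (0, "zeroes"), (0, "ones"), (1, "zeroes"), ...
stacked = df[["zeroes", "ones"]].stack()

# Drop the MultiIndex to get a plain 0..2n-1 index.
combined = stacked.reset_index(drop=True)
print(combined.tolist())  # [0, 1, 0, 1, 0, 1]
```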

answered Nov 3, 2021 at 11:25

jnic

11 reputation, 3 bronze badges
