I have a dataframe with a column whose values are lists. I want to extract the individual elements of every list in that column into one flat list. So given this input dataframe:

          A
0     [5, 4, 3, 6]
1     [7, 8, 9, 6]

The intended output should be a list:

      [5, 4, 3, 6, 7, 8, 9, 6]

1 Answer

You can flatten the column with a nested list comprehension:

a = [y for x in df.A for y in x]

Or use itertools.chain:

from itertools import chain

a = list(chain.from_iterable(df.A))

Or use numpy.concatenate:

import numpy as np
a = np.concatenate(df.A).tolist()

Or use Series.explode, available in pandas 0.25+:

a = df.A.explode().tolist()
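
As a quick sanity check, the four approaches above can be run side by side on the sample frame from the question. This is only a minimal sketch: the variable names are illustrative, and np.concatenate is given a plain list (df["A"].tolist()) rather than the Series itself, which is a small variation on the one-liner above.

```python
from itertools import chain

import numpy as np
import pandas as pd

# Sample frame from the question: column "A" holds one list per row.
df = pd.DataFrame({"A": [[5, 4, 3, 6], [7, 8, 9, 6]]})

# 1. Nested list comprehension.
flat_comprehension = [y for x in df["A"] for y in x]

# 2. itertools.chain.from_iterable.
flat_chain = list(chain.from_iterable(df["A"]))

# 3. numpy.concatenate on a plain list of lists (variation: .tolist() first).
flat_concat = np.concatenate(df["A"].tolist()).tolist()

# 4. Series.explode (pandas 0.25+).
flat_explode = df["A"].explode().tolist()

print(flat_comprehension)
print(flat_chain == flat_concat == flat_explode == flat_comprehension)
```

All four variables should hold the same flat list, so the choice between them comes down mostly to speed and readability.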

Performance on sample data with 100k rows:

df = pd.DataFrame({'A': [[5, 4, 3, 6], [7, 8, 9, 6]] * 50000})

print(df)

In [263]: %timeit [y for x in df.A for y in x]
37.7 ms ± 3.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [264]: %timeit list(chain.from_iterable(df.A))
27.3 ms ± 1.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [265]: %timeit np.concatenate(df.A).tolist()
1.71 s ± 86.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [266]: %timeit df.A.explode().tolist()
207 ms ± 3.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#ansev1
In [267]: %timeit np.hstack(df['A']).tolist()
328 ms ± 6.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
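
The %timeit lines above only run inside IPython. To repeat the comparison as a plain script, something like the following could be used. It is a rough sketch with the standard-library timeit module: the names candidates and results are made up for illustration, np.concatenate is again fed df["A"].tolist() instead of the Series (so its timing may differ from the figure above), and absolute numbers will vary by machine and library version.

```python
import timeit
from itertools import chain

import numpy as np
import pandas as pd

# Same 100k-row sample data as in the timings above.
df = pd.DataFrame({"A": [[5, 4, 3, 6], [7, 8, 9, 6]] * 50000})

# Hypothetical registry of the approaches to compare.
candidates = {
    "list comprehension": lambda: [y for x in df["A"] for y in x],
    "chain.from_iterable": lambda: list(chain.from_iterable(df["A"])),
    "np.concatenate": lambda: np.concatenate(df["A"].tolist()).tolist(),
    "Series.explode": lambda: df["A"].explode().tolist(),
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats, one call per repeat.
    results[name] = min(timeit.repeat(func, number=1, repeat=3))

# Print fastest first.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:<22} {seconds * 1000:8.1f} ms")
```

Taking the minimum of a few repeats is a common way to reduce noise from other processes; it is not the same statistic as the mean ± std. dev. that %timeit reports.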
1 Comment

Thank you so much. Which is the fastest option for a dataframe with over 60,000 rows?
