84

I have a df like so:

import pandas
a=[['1/2/2014', 'a', '6', 'z1'], 
   ['1/2/2014', 'a', '3', 'z1'], 
   ['1/3/2014', 'c', '1', 'x3'],
   ]
df = pandas.DataFrame.from_records(a[1:],columns=a[0])

I want to flatten the df so it is one continuous list like so:

['1/2/2014', 'a', '6', 'z1', '1/2/2014', 'a', '3', 'z1','1/3/2014', 'c', '1', 'x3']

I can loop through the rows and extend a list, but is there a much easier way to do it?

2
  • possible duplicate of Comprehension for flattening a sequence of sequences? Commented Aug 22, 2014 at 5:22
  • 5
    I looked at the answer above when searching for a solution. That question isn't about a DataFrame. If that answer had solved my problem, I wouldn't have needed to post my question. Commented Aug 22, 2014 at 6:22

5 Answers

139

You can use .flatten() on the DataFrame converted to a NumPy array:

df.to_numpy().flatten()

and you can also add .tolist() if you want the result to be a Python list.
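As a minimal sketch of the full chain, assuming every row of the question's data is meant to be data (the column names below are hypothetical, since the question does not give any):

```python
import pandas as pd

rows = [
    ["1/2/2014", "a", "6", "z1"],
    ["1/2/2014", "a", "3", "z1"],
    ["1/3/2014", "c", "1", "x3"],
]
# Hypothetical column names, chosen only for this illustration.
df = pd.DataFrame(rows, columns=["date", "key", "count", "code"])

# to_numpy() gives a 2-D array; flatten() walks it row by row;
# tolist() converts the 1-D array into a plain Python list.
flat = df.to_numpy().flatten().tolist()
print(flat)
```

The values come out row by row, which matches the order asked for in the question.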

Edit

In previous versions of pandas, the .values attribute was used instead of the .to_numpy() method, as mentioned in the comments below.


3 Comments

pandas now recommends using .to_numpy() instead of .values.
@Frank Why? .values already exists, it's a numpy array under the hood. Why call a function?
@endolith I'm just passing along what the docs say – ask them, not me. Some more context here: stackoverflow.com/a/54508052
20

Maybe use stack?

df.stack().values
array(['1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3'], dtype=object)

(Edit: Incidentally, the DF in the Q uses the first row as labels, which is why they're not in the output here.)
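To make the note about labels concrete, here is a small sketch that builds the frame exactly as the question does and compares it with a frame that keeps all three rows as data. The column names in the second frame are hypothetical.

```python
import pandas as pd

a = [
    ["1/2/2014", "a", "6", "z1"],
    ["1/2/2014", "a", "3", "z1"],
    ["1/3/2014", "c", "1", "x3"],
]

# As in the question: the first row becomes the column labels,
# so only two rows of data remain.
df_question = pd.DataFrame.from_records(a[1:], columns=a[0])
stacked_question = df_question.stack().tolist()

# Alternative: keep every row as data under hypothetical labels.
df_all = pd.DataFrame(a, columns=["date", "key", "count", "code"])
stacked_all = df_all.stack().tolist()

print(len(stacked_question), len(stacked_all))
```

With the question's construction, the stacked result holds 8 values; with all rows kept as data, it holds 12.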

Comments

4

You can try with NumPy:

import numpy as np
np.reshape(df.values, (1,df.shape[0]*df.shape[1]))
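One thing worth noting about the call above: a target shape of (1, n) produces a two-dimensional array with a single row, not a one-dimensional one. The sketch below, using hypothetical column names, compares that with passing -1 as the shape.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["1/2/2014", "a", "6", "z1"],
        ["1/2/2014", "a", "3", "z1"],
        ["1/3/2014", "c", "1", "x3"],
    ],
    columns=["date", "key", "count", "code"],  # hypothetical labels
)

# Shape (1, n): still 2-D, a single row holding every value.
two_d = np.reshape(df.values, (1, df.shape[0] * df.shape[1]))

# Shape -1: NumPy infers the length and returns a 1-D array.
one_d = np.reshape(df.values, -1)

print(two_d.shape, one_d.shape)

# Taking the single row of the 2-D result yields the same flat list.
flat_list = two_d[0].tolist()
```

If a flat Python list is the goal, either index the first row as shown or reshape to -1 before calling tolist().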

Comments

4

You can use the reshape method:

df.values.reshape(-1)
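For comparison with the flatten() approach from the other answer: both give the same values, but flatten() always returns a copy, while reshape(-1) returns a view when the memory layout allows it. The sketch below shows this on a plain NumPy array rather than a DataFrame, since whether a DataFrame's underlying array can be viewed depends on pandas internals.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

via_reshape = arr.reshape(-1)   # view of arr for a contiguous array
via_flatten = arr.flatten()     # always an independent copy

# Same values either way.
same_values = via_reshape.tolist() == via_flatten.tolist()

# Only the reshape result shares memory with the original array.
reshape_shares = np.shares_memory(arr, via_reshape)
flatten_shares = np.shares_memory(arr, via_flatten)

print(same_values, reshape_shares, flatten_shares)
```

The distinction only matters if you modify the result in place; once you call tolist(), the two approaches are interchangeable.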

1 Comment

Hi ahmed, you could improve your answer by formatting your code, linking to the official documentation, and showing the output your answer produces.
0

The previously mentioned df.values.flatten().tolist() and df.to_numpy().flatten().tolist() are concise and effective, but I spent a very long time trying to learn how to 'do the work myself' via a list comprehension, without resorting to built-in functions.

For anyone else who is interested, try:

[ row for col in df for row in df[col] ]

Turns out that this solution to flattening a df via list comprehension (which I haven't found elsewhere on SO) is just a small modification to the solution for flattening nested lists (that can be found all over SO):

[ val for sublst in lst for val in sublst ]
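Note that iterating over a DataFrame yields its column labels, so the comprehension over df walks the data column by column, whereas the question asks for row-by-row order. A small sketch of both orders, assuming hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["1/2/2014", "a", "6", "z1"],
        ["1/2/2014", "a", "3", "z1"],
        ["1/3/2014", "c", "1", "x3"],
    ],
    columns=["date", "key", "count", "code"],  # hypothetical labels
)

# Column-major: all of the first column, then all of the second, and so on.
by_column = [val for col in df for val in df[col]]

# Row-major: iterate over the rows of the underlying 2-D array instead.
by_row = [val for row in df.to_numpy() for val in row]

print(by_column)
print(by_row)
```

The row-major variant reproduces the list requested in the question; the column-major one contains the same values in a different order.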

Comments
