6

I want to create a new single-column pandas DataFrame from a 2D NumPy array, where each row contains one 1D list. The following is a simplified reproducible example.

import pandas as pd
import numpy as np

arr = np.ones((4,3)) # could be any 2D array

What I want is,

       lists
0  [1, 1, 1]
1  [1, 1, 1]
2  [1, 1, 1]
3  [1, 1, 1]

Now, df = pd.DataFrame(arr, columns=['lists']) gives the error,

ValueError: Shape of passed values is (4, 3), indices imply (4, 1)

And df = pd.DataFrame(list(arr), columns=['lists']) gives the error,

ValueError: 1 columns passed, passed data had 3 columns

Finally, df = pd.DataFrame(arr.flatten(), columns=['lists']) runs, but gives the wrong DataFrame: 12 rows, each cell holding the scalar 1.

How do I get what I want?

This was a more interesting problem than I expected :) Commented May 28, 2020 at 19:45

3 Answers

5

From each row of the 2d array (i.e. a 1d array), construct a singleton tuple that contains that row, and build the DataFrame from that. We can elegantly do this using a generator expression:

>>> df = pd.DataFrame(((x,) for x in arr), columns=['lists'])
>>> df
             lists
0  [1.0, 1.0, 1.0]
1  [1.0, 1.0, 1.0]
2  [1.0, 1.0, 1.0]
3  [1.0, 1.0, 1.0]

The constructor iterates over each tuple, rather than over the underlying array, to determine the column values for that row. Each tuple holds exactly one value, the 1d array, so that array is stored for that row in the single available column.

The cell values are indeed NumPy arrays:

>>> df['lists'][0]
array([1., 1., 1.])
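
If the cells need to be plain Python lists, as in the question's desired output, rather than NumPy arrays, one option is to convert with arr.tolist() before wrapping. A minimal sketch, assuming the same arr as in the question; the names df_lists and first_cell are only illustrative:

```python
import numpy as np
import pandas as pd

arr = np.ones((4, 3))  # same example array as in the question

# arr.tolist() yields 4 plain Python lists; wrapping each one in a
# singleton tuple keeps it as a single cell, as with the arrays above
df_lists = pd.DataFrame(((row,) for row in arr.tolist()), columns=['lists'])

first_cell = df_lists['lists'].iloc[0]
print(type(first_cell))  # a plain list rather than an ndarray
```

The values are still floats here; starting from an integer array would give integer lists.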

5
data = {"lists": list(arr)}

df = pd.DataFrame(data, columns=['lists'])

print(df)

Output:

             lists
0  [1.0, 1.0, 1.0]
1  [1.0, 1.0, 1.0]
2  [1.0, 1.0, 1.0]
3  [1.0, 1.0, 1.0]
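
One way to read why this works: in a dict passed to the constructor, each value supplies a whole column, so the four row arrays in list(arr) become four cells of 'lists' instead of being unpacked into three columns. A small check of that reading, assuming the same arr as in the question:

```python
import numpy as np
import pandas as pd

arr = np.ones((4, 3))  # same example array as in the question

# the dict value is a list of 4 row arrays -> one column with 4 cells
df = pd.DataFrame({"lists": list(arr)})

cell = df["lists"].iloc[0]
print(df.shape)    # (4, 1)
print(type(cell))  # each cell is still a NumPy array
```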

3 Comments

[val for val in arr] is more neatly written as list(arr).
Hmm, I think this is more elegant actually :)
Absolutely. Learned something new today as well... :)
3

Wrap the list of row arrays in an outer list to build a one-row DataFrame, transpose it so that each array lands in its own row, then set the column name.

import pandas as pd
import numpy as np

# 4x3 array of integer ones
arr = np.ones((4, 3), dtype=int)

# wrap the rows in one outer list (a 1x4 frame), then transpose to 4x1
df = pd.DataFrame([list(arr)]).T
# name the single column
df.columns = ['lists']
df
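
The intermediate shapes make the trick easier to follow. A sketch, assuming the same integer arr as above; the name one_row is only illustrative:

```python
import numpy as np
import pandas as pd

arr = np.ones((4, 3), dtype=int)

# the outer list has a single element, so pandas builds ONE row;
# that row's 4 entries are the 4 row arrays, one per column
one_row = pd.DataFrame([list(arr)])
print(one_row.shape)  # (1, 4)

# transposing turns those 4 columns into 4 rows of a single column
df = one_row.T
df.columns = ['lists']
print(df.shape)  # (4, 1)
```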

2 Comments

If OP wants ints in the output, it would be better to start with dtype=int for the np.ones, rather than converting at this step. Nice idea, though. Took me a little while to figure out what's going on :) This basically adds the wrapping at a different step in the process, but is functionally the same.
(After changing that, the same list(arr) advice I gave to @Anshul applies, too.)
