Single column dataframe containing 1D lists using a numpy 2D array

Question

I want to create a new single-column pandas DataFrame from a 2D numpy array, where each row holds a 1D list. A simplified reproducible example follows.

import pandas as pd
import numpy as np

arr = np.ones((4,3)) # could be any 2D array

What I want is,

       lists
0  [1, 1, 1]
1  [1, 1, 1]
2  [1, 1, 1]
3  [1, 1, 1]

Now, df = pd.DataFrame(arr, columns=['lists']) gives the error,

ValueError: Shape of passed values is (4, 3), indices imply (4, 1)

And df = pd.DataFrame(list(arr), columns=['lists']) gives the error,

ValueError: 1 columns passed, passed data had 3 columns

Finally, df = pd.DataFrame(arr.flatten(), columns=['lists']) gives a wrong dataframe with all cells having a scalar 1.

How do I get what I want?

This was a more interesting problem than I expected :)

– Karl Knechtel, Commented May 28, 2020 at 19:45

Karl Knechtel · Accepted Answer · 2020-05-28 19:37:19Z

5

From each row of the 2d array (i.e. a 1d array), construct a singleton tuple that contains that row, and build the DataFrame from that. We can elegantly do this using a generator expression:

>>> df = pd.DataFrame(((x,) for x in arr), columns=['lists'])
>>> df
             lists
0  [1.0, 1.0, 1.0]
1  [1.0, 1.0, 1.0]
2  [1.0, 1.0, 1.0]
3  [1.0, 1.0, 1.0]

The constructor iterates over the tuple, rather than the underlying array, in order to determine the column values in a given row. There is one such value - the 1d array - so that gets stored for that row in the single available column.

The cell values are indeed NumPy arrays:

>>> df['lists'][0]
array([1., 1., 1.])
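If plain Python lists are wanted in the cells rather than NumPy arrays, one possible variation is to convert each row with `tolist()` before wrapping it in the tuple. This is only a sketch of the same idea; the names `row` and `first_cell` are mine, not part of the original answer.

```python
import numpy as np
import pandas as pd

arr = np.ones((4, 3))  # same example array as in the question

# Convert each 1D row to a plain Python list, then wrap it in a
# one-element tuple so the constructor sees exactly one value per row.
df = pd.DataFrame(((row.tolist(),) for row in arr), columns=['lists'])

first_cell = df['lists'][0]
print(type(first_cell))  # <class 'list'>
print(df.shape)          # (4, 1)
```

The cells then hold `[1.0, 1.0, 1.0]` as ordinary lists, at the cost of losing array operations on each cell.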

answered May 28, 2020 at 19:37


Anshul · Accepted Answer · 2020-05-28 19:44:34Z

5

data = {"lists": list(arr)}

df = pd.DataFrame(data, columns=['lists'])

print(df)

Output:

             lists
0  [1.0, 1.0, 1.0]
1  [1.0, 1.0, 1.0]
2  [1.0, 1.0, 1.0]
3  [1.0, 1.0, 1.0]
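As a rough check of why this works, the sketch below uses an array with distinct values (my own choice for illustration, not from the question) so that each cell can be told apart. `list(arr)` iterates over the first axis and yields one 1D array per row, and the dict tells pandas that this sequence is the content of a single column.

```python
import numpy as np
import pandas as pd

# Illustrative array with distinct values; any 2D array should behave the same.
arr = np.arange(12).reshape(4, 3)

# One 1D array per row of the 2D array.
rows = list(arr)

# The dict maps the column name to a sequence with one entry per row.
df = pd.DataFrame({'lists': rows})

print(df.shape)           # (4, 1)
print(df['lists'].dtype)  # object
print(df['lists'][1])     # [3 4 5]
```

The column has `object` dtype because each cell stores a whole array rather than a scalar.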

edited May 28, 2020 at 19:44

answered May 28, 2020 at 19:40


3 Comments

Karl Knechtel Over a year ago

[val for val in arr] is more neatly written as list(arr).

Karl Knechtel Over a year ago

Hmm, I think this is more elegant actually :)

Anshul Over a year ago

Absolutely. Learned something new today as well :)

Sheila Mbadi · Accepted Answer · 2020-05-28 19:48:27Z

3

Put all the rows of the array into a single-row DataFrame, transpose it so that each array row becomes its own DataFrame row, then set the column name.

import pandas as pd
import numpy as np

# ones array
arr = np.ones((4,3), dtype=int)

# wrap the rows in a one-row DataFrame, then transpose to get one row per array row
df = pd.DataFrame([list(arr)]).T
df.columns = ['lists']
df
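To make the intermediate step visible, here is a small sketch that splits the construction in two. The name `wide` is hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

arr = np.ones((4, 3), dtype=int)

# Wrapping list(arr) in another list gives one DataFrame row whose
# four cells each hold one row of the original array.
wide = pd.DataFrame([list(arr)])
print(wide.shape)  # (1, 4)

# Transposing turns those four cells into four rows of a single column.
df = wide.T
df.columns = ['lists']
print(df.shape)    # (4, 1)
```

The extra list and the transpose together play the role that the tuple wrapping or the dict plays in the other answers.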

edited May 28, 2020 at 19:48

answered May 28, 2020 at 19:39


2 Comments

Karl Knechtel Over a year ago

If OP wants ints in the output, it would be better to start with a dtype=int for the np.ones, rather than converting at this step. Nice idea, though. Took me a little while to figure out what's going on :) this basically adds the wrapping at a different step in the process, but is functionally the same.

Karl Knechtel Over a year ago

(After changing that, the same list(arr) advice I gave to @Anshul applies, too.)
