15

Is there an easy way to build a pandas DataFrame from a list of numpy arrays, where each array becomes a column? By default the arrays become the rows, and I don't understand why. Here is a quick example:

import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]
df = pd.DataFrame(data=data, columns=names)

This raises an error indicating that pandas expects 10 columns.

If I do

df = pd.DataFrame(data=data)

I get a DataFrame with 10 columns and 3 rows.
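
Both observations can be reproduced with the short sketch below. It reuses the names and data from the example above. The exception type has differed between pandas versions, so catching both ValueError and AssertionError is an assumption made to keep the sketch portable.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Each array in the list is treated as one row.
df_rows = pd.DataFrame(data=data)
print(df_rows.shape)  # (3, 10)

# Passing 3 column names for 10 columns fails. The exception type has
# varied across pandas versions, so both candidates are caught here.
raised = False
try:
    pd.DataFrame(data=data, columns=names)
except (ValueError, AssertionError) as err:
    raised = True
    print(type(err).__name__, err)
```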

Since appending rows to a DataFrame is generally much harder than appending columns, this behavior puzzles me. For example, if I quickly want to add a fourth data array, I want the data organized in columns so that I can do

df['data4'] = new_array

How can I quickly build the DataFrame I want?

3 Answers 3

16

As @MaxGhenis pointed out in the comments, from_items is deprecated as of version 0.23. The deprecation notice suggests using from_dict instead, so the old answer can be modified to:

pd.DataFrame.from_dict(dict(zip(names, data)))
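
As a self-contained sketch of that one-liner, assuming names and data are defined as in the question and that dicts keep insertion order (guaranteed from Python 3.7 on):

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Pair each name with its array; each dict value becomes one column.
df = pd.DataFrame.from_dict(dict(zip(names, data)))

print(df.shape)          # (10, 3): ten rows, three columns
print(list(df.columns))  # ['data1', 'data2', 'data3']
```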

--------------------------------------------------OLD ANSWER-------------------------------------------------------------

I would use .from_items:

pd.DataFrame.from_items(zip(names, data))

which gives

   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9

That should also be faster than transposing:

%timeit pd.DataFrame.from_items(zip(names, data))

1000 loops, best of 3: 281 µs per loop

%timeit pd.DataFrame(data, index=names).T

1000 loops, best of 3: 730 µs per loop
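
The comparison can be rerun outside IPython with the standard timeit module. The sketch below is only an approximation of the timings above: it uses from_dict in place of from_items, since from_items is no longer available in current pandas, and the helper function names are hypothetical. No timings are quoted, because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

def build_from_dict():
    # Column-wise construction from a name -> array mapping.
    return pd.DataFrame.from_dict(dict(zip(names, data)))

def build_by_transpose():
    # Row-wise construction followed by a transpose.
    return pd.DataFrame(data, index=names).T

# Sanity check: both routes hold the same values before timing them.
same_values = bool(
    (build_from_dict().to_numpy() == build_by_transpose().to_numpy()).all()
)

n = 1000
t_dict = timeit.timeit(build_from_dict, number=n)
t_transpose = timeit.timeit(build_by_transpose, number=n)
print(f"from_dict: {t_dict / n * 1e6:.0f} µs per call")
print(f"transpose: {t_transpose / n * 1e6:.0f} µs per call")
```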

Adding a fourth column is then also fairly simple:

df['data4'] = range(1, 11)

which gives

   data1  data2  data3  data4
0      0      0      0      1
1      1      1      1      2
2      2      2      2      3
3      3      3      3      4
4      4      4      4      5
5      5      5      5      6
6      6      6      6      7
7      7      7      7      8
8      8      8      8      9
9      9      9      9     10

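As mentioned by @jezrael in the comments, a third option is to build a dict and pass columns=names (beware: without columns=names the column order is not guaranteed on older Python versions, where dicts are unordered):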

pd.DataFrame(dict(zip(names, data)), columns=names)

Timing:

%timeit pd.DataFrame(dict(zip(names, data)))

1000 loops, best of 3: 281 µs per loop
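
To illustrate the column-order point for the dict-based option, here is a small sketch. It follows the suggestion from the comments to rename data1 to data7. The offsets added to each array are an assumption, made only so that the columns can be told apart.

```python
import numpy as np
import pandas as pd

names = ['data7', 'data2', 'data3']
# Offsets make the arrays distinguishable; they are illustrative only.
data = [np.arange(10) + 100 * i for i in range(len(names))]

as_dict = dict(zip(names, data))

# columns= selects dict entries by key, in the order given.
df = pd.DataFrame(as_dict, columns=names)
df_reversed = pd.DataFrame(as_dict, columns=names[::-1])

print(list(df.columns))           # ['data7', 'data2', 'data3']
print(list(df_reversed.columns))  # ['data3', 'data2', 'data7']
```

Each column keeps its own data regardless of the order requested, because the lookup is by key rather than by position.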

7 Comments

Nice. An alternative is pd.DataFrame(dict(zip(names, data)), columns=names)
Yes, but there is a problem: it is a dict, so the order is not guaranteed and you need to specify it. Try changing data1 to data7 to see the difference ;)
@jezrael, good point :) Then I'll stick to the from_items version.
@MaxGhenis: Thanks, I updated my answer accordingly.
4

There are many ways to solve your problem, but the easiest way seems to be df.T (T being shorthand for pandas.DataFrame.transpose):

>>> df = pd.DataFrame(data=data, index=names)
>>> df
       0  1  2  3  4  5  6  7  8  9
data1  0  1  2  3  4  5  6  7  8  9
data2  0  1  2  3  4  5  6  7  8  9
data3  0  1  2  3  4  5  6  7  8  9

>>> df.T 
   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
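
One caveat with the transpose route, sketched here with hypothetical mixed-type arrays (the keys ints and floats are illustrative and not from the question). The arrays are first laid out as rows, so each intermediate column mixes values from different arrays and integers can be upcast to floats. Building directly from a dict keeps each array's own dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical arrays with different dtypes, for illustration only.
arrays = {'ints': np.arange(3), 'floats': np.arange(3) * 0.5}

# Row-wise construction followed by a transpose.
df_t = pd.DataFrame(data=list(arrays.values()), index=list(arrays.keys())).T

# Column-wise construction straight from the dict.
df_d = pd.DataFrame(arrays)

print(df_t.dtypes)  # both columns end up as floats
print(df_d.dtypes)  # 'ints' stays an integer column
```

For arrays that all share one dtype, as in the question, the two routes give the same result.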

2 Comments

So I initially declare my data names as the index and then transpose. I see, thanks! Still, I don't understand the logic behind the default behavior.
Yep, that's one way to do it. I'm surprised at the AssertionError raised by the default behavior, too.
3

from_items is now deprecated. Use from_dict instead:

df = pd.DataFrame.from_dict({
  'data1': np.arange(10),
  'data2': np.arange(10),
  'data3': np.arange(10)
})

This returns:

   data1  data2  data3
0      0      0      0
1      1      1      1
2      2      2      2
3      3      3      3
4      4      4      4
5      5      5      5
6      6      6      6
7      7      7      7
8      8      8      8
9      9      9      9
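
from_dict also takes an orient parameter, which makes the row-versus-column choice explicit. A minimal sketch, reusing the names and data from the question:

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]
d = dict(zip(names, data))

# Default orient='columns': each dict key becomes a column.
by_columns = pd.DataFrame.from_dict(d)

# orient='index': each dict key becomes a row label instead.
by_rows = pd.DataFrame.from_dict(d, orient='index')

print(by_columns.shape)     # (10, 3)
print(by_rows.shape)        # (3, 10)
print(list(by_rows.index))  # ['data1', 'data2', 'data3']
```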
