Pandas : Creating a DataFrame from Series

Question

I have several series variables I would like to concatenate (along axis=1) to create a DataFrame. I would like the series' names to appear as column names in the DataFrame. I have come across several ways to do this.

The most intuitive approach seems to me to be the following :

import pandas as pd

x1 = pd.Series([1,2,3],name='x1')
x2 = pd.Series([11,12,13],name='x2')
              
df = pd.DataFrame([x1,x2])
print(df)

But rather than make the Series names the column headers, the series data are used as rows in the DataFrame.

     0   1   2
x1   1   2   3
x2  11  12  13

This strikes me as counter-intuitive for two reasons.

1. The data in a Series is likely to be all of one type, e.g. stock prices, time series data, etc. So it seems intuitive that the Series data should be a column, rather than a row, in the DataFrame.
2. When extracting a column as a Series from an existing DataFrame, the column name is used as the name of the Series.

Example :

df = pd.DataFrame({'x1' : [1,2,3], 'x2' : [4,5,6]})
print(type(df['x1']))
print(df['x1'].name)

<class 'pandas.core.series.Series'>
x1

So why isn't the name used as the column header when constructing a DataFrame from a Series?

I can always construct a DataFrame from a dictionary to get the result I want :

df = pd.DataFrame({'x1' : x1, 'x2' : x2})
print(df)

   x1  x2
0   1  11
1   2  12
2   3  13

But this strikes me as awkward, since I would have to duplicate the series' names (or at least refer to them in the construction of the dictionary).

On the other hand, the Pandas concat method does what I would expect for default behavior :

df = pd.concat([x1,x2],axis=1)
print(df)

   x1  x2
0   1  11
1   2  12
2   3  13

So, my question is, why isn't the behavior I get with concat the default behavior when constructing a DataFrame from a list of series variables?

You would have to ask the pandas authors why they chose this. To me it seems correct: a Series may have labels assigned to its values instead of the numbers 0, 1, 2, e.g. pd.Series({"X": 1, "Y": 2, "Z": 3}, name='position1'), so the labels act like "headers", although pandas normally displays them as the index. Seen this way, a Series keeps different pieces of information about one object, and a DataFrame keeps objects in rows. BTW: if you use concat() with its default values, df = pd.concat([x1,x2]), you get a different result. axis=1 is NOT the default value.
– furas, Commented May 2, 2021 at 22:52
Does this mean that a Series can also be viewed as something like a C struct, with a heterogeneous collection of fields? As in pd.Series({"v" : [1,2,3],"type" : "vector"})? It never occurred to me that this would work (it does!). I didn't appreciate this mode of use (if in fact it is an intended use).
– Donna, Commented May 3, 2021 at 13:43
Does this answer your question? Pandas: Creating DataFrame from Series
– Reinderien, Commented Sep 26, 2021 at 17:36
This is a duplicate of, and is missing crucial answers from, stackoverflow.com/a/23522030/313768; particularly the concat approach.
– Reinderien, Commented Sep 26, 2021 at 17:36
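
To make the two points from these comments concrete, here is a minimal sketch. The Series p1 and p2 are hypothetical records invented for illustration; x1 and x2 are the Series from the question. It shows a list of labeled Series being read as rows, and the difference between concat with its default axis and with axis=1.

```python
import pandas as pd

# Hypothetical records: each Series describes one object,
# and its index labels act as field names.
p1 = pd.Series({"X": 1, "Y": 2, "Z": 3}, name="position1")
p2 = pd.Series({"X": 4, "Y": 5, "Z": 6}, name="position2")

# A list of Series is read as a list of rows: the index labels
# become the columns and the Series names become the row index.
records = pd.DataFrame([p1, p2])
print(records)

# The Series from the question.
x1 = pd.Series([1, 2, 3], name="x1")
x2 = pd.Series([11, 12, 13], name="x2")

# Default axis=0 stacks the values end to end into one longer Series.
stacked = pd.concat([x1, x2])
print(type(stacked).__name__, len(stacked))

# axis=1 places the Series side by side, using their names as columns.
side_by_side = pd.concat([x1, x2], axis=1)
print(side_by_side)
```

Under this reading, the constructor and concat with axis=1 simply answer different questions: one collects records into rows, the other joins columns.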

Corralien · Accepted Answer · 2021-05-02 20:37:01Z

x1 = pd.Series([1,2,3],name='x1')
x2 = pd.Series([11,12,13],name='x2')

df = pd.DataFrame([x1,x2]).transpose()

The transpose is needed because pd.DataFrame does not zip the Series together for you. Doing the zip explicitly gives the same result:

>>> pd.DataFrame(zip(x1, x2), columns=[x1.name, x2.name])
   x1  x2
0   1  11
1   2  12
2   3  13
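
For comparison, the following sketch puts the approaches mentioned on this page side by side: concat with axis=1, a dictionary built from the Series names so that no name is typed twice, and the transpose. The variable names series_list, by_concat, by_dict and by_transpose are chosen here for illustration only.

```python
import pandas as pd

x1 = pd.Series([1, 2, 3], name="x1")
x2 = pd.Series([11, 12, 13], name="x2")
series_list = [x1, x2]

# concat along axis=1 uses each Series name as a column header.
by_concat = pd.concat(series_list, axis=1)

# A dict comprehension reads the names off the Series themselves,
# so they are not repeated by hand.
by_dict = pd.DataFrame({s.name: s for s in series_list})

# Build the rows first, then flip them into columns.
by_transpose = pd.DataFrame(series_list).T

print(by_concat)
print(by_dict)
print(by_transpose)
```

One difference to keep in mind: zip pairs values purely by position and stops at the shortest input, while concat and the dictionary form align the Series on their index labels. For Series that share the same default index, as here, the results agree.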

edited May 2, 2021 at 20:37

answered May 2, 2021 at 20:21

Corralien


6 Comments

Donna Over a year ago

Right - this is another way. I just need to get my head around the fact that a series is not just a fancy "array", but (if I understand the intended use) can also be a collection of heterogeneous fields.

Donna Over a year ago

I always think of transpose as a very expensive operation, so I typically avoid it. Is transposing a DataFrame a much cheaper operation than a matrix transpose?

Corralien Over a year ago

I think (but I'm not a numpy expert) the arrays are stored internally in a certain way. Functions like reshape or transpose only return a "view" of the data, so they are not CPU-expensive.

Corralien Over a year ago

For an array of shape (10000, 50000): %timeit a.transpose() gives 231 ns ± 3.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Corralien Over a year ago

df.transpose() is not a simple call to a.transpose() or df.values.transpose(). You can check the implementation here.
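
The view behaviour described in these comments can be checked directly. This is a minimal sketch, assuming a small hypothetical array a and a hypothetical mixed-type frame df. It confirms that a NumPy transpose shares memory with the original array, and shows one reason a DataFrame transpose can cost more: columns of different types have to be merged into a common dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical array, kept small so the example runs quickly.
a = np.zeros((1000, 500))
a_t = a.T

# The transposed array is a view: no data is copied.
print(np.shares_memory(a, a_t))
print(a_t.base is a)

# Hypothetical frame with one numeric and one string column.
df = pd.DataFrame({"n": [1, 2], "s": ["a", "b"]})
df_t = df.T

# After transposing, each new column mixes integers and strings,
# so pandas falls back to the generic object dtype.
print(df_t.dtypes)
```

For a frame whose columns all share one numeric dtype, the cost may be much closer to the NumPy case, but that depends on the pandas version and is best measured rather than assumed.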
