
I have several series variables I would like to concatenate (along axis=1) to create a DataFrame. I would like the series' names to appear as column names in the DataFrame. I have come across several ways to do this.

The most intuitive approach seems to me to be the following:

import pandas as pd

x1 = pd.Series([1,2,3],name='x1')
x2 = pd.Series([11,12,13],name='x2')
              
df = pd.DataFrame([x1,x2])
print(df)

But rather than make the Series names the column headers, the series data are used as rows in the DataFrame.

     0   1   2
x1   1   2   3
x2  11  12  13

This strikes me as counter-intuitive for two reasons.

  • The data in a Series is likely to be all of one kind, e.g. stock prices, time series data, etc. So it seems intuitive that the Series data should be a column, rather than a row, in the DataFrame.

  • When extracting a column as a Series from an existing DataFrame, the column name is used as the name of the Series.

Example :

df = pd.DataFrame({'x1' : [1,2,3], 'x2' : [4,5,6]})
print(type(df['x1']))
print(df['x1'].name)

<class 'pandas.core.series.Series'>
x1

So why isn't the name used as column header when constructing a DataFrame from a Series?

I can always construct a DataFrame from a dictionary to get the result I want :

df = pd.DataFrame({'x1' : x1, 'x2' : x2})
print(df)

   x1  x2
0   1  11
1   2  12
2   3  13

But this strikes me as awkward, since I would have to duplicate the series' names (or at least refer to them in the construction of the dictionary).
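One way to avoid typing the names twice is to build the dictionary from each Series' own name attribute. This is only a sketch, assuming every Series has a unique, non-None name:

```python
import pandas as pd

x1 = pd.Series([1, 2, 3], name='x1')
x2 = pd.Series([11, 12, 13], name='x2')

# Use each Series' own name as its dictionary key, so the names
# are never written out a second time.
df = pd.DataFrame({s.name: s for s in (x1, x2)})
print(df)
```

If two Series share a name, the later one silently replaces the earlier one in the dictionary, so this only works when the names are distinct.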

On the other hand, the Pandas concat method does what I would expect for default behavior :

df = pd.concat([x1,x2],axis=1)
print(df)

   x1  x2
0   1  11
1   2  12
2   3  13

So, my question is, why isn't the behavior I get with concat the default behavior when constructing a DataFrame from a list of series variables?

  • You should ask the pandas authors why they decided this, but to me it seems correct. A Series may have labels assigned to its values instead of the numbers 0, 1, 2, e.g. pd.Series({"X": 1, "Y": 2, "Z": 3}, name='position1'), so the labels act like "headers", although pandas normally displays them as the index. This way a Series holds different pieces of information about one object, and a DataFrame keeps objects in rows. BTW: if you use concat() with its default values, df = pd.concat([x1,x2]), you get a different result; axis=1 is NOT the default. Commented May 2, 2021 at 22:52
  • Does this mean that a Series can also be viewed as something like a C struct, with a heterogeneous collection of fields? As in pd.Series({"v" : [1,2,3],"type" : "vector"})? It never occurred to me that this would work (it does!). I didn't appreciate this mode of use (if in fact it is an intended use). Commented May 3, 2021 at 13:43
  • Does this answer your question? Pandas: Creating DataFrame from Series Commented Sep 26, 2021 at 17:36
  • This is a duplicate of - and is missing crucial answers from - stackoverflow.com/a/23522030/313768 ; particularly the concat approach. Commented Sep 26, 2021 at 17:36
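To illustrate the point raised in the comments, here is a small sketch of how a list of label-indexed Series behaves. The second Series, position2, is made up for the example:

```python
import pandas as pd

# Each Series describes one object; its index labels act as field names.
p1 = pd.Series({"X": 1, "Y": 2, "Z": 3}, name="position1")
p2 = pd.Series({"X": 4, "Y": 5, "Z": 6}, name="position2")  # hypothetical second object

# A list of Series is read as a list of records: one row per Series,
# with the index labels as columns and the Series names as row labels.
rows = pd.DataFrame([p1, p2])
print(rows)

# concat with its default axis=0 stacks the values end to end
# and returns a single longer Series, not a DataFrame.
stacked = pd.concat([p1, p2])
print(stacked)
```

Seen this way, the row-wise result of pd.DataFrame([x1, x2]) is consistent: the default integer index 0, 1, 2 is just treated as three field labels.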

1 Answer

x1 = pd.Series([1,2,3],name='x1')
x2 = pd.Series([11,12,13],name='x2')

df = pd.DataFrame([x1,x2]).transpose()
>>> df
   x1  x2
0   1  11
1   2  12
2   3  13

The transpose is needed because pd.DataFrame treats each Series in the list as a row; it does not zip them together for you:

>>> pd.DataFrame(zip(x1, x2), columns=[x1.name, x2.name])
   x1  x2
0   1  11
1   2  12
2   3  13

6 Comments

Right - this is another way. I just need to get my head around the fact that a series is not just a fancy "array", but (if I understand the intended use) can also be a collection of heterogeneous fields.
I always think of transpose as a very expensive operation, so I typically avoid it. Is transposing a DataFrame a much cheaper operation than a matrix transpose?
I think (but I'm not a numpy expert) the arrays are stored internally in a certain way. Functions like reshape or transpose only return a "view" of the data, so they are not cpu expensive.
For an array of shape (10000, 50000): %timeit a.transpose() gives 231 ns ± 3.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
df.transpose() is not a simple call to a.transpose() or df.values.transpose(). You can check the implementation in the pandas source.
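A rough sketch of the difference discussed in these comments. The NumPy part shows that a transposed array shares memory with the original; the pandas part shows one visible consequence of transposing a mixed-dtype frame. Whether DataFrame.transpose copies data in other cases depends on the dtypes and the pandas version, so treat this as an illustration only:

```python
import numpy as np
import pandas as pd

# NumPy: transposing only changes strides, so the result is a view.
a = np.zeros((1000, 500))
shares = np.shares_memory(a, a.T)
print(shares)

# pandas: columns with different dtypes cannot be exposed as one
# strided view, so the transposed frame holds generic objects.
mixed = pd.DataFrame({'n': [1, 2], 's': ['a', 'b']})
mixed_t = mixed.T
print(mixed.dtypes)
print(mixed_t.dtypes)
```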