Convert pandas dataframe to array of series

Question

The now-deprecated as_matrix and the values attribute both provide arrays from a dataframe. However, I want to work with the "features" of a dataframe, which means working with the columns as Series. How can a list of Series be extracted from the dataframe?

BENY · Accepted Answer · 2019-04-27 04:08:50Z

I think you just need to transpose the result of .values:

df.values.T.tolist()
Out[1321]: 
[['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
 ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c4', 'c5', 'c5']]

Or just

df.values.T
Out[1322]: 
array([['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
       ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c4', 'c5', 'c5']],
      dtype=object)
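
In newer pandas versions, to_numpy() is the documented way to get the same array as .values. A minimal sketch, assuming a small hypothetical frame shaped like the sample data in this answer:

```python
import pandas as pd

# Hypothetical three-row frame shaped like the sample data in this answer.
df = pd.DataFrame({"airport": ["a1", "a3", "a1"],
                   "carrier": ["c1", "c1", "c2"]})

# to_numpy() returns the frame as a 2-D array; .T puts one column in each row.
cols = df.to_numpy().T

print(cols.shape)        # (2, 3): two columns, three values each
print(cols[0].tolist())  # ['a1', 'a3', 'a1']
```

Each row of cols is a plain NumPy array, so the column names and the index are not carried along.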

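If you need one object per column, groupby also works. Note that each element here is a single-column DataFrame rather than a Series, and that groupby with axis=1 is deprecated in recent pandas releases: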

[y for _,y in df.groupby(level=0,axis=1)]
Out[1328]: 
[  airport
 0      a1
 1      a3
 2      a1
 3      a1
 4      a2
 5      a2
 6      a3
 7      a4
 8      a4
 9      a1,   carrier
 0      c1
 1      c1
 2      c1
 3      c2
 4      c2
 5      c2
 6      c3
 7      c4
 8      c5
 9      c5]
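
If actual Series objects are wanted, indexing by column label is probably the most direct route. A minimal sketch, again assuming a small hypothetical frame shaped like the sample data (the names series_list and series_list_2 are made up for illustration):

```python
import pandas as pd

# Hypothetical three-row frame shaped like the sample data in this answer.
df = pd.DataFrame({"airport": ["a1", "a3", "a1"],
                   "carrier": ["c1", "c1", "c2"]})

# Iterating a DataFrame yields its column labels; df[label] is a Series.
series_list = [df[col] for col in df]

# Equivalent: items() yields (label, Series) pairs.
series_list_2 = [s for _, s in df.items()]

print([s.name for s in series_list])  # ['airport', 'carrier']
```

The items() form is the safer of the two when column labels are duplicated, since df[label] then returns a DataFrame instead of a Series.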

Data input

df
Out[1329]: 
  airport carrier
0      a1      c1
1      a3      c1
2      a1      c1
3      a1      c2
4      a2      c2
5      a2      c2
6      a3      c3
7      a4      c4
8      a4      c5
9      a1      c5

C.Nivs · Accepted Answer · 2019-04-27 04:04:42Z

You could do this with a list comprehension:

import pandas as pd

df = pd.DataFrame(some_data)

mat = [df[col].values for col in df.columns]

Here df[col].values returns a NumPy array of the values in the given column, so mat is a list of arrays. Use df[col] itself if you need a Series.

1 Comment

WestCoastProjects Over a year ago

Ya I was hunting for this but could not put my finger on it (tick tock 7 mins..)

ALollz · Accepted Answer · 2019-04-27 05:10:18Z

You can get a list of Series with .to_dict('series') and then take just the values.

list(df.to_dict('series').values())

[0    a1
 1    a3
 2    a1
 3    a1
 4    a2
 5    a2
 6    a3
 7    a4
 8    a4
 9    a1
 Name: airport, dtype: object, 0    c1
 1    c1
 2    c1
 3    c2
 4    c2
 5    c2
 6    c3
 7    c4
 8    c5
 9    c5
 Name: carrier, dtype: object]

Each element of the list is a Series:

type(list(df.to_dict('series').values())[0])
#pandas.core.series.Series
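
Keeping the intermediate dict around can also be handy, since it maps each column name to its Series. A minimal sketch, assuming a small hypothetical frame shaped like the sample data (by_name and features are illustrative names):

```python
import pandas as pd

# Hypothetical three-row frame shaped like the sample data used above.
df = pd.DataFrame({"airport": ["a1", "a3", "a1"],
                   "carrier": ["c1", "c1", "c2"]})

# orient="series" gives a dict of {column label: Series}.
by_name = df.to_dict("series")

# The dict preserves column order, so the list matches df.columns.
features = list(by_name.values())

print(list(by_name))     # ['airport', 'carrier']
print(features[1].name)  # carrier
```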

user3483203 · Accepted Answer · 2019-04-27 05:30:04Z

A NumPy structured array can track much of the same information as a DataFrame: a separate dtype for each column, plus the column names. Pandas has a convenient way of creating one. I am using @Wen's sample data.

u = df.to_records(index=False)

rec.array([('a1', 'c1'), ('a3', 'c1'), ('a1', 'c1'), ('a1', 'c2'),
           ('a2', 'c2'), ('a2', 'c2'), ('a3', 'c3'), ('a4', 'c4'),
           ('a4', 'c5'), ('a1', 'c5')],
          dtype=[('airport', 'O'), ('carrier', 'O')])

u['airport']

array(['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
      dtype=object)
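
To pull every field out of the record array in one go, the field names are available on its dtype. A minimal sketch, assuming a small hypothetical frame shaped like the sample data (columns is an illustrative name):

```python
import pandas as pd

# Hypothetical three-row frame shaped like the sample data used above.
df = pd.DataFrame({"airport": ["a1", "a3", "a1"],
                   "carrier": ["c1", "c1", "c2"]})

u = df.to_records(index=False)

# dtype.names holds the field (column) names of the structured array.
print(u.dtype.names)  # ('airport', 'carrier')

# Indexing by field name returns that column as a 1-D array.
columns = [u[name] for name in u.dtype.names]
print(columns[1].tolist())  # ['c1', 'c1', 'c2']
```

These are NumPy arrays rather than Series, so this fits best when the goal is to leave pandas while keeping per-column names and dtypes.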

3 Comments

WestCoastProjects Over a year ago

in the 2nd code snippet should the rec.array be u.array ?

user3483203 Over a year ago

No, that's just the __repr__ of a structured array.

WestCoastProjects Over a year ago

ah. i usually keep that in the same code block. thx for the clarification
