
The now-deprecated as_matrix and the values attribute provide arrays from a DataFrame. However, I want to work with the "features" of a DataFrame, which means working with the columns as Series. How can a list of Series be extracted from the DataFrame?


I think you just need to transpose the array returned by .values:

df.values.T.tolist()
Out[1321]: 
[['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
 ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c4', 'c5', 'c5']]

Or just

df.values.T
Out[1322]: 
array([['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
       ['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c4', 'c5', 'c5']],
      dtype=object)
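One caveat worth checking: .values (or to_numpy(), which the pandas docs recommend instead) builds a single array with one common dtype, so with mixed column types every value is upcast, typically to object. A minimal sketch, assuming a small hypothetical frame with one string column and one integer column:

```python
import pandas as pd

# Hypothetical frame mixing a string column and an integer column
df = pd.DataFrame({"airport": ["a1", "a3", "a1"], "flights": [3, 1, 2]})

# to_numpy() replaces .values; .T gives one row per original column
rows = df.to_numpy().T

# The whole array shares a single dtype, so the integers are stored as objects
print(rows.dtype)          # object
print(rows[1].tolist())    # [3, 1, 2]

# The column itself, taken as a Series, keeps its own numeric dtype
print(df["flights"].dtype)
```

This is why the transpose trick works well for homogeneous data but loses per-column dtypes when the "features" differ in type.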

If you need one object per column, groupby also works, although each element is a single-column DataFrame rather than a Series:

[y for _,y in df.groupby(level=0,axis=1)]
Out[1328]: 
[  airport
 0      a1
 1      a3
 2      a1
 3      a1
 4      a2
 5      a2
 6      a3
 7      a4
 8      a4
 9      a1,   carrier
 0      c1
 1      c1
 2      c1
 3      c2
 4      c2
 5      c2
 6      c3
 7      c4
 8      c5
 9      c5]

Data input

df
Out[1329]: 
  airport carrier
0      a1      c1
1      a3      c1
2      a1      c1
3      a1      c2
4      a2      c2
5      a2      c2
6      a3      c3
7      a4      c4
8      a4      c5
9      a1      c5
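To get actual Series objects, iterating over the columns is enough, and it also avoids groupby(..., axis=1), which pandas 2.1 deprecated. A sketch using the same airport/carrier data:

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "airport": ["a1", "a3", "a1", "a1", "a2", "a2", "a3", "a4", "a4", "a1"],
    "carrier": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c4", "c5", "c5"],
})

# DataFrame.items() yields (column label, Series) pairs in column order
series_list = [s for _, s in df.items()]

# Equivalent: index the frame by each column label
series_list2 = [df[col] for col in df.columns]

for s in series_list:
    print(type(s).__name__, s.name)
# Series airport
# Series carrier
```

Each Series keeps its name and the frame's index, which is usually what you want when treating columns as features.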

You could do this with a list comprehension:

import pandas as pd

df = pd.DataFrame(some_data)

mat = [df[col].values for col in df.columns]

Here df[col] returns the column as a Series, and .values strips it down to the underlying array, so mat is a list of arrays. Drop .values to get a list of Series instead.
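A quick check of the difference, assuming a small hypothetical numeric frame in place of some_data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for some_data
df = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})

arrays = [df[col].values for col in df.columns]  # NumPy arrays, labels dropped
series = [df[col] for col in df.columns]         # Series, name and index kept

print(type(arrays[0]).__name__)  # ndarray
print(type(series[0]).__name__)  # Series
print(series[1].name)            # y
```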

1 Comment

Ya I was hunting for this but could not put my finger on it (tick tock 7 mins..)

You can get a list of Series with .to_dict('Series'), which maps each column name to its Series, and then keep only the dict's values:

list(df.to_dict('Series').values())

[0    a1
 1    a3
 2    a1
 3    a1
 4    a2
 5    a2
 6    a3
 7    a4
 8    a4
 9    a1
 Name: airport, dtype: object, 0    c1
 1    c1
 2    c1
 3    c2
 4    c2
 5    c2
 6    c3
 7    c4
 8    c5
 9    c5
 Name: carrier, dtype: object]

Each element of the list is a Series:

type(list(df.to_dict('Series').values())[0])
#pandas.core.series.Series
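If looking features up by name is useful, the dict itself can be kept. A sketch, assuming a small hypothetical frame, showing that it matches a mapping built by hand from DataFrame.items() (the documented spelling of the orient argument is 'series'):

```python
import pandas as pd

df = pd.DataFrame({"airport": ["a1", "a3"], "carrier": ["c1", "c2"]})

# orient='series' maps each column label to that column as a Series
by_name = df.to_dict("series")
print(list(by_name))  # ['airport', 'carrier']

# The same mapping built by hand from (label, Series) pairs
by_hand = {name: s for name, s in df.items()}
print(all(by_name[k].equals(by_hand[k]) for k in by_hand))  # True

# Look a feature up by its column name
print(by_name["carrier"].tolist())  # ['c1', 'c2']
```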


A NumPy structured array can track much of the same information a DataFrame does (a separate dtype and a name for each column), and pandas has a convenient way to build one with to_records. I am using @Wen's sample data.


u = df.to_records(index=False)

rec.array([('a1', 'c1'), ('a3', 'c1'), ('a1', 'c1'), ('a1', 'c2'),
           ('a2', 'c2'), ('a2', 'c2'), ('a3', 'c3'), ('a4', 'c4'),
           ('a4', 'c5'), ('a1', 'c5')],
          dtype=[('airport', 'O'), ('carrier', 'O')])

u['airport']

array(['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
      dtype=object)
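To see the per-field dtypes in action, here is a sketch assuming a hypothetical extra integer column "flights" added to a slice of the sample data; it also converts the structured array back to a DataFrame:

```python
import numpy as np
import pandas as pd

# Sample data plus a hypothetical integer column
df = pd.DataFrame({
    "airport": ["a1", "a3", "a1"],
    "carrier": ["c1", "c1", "c2"],
    "flights": [3, 1, 2],
})

u = df.to_records(index=False)

# Field names mirror the column names; each field keeps its own dtype
print(u.dtype.names)        # ('airport', 'carrier', 'flights')
print(u.dtype["flights"])   # int64
print(u["carrier"].tolist())  # ['c1', 'c1', 'c2']

# Going back to a DataFrame restores labelled columns
back = pd.DataFrame.from_records(u)
print(back.columns.tolist())  # ['airport', 'carrier', 'flights']
```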

3 Comments

In the 2nd code snippet, should the rec.array be u.array?
No, that's just the __repr__ of a structured array.
ah. i usually keep that in the same code block. thx for the clarification
