How to take values out of a pandas dataframe and put them into a numpy array?

Question

I have multiple pandas DataFrames, and I would like to write a function that takes the values in each column of a DataFrame and puts them into their own numpy array.

Example dataframe

In [1]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
In [2]: df
Out[2]: 
    A  B
 0  1  2
 1  1  3
 2  4  6

How would I generate two separate numpy arrays from the values in columns A and B?

Welcome to StackOverflow. Please take the time to read this post on how to provide a great pandas example as well as how to provide a minimal, complete, and verifiable example and revise your question accordingly. These tips on how to ask a good question may also be useful.
– jezrael, Commented Apr 10, 2019 at 5:36

nck · Accepted Answer · 2019-04-10 06:35:54Z

1

df['A'].values returns the values of column A as a numpy array. See below:

import pandas as pd
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
for col in df.columns:
    print(col, type(df[col].values))
# Output:
# A <class 'numpy.ndarray'>
# B <class 'numpy.ndarray'>
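Applied directly to the question, this gives one array per column. A minimal sketch, assuming the example df from the question; the names a and b are only illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# .values returns each column's data as a numpy.ndarray
a = df['A'].values
b = df['B'].values

print(a.tolist())  # [1, 1, 4]
print(b.tolist())  # [2, 3, 6]
```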

answered Apr 10, 2019 at 6:35

icy121 · Accepted Answer · 2019-04-10 06:25:32Z

0

You can use this method to obtain a list containing one numpy array per column. You could put the arrays in a dictionary instead, but I prefer a list because it keeps the same order as df.columns, so you can always zip it with df.columns to get (column name, array) tuples!

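import numpy as np

np_arr_list = []
for i in df.columns:
    # df.loc[:, i] selects column i as a Series; np.array() copies its values into an ndarray
    new_np_arr = np.array(df.loc[:, i])
    np_arr_list.append(new_np_arr)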

Output (from a different, three-column DataFrame than the one in the question):

[array([  1,  21, 213,  32], dtype=int64),
 array([ 4,  5, 32,  3], dtype=int64),
 array([213,  23,  23,   1], dtype=int64)]
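The same loop can be condensed into a list comprehension. This is a sketch using the question's two-column df, so its values differ from the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# One array per column, in the same order as df.columns
np_arr_list = [np.array(df.loc[:, col]) for col in df.columns]

for name, arr in zip(df.columns, np_arr_list):
    print(name, arr.tolist())
# A [1, 1, 4]
# B [2, 3, 6]
```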

answered Apr 10, 2019 at 6:25

2 Comments

Ggd Hhdhd Over a year ago

Thanks for the help! Could you explain the zip function a little more?

icy121 Over a year ago

zip() pairs up the elements of several iterables into tuples, which you can then loop over. Try these two lines after the code above!

for col_name, np_arr in zip(df.columns, np_arr_list):
    print("Column Name :", col_name, "\nArray", np_arr, "\n")

On each iteration, col_name is bound to the next entry of df.columns and np_arr to the matching entry of np_arr_list. I use this to associate column names with values in cases where I want to keep the data and the names separate. The other way is to use a dictionary with column names as keys and numpy arrays as values.
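The dictionary alternative mentioned above can be built from the same list with zip() and dict(). A minimal sketch, rebuilding np_arr_list from the question's df so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
np_arr_list = [np.array(df.loc[:, col]) for col in df.columns]

# zip pairs each column name with its array; dict() turns those pairs into key/value entries
arrays = dict(zip(df.columns, np_arr_list))

print(list(arrays))          # ['A', 'B']
print(arrays['A'].tolist())  # [1, 1, 4]
```

Since Python 3.7, plain dictionaries keep insertion order, so the keys come out in the same order as df.columns.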

Rarblack · Accepted Answer · 2019-04-10 07:31:53Z

0

You should use the to_numpy() method, which the pandas documentation recommends over older alternatives such as .values and as_matrix(). It was added in pandas 0.24, so if your installed pandas is older than that you need to update it first.

>>> df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
>>> arr = []
>>> for column in df.columns:
...     arr.append(df[column].to_numpy())
...
>>> arr
[array([1, 1, 4], dtype=int64), array([2, 3, 6], dtype=int64)]
>>>
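Since the question asks for a function that handles several DataFrames, here is one possible sketch built on to_numpy(). columns_to_arrays is a hypothetical helper name, not a pandas API, and it assumes pandas 0.24 or newer:

```python
import pandas as pd


def columns_to_arrays(*frames):
    """Return one {column name: ndarray} dict per DataFrame passed in.

    Hypothetical helper for illustration; requires pandas >= 0.24 for to_numpy().
    """
    # frame.items() yields (column name, Series) pairs in column order
    return [{col: series.to_numpy() for col, series in frame.items()} for frame in frames]


df1 = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
df2 = pd.DataFrame({'C': [7.5, 8.5]})

result = columns_to_arrays(df1, df2)
print(result[0]['A'].tolist())  # [1, 1, 4]
print(result[1]['C'].tolist())  # [7.5, 8.5]
```

Returning one dictionary per DataFrame keeps every array attached to its column name, and DataFrames that happen to share a column name do not overwrite each other.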

answered Apr 10, 2019 at 7:31

4 Comments

Ggd Hhdhd Over a year ago

I tried this and got the error AttributeError: 'DataFrame' object has no attribute 'to_numpy'

Rarblack Over a year ago

As I said, you need to update the pandas package to 0.24 or newer; the latest release is 0.24.2. Run print(pd.__version__) to see which version you have installed.

Ggd Hhdhd Over a year ago

The one I'm working with is quite old, thanks for the tip!

Rarblack Over a year ago

@GgdHhdhd Be aware that older methods such as .values and .as_matrix() are discouraged in favor of to_numpy() (as_matrix() is already deprecated), so I would not suggest using them.
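If the same code has to run on pandas versions both before and after 0.24, one option is to test for to_numpy() and fall back to .values. A sketch under that assumption; to_array is a hypothetical helper, not part of pandas:

```python
import pandas as pd


def to_array(series):
    """Hypothetical helper: prefer to_numpy(), fall back to .values on older pandas."""
    # Series.to_numpy() was added in pandas 0.24; older releases only provide .values
    if hasattr(series, "to_numpy"):
        return series.to_numpy()
    return series.values


print(pd.__version__)  # version string of the installed pandas
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
print(to_array(df['A']).tolist())  # [1, 1, 4]
```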
