0

I have multiple pandas dataframes, and I would like to write a function that extracts the values of each column of a dataframe into its own numpy array.

Example dataframe

In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
In [3]: df
Out[3]: 
   A  B
0  1  2
1  1  3
2  4  6

How would I generate two separate numpy arrays from the values in columns A and B?

  • Just use df.values; Commented Apr 10, 2019 at 5:32
  • doesn't that take the values from the whole dataframe? Commented Apr 10, 2019 at 5:33
  • Welcome to StackOverflow. Please take the time to read this post on how to provide a great pandas example as well as how to provide a minimal, complete, and verifiable example and revise your question accordingly. These tips on how to ask a good question may also be useful. Commented Apr 10, 2019 at 5:36
  • I edited my question. Any other ways I could improve it? Commented Apr 10, 2019 at 5:58
  • What is your expected output? A list of numpy array? Commented Apr 10, 2019 at 6:13

3 Answers

1

df['A'].values returns the column as a numpy array. See below:

for col in df.columns:
    print(col, type(df[col].values))

Output:

A <class 'numpy.ndarray'>
B <class 'numpy.ndarray'>
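
Since the question asks for a function, here is a minimal sketch that wraps the .values approach. The helper name columns_to_arrays is made up for illustration and is not part of pandas; the dataframe is the one from the question.

```python
import pandas as pd


def columns_to_arrays(df):
    """Return one numpy array per column, in column order.

    columns_to_arrays is a hypothetical helper name, not a pandas API.
    """
    return [df[col].values for col in df.columns]


# Example dataframe from the question
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# Unpacking works here because the dataframe has exactly two columns
arr_a, arr_b = columns_to_arrays(df)
print(arr_a)  # [1 1 4]
print(arr_b)  # [2 3 6]
```

The function can be called on each dataframe in turn. Unpacking into named variables only works when the number of columns is known in advance; otherwise keep the returned list.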

0

You can use this method to obtain a list with one numpy array per column. You could put the arrays in a dictionary instead, but I prefer a list: it keeps the arrays in column order (dictionaries also preserve insertion order from Python 3.7 on), and you can always zip it with df.columns to get (column name, array) tuples.

import numpy as np

np_arr_list = []
for i in df.columns:
    # Select every row of column i and copy it into a new numpy array
    new_np_arr = np.array(df.loc[:, i])
    np_arr_list.append(new_np_arr)

Output (from a different example dataframe with three columns and four rows):

[array([  1,  21, 213,  32], dtype=int64),
 array([ 4,  5, 32,  3], dtype=int64),
 array([213,  23,  23,   1], dtype=int64)]


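The list-versus-dictionary choice described in this answer can be sketched as follows. This is only an illustration using the dataframe from the question; arrays_by_name is a name made up for the example.

```python
import numpy as np
import pandas as pd

# Example dataframe from the question
df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])

# List of arrays, in the same order as df.columns
np_arr_list = [np.array(df.loc[:, col]) for col in df.columns]

# Pair each column name with its array while iterating
for col_name, np_arr in zip(df.columns, np_arr_list):
    print(col_name, np_arr)

# Dictionary alternative: column name -> array
# (arrays_by_name is a hypothetical variable name)
arrays_by_name = dict(zip(df.columns, np_arr_list))
print(arrays_by_name['A'])  # [1 1 4]
```

The list keeps names and data separate, while the dictionary lets you look up an array directly by its column name.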
2 Comments

Thanks for the help! Could you explain the zip function a little bit more?
zip() pairs up elements from several iterables into tuples that you can loop over. Try these two lines after the code above:
    for col_name, np_arr in zip(df.columns, np_arr_list):
        print("Column Name :", col_name, "\nArray", np_arr, "\n")
The loop binds col_name to each entry of df.columns and np_arr to the matching entry of np_arr_list. I use it to associate column names with values in use cases where I want to keep the data and the names separate. The other way is to use a dictionary with column names as keys and the numpy arrays as values.
0

You should use the to_numpy() method: the pandas documentation recommends it over .values, and as_matrix() is deprecated. The method was added in pandas 0.24, so if your pandas version is older than that, update it first.

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
>>> arr = []
>>> for column in df.columns:
...     arr.append(df[column].to_numpy())
...
>>> arr
[array([1, 1, 4], dtype=int64), array([2, 3, 6], dtype=int64)]
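
If updating pandas is not an option, one possible workaround is to check for to_numpy() at runtime and fall back to .values. This is a sketch only; series_to_array is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def series_to_array(series):
    """Convert a pandas Series to a numpy array.

    series_to_array is a hypothetical helper, not part of pandas.
    """
    # Series.to_numpy() is available from pandas 0.24 on
    if hasattr(series, "to_numpy"):
        return series.to_numpy()
    # Older pandas versions: fall back to the .values attribute
    return series.values


df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B'])
arr = [series_to_array(df[column]) for column in df.columns]
print([a.tolist() for a in arr])  # [[1, 1, 4], [2, 3, 6]]
```

For plain numeric columns like these, both branches should return the same values, so the rest of the code does not need to know which pandas version is installed.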

4 Comments

I tried using this and I got this error AttributeError: 'DataFrame' object has no attribute 'to_numpy'
As I said, you need to update the pandas package to 0.24 or newer (the latest at the time of this comment is 0.24.2). Try print(pd.__version__) to see your current version.
The one I'm working with is quite old. Thanks for the tip!
@GgdHhdhd Be aware that as_matrix() is deprecated and that the pandas documentation recommends to_numpy() over .values, so I would not suggest relying on them.
