I am using a pandas data frame to clean and process data. However, I then need to convert it into a numpy ndarray in order to exploit matrix multiplication. I turn the data frame into a list of lists with the following:

x = df.tolist()

This returns the following structure:

[[1, 2], [3, 4], [5, 6], [7, 8] ...]

I then convert it into a numpy array like this:

x = np.array(x)

However, the following print:

print(type(x))
print(type(x[0]))

gives this result:

'numpy.ndarray'
'numpy.float64'

However, I need them both to be numpy arrays. If it's not from a pandas data frame and I just convert a hard-coded list of lists then they are both ndarrays. How do I get the list, and the lists in that list to be ndarrays when that list has been made from a data frame? Many thanks for reading, this has had me stumped for hours.

3 Answers

I think you need values:

df = pd.DataFrame({'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0]})

print (df)
   C  D
0  7  1
1  8  3
2  9  5
3  4  7
4  2  1
5  3  0

x = df.values
print (x)
[[7 1]
 [8 3]
 [9 5]
 [4 7]
 [2 1]
 [3 0]]

And then select by indexing:

print (x[:,0])
[7 8 9 4 2 3]

print (x[:,1])
[1 3 5 7 1 0]

print (type(x[:,0]))
<class 'numpy.ndarray'>

It is also possible to transpose the array:

x = df.values.T
print (x)
[[7 8 9 4 2 3]
 [1 3 5 7 1 0]]

print (x[0])
[7 8 9 4 2 3]

print (x[1])
[1 3 5 7 1 0]
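
Since the goal in the question is matrix multiplication, here is a minimal sketch of how the array from df.values could be used for that. The matrix w is a hypothetical example and not part of the question; the data frame is the same one as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})

# 2-D ndarray with one row per data frame row, shape (6, 2)
x = df.values

# Each row of a 2-D array is itself an ndarray
print(type(x))     # <class 'numpy.ndarray'>
print(type(x[0]))  # <class 'numpy.ndarray'>

# Hypothetical 2x2 matrix, only here to illustrate the multiplication
w = np.array([[1, 0],
              [0, 2]])

# (6, 2) @ (2, 2) gives a (6, 2) result
product = x @ w
print(product.shape)
```

No intermediate list of lists is needed, because df.values already returns the ndarray.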

How about as_matrix:

x = df.as_matrix()

1 Comment

It seems that as_matrix is deprecated as of pandas version 0.23.0, and that values should be used instead.
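
For readers on a recent pandas, a short sketch of the replacement, assuming pandas 0.24 or later, where DataFrame.to_numpy() is available. The small data frame is only an illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the question
df = pd.DataFrame({'C': [7, 8, 9],
                   'D': [1, 3, 5]})

# to_numpy() returns the same 2-D array that the values attribute gives
x = df.to_numpy()

print(np.array_equal(x, df.values))  # True
print(type(x[0]))                    # <class 'numpy.ndarray'>
```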

You may want to try df.get_values() and, if needed, np.reshape the result.
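
A sketch of the reshape step, assuming the array came out one-dimensional (which would explain elements of type numpy.float64, although the question does not confirm this). The values and the pair width of 2 are assumptions for illustration.

```python
import numpy as np

# Hypothetical flat array, such as one built from a single column
flat = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(type(flat[0]))   # <class 'numpy.float64'>

# Reshape into rows of two values; -1 lets numpy infer the row count
pairs = flat.reshape(-1, 2)
print(pairs.shape)     # (3, 2)
print(type(pairs[0]))  # <class 'numpy.ndarray'>
```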
