Python ValueError: cannot copy sequence to array

Question

I have a .csv dataset with three columns, formatted as follows:

 t         X         Y
 0.040662  1.041667  1
 0.139757  1.760417  2
 0.144357  1.190104  1
 0.145341  1.047526  1
 0.145401  1.011882  1
 0.148465  1.002970  1

Instead of writing the arrays out by hand, like this:

x_final = np.array([1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970])
v_observations = np.array([1, 2, 1, 1, 1, 1])

I want to build them automatically by copying the pandas DataFrame into an array. Here is my code:

import numpy as np
from numpy.linalg import inv
import pandas as pd

df = pd.read_csv('testdata.csv')
print(df)

df.dropna(inplace=True)

X = df.drop('Y', axis=1)
y = df['Y']

time = df.drop('t', axis=1)
print(X)
d1= np.array([X])
d2 = np.array([y])

x_final = np.array([d1])
y_final = np.array([d2])
z = np.c_[x_final, y_final]

However, I am getting this error when I try to run my code.

ValueError: cannot copy sequence with size 6 to array axis with dimension 2

How can I fix this error?

When I run the code line by line, the error is raised at d1= np.array([X]), before execution reaches the later lines.
– Armo, Commented Dec 19, 2018 at 20:46

sacuL · Accepted Answer · 2018-12-19 20:52:11Z

5

tl;dr: use .values

The problem is in how you create your NumPy arrays: you are passing a list that contains the DataFrame, when I think you mean to pass the DataFrame itself:

# This doesn't work
np.array([X])
# This does
np.array(X)
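
The effect of the extra brackets can be seen with a plain NumPy array, used here as a stand-in for the DataFrame (an assumption for illustration; what np.array([X]) does with a DataFrame inside the list may depend on the NumPy and pandas versions installed). Wrapping the array in a list adds a leading axis, and np.c_ then refuses to join inputs whose numbers of dimensions differ. A minimal sketch with placeholder values:

```python
import numpy as np

# Stand-ins for the two feature columns (t and X) and the Y column.
# The values are placeholders, not the data from the question.
features = np.zeros((6, 2))
labels = np.ones(6, dtype=int)

wrapped = np.array([features])   # list holding one 2-D array -> 3-D array
direct = np.array(features)      # the array itself -> still 2-D

print(wrapped.shape)  # (1, 6, 2)
print(direct.shape)   # (6, 2)

# np.c_ turns the 1-D labels into a column and joins along the last axis.
# That only works when all inputs end up with the same number of dimensions.
try:
    np.c_[wrapped, labels]
    joined_wrapped = True
except ValueError as exc:
    joined_wrapped = False
    print("joining the wrapped array failed:", exc)

joined = np.c_[direct, labels]
print(joined.shape)  # (6, 3)
```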

So you can do:

d1 = np.array(X)
d2 = np.array(y)

Or better yet:

d1 = X.values
d2 = y.values

To get:

>>> d1
array([[0.040662, 1.041667],
       [0.139757, 1.760417],
       [0.144357, 1.190104],
       [0.145341, 1.047526],
       [0.145401, 1.011882],
       [0.148465, 1.00297 ]])
>>> d2
array([1, 2, 1, 1, 1, 1])

But in the end, your final result would be exactly the same as simply saying:

>>> z = df.dropna().values
>>> z
array([[0.040662, 1.041667, 1.      ],
       [0.139757, 1.760417, 2.      ],
       [0.144357, 1.190104, 1.      ],
       [0.145341, 1.047526, 1.      ],
       [0.145401, 1.011882, 1.      ],
       [0.148465, 1.00297 , 1.      ]])
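
As a self-contained check of that equivalence, the sketch below rebuilds the six rows from the question in memory. The comma-separated text and the use of io.StringIO in place of testdata.csv are assumptions, since the actual file is not shown.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for testdata.csv (assumed to be comma-separated).
csv_text = """t,X,Y
0.040662,1.041667,1
0.139757,1.760417,2
0.144357,1.190104,1
0.145341,1.047526,1
0.145401,1.011882,1
0.148465,1.002970,1
"""
df = pd.read_csv(io.StringIO(csv_text)).dropna()

d1 = df.drop("Y", axis=1).values   # 2-D array holding columns t and X
d2 = df["Y"].values                # 1-D array holding column Y

z_joined = np.c_[d1, d2]           # Y appended as a third column
z_direct = df.values               # whole frame converted in one step

print(d1.shape, d2.shape, z_joined.shape)
print(np.array_equal(z_joined, z_direct))
```

Both routes should give the same 6 x 3 float array, which is why the one-line version is usually enough.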

See the docs for the .values attribute, which gives you a NumPy representation of a DataFrame.
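
pandas 0.24 and later also provide DataFrame.to_numpy(), which the pandas documentation recommends over .values. A minimal sketch, using a two-row frame made up for illustration from the first rows of the question's data:

```python
import numpy as np
import pandas as pd

# Two rows copied from the question; the frame itself is illustrative.
df = pd.DataFrame(
    {"t": [0.040662, 0.139757], "X": [1.041667, 1.760417], "Y": [1, 2]}
)

via_values = df.values
via_to_numpy = df.to_numpy()

# Both calls return the same 2-D array; the integer Y column is
# upcast to float because a NumPy array has a single dtype.
print(np.array_equal(via_values, via_to_numpy))
print(via_to_numpy.dtype)
```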

edited Dec 19, 2018 at 20:52

answered Dec 19, 2018 at 20:47

sacuL


3 Comments

Armo Over a year ago

This is very clean. But when I try to compute z, I get ValueError: all the input arrays must have same number of dimensions

sacuL Over a year ago

I think you'll have much less of a headache by just using z = df.dropna().values from the start, rather than going through your whole complicated processing.

Armo Over a year ago

You are awesome. Even though it doesn't work for me yet, I have to mark this as the accepted answer. I am trying to reproduce this code and I don't know why it is not working :( https://medium.com/@jaems33/understanding-kalman-filters-with-python-2310e87b8f48 He has put all the code at the end, under "Putting it all together", and that's why I am trying to reproduce it.

gorjan · Answer · 2018-12-19 20:58:46Z

0

I don't think you even need all of those steps. Moving from a pandas DataFrame to a 2-D NumPy array is seamless.

df.dropna(inplace=True)
df_numpy = df.drop("t", axis=1).values
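
To get the two 1-D arrays from the question (x_final and v_observations), one option is to select each column by name before converting. This is a sketch, not the only way to do it, and it assumes an in-memory, comma-separated stand-in for testdata.csv:

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for testdata.csv (assumed to be comma-separated).
csv_text = """t,X,Y
0.040662,1.041667,1
0.139757,1.760417,2
0.144357,1.190104,1
0.145341,1.047526,1
0.145401,1.011882,1
0.148465,1.002970,1
"""
df = pd.read_csv(io.StringIO(csv_text))
df.dropna(inplace=True)

# 2-D float array with columns X and Y, as in the snippet above.
df_numpy = df.drop("t", axis=1).values

# 1-D arrays, one per column; selecting by name keeps Y as integers.
x_final = df["X"].values
v_observations = df["Y"].values

print(x_final)
print(v_observations)
```

Slicing df_numpy[:, 1] would also return the Y values, but as floats, because a 2-D NumPy array has a single dtype.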

answered Dec 19, 2018 at 20:58

gorjan
