
I have a .csv dataset with three columns, formatted as follows:

  t           X      Y
 0.040662  1.041667  1
 0.139757  1.760417  2
 0.144357  1.190104  1
 0.145341  1.047526  1
 0.145401  1.011882  1
 0.148465  1.002970  1

Instead of manually writing it as

x_final = np.array([1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970])
v_observations = np.array([1, 2, 1, 1, 1, 1])

I want to do this automatically by copying the pandas DataFrame into NumPy arrays. Here is my code:

import numpy as np
from numpy.linalg import inv
import pandas as pd

df = pd.read_csv('testdata.csv')
print(df)

df.dropna(inplace=True)

X = df.drop('Y', axis=1)
y = df['Y']

time = df.drop('t', axis=1)
print(X)
d1= np.array([X])
d2 = np.array([y])

x_final = np.array([d1])
y_final = np.array([d2])
z = np.c_[x_final, y_final]

However, I am getting this error when I try to run my code.

ValueError: cannot copy sequence with size 6 to array axis with dimension 2

How can I fix this error?

  • how about: z = df.dropna().drop('t', axis=1).values Commented Dec 19, 2018 at 20:42
  • The error actually starts earlier, at d1= np.array([X]), when I run the code line by line. Commented Dec 19, 2018 at 20:46

2 Answers


tl;dr: use .values

The problem is that when you create your NumPy arrays, you pass np.array a list that contains the DataFrame, when I think you mean to pass the DataFrame itself:

# This doesn't work
np.array([X])
# This does
np.array(X)
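A small sketch to make the difference visible, assuming the sample data from the question is built inline instead of read from testdata.csv. It converts with .to_numpy() before wrapping, so the shapes do not depend on how a particular NumPy/pandas version handles a DataFrame nested inside a list: the extra brackets add a leading axis of length 1.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: replaces testdata.csv)
df = pd.DataFrame({
    "t": [0.040662, 0.139757, 0.144357, 0.145341, 0.145401, 0.148465],
    "X": [1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970],
    "Y": [1, 2, 1, 1, 1, 1],
})
X = df.drop("Y", axis=1)  # keeps columns t and X

direct = np.array(X)                # the DataFrame itself -> 2-D array
wrapped = np.array([X.to_numpy()])  # a list holding one 2-D array -> 3-D array

print(direct.shape)   # (6, 2)
print(wrapped.shape)  # (1, 6, 2): the list added an extra leading axis
```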

So you can do:

d1= np.array(X)
d2 = np.array(y)

Or better yet:

d1 = X.values
d2 = y.values

To get:

>>> d1
array([[0.040662, 1.041667],
       [0.139757, 1.760417],
       [0.144357, 1.190104],
       [0.145341, 1.047526],
       [0.145401, 1.011882],
       [0.148465, 1.00297 ]])
>>> d2
array([1, 2, 1, 1, 1, 1])
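On pandas 0.24 or later, the pandas documentation recommends .to_numpy() over .values. A minimal sketch with the same sample data (built inline as an assumption) showing that it gives the same arrays:

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: replaces testdata.csv)
df = pd.DataFrame({
    "t": [0.040662, 0.139757, 0.144357, 0.145341, 0.145401, 0.148465],
    "X": [1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970],
    "Y": [1, 2, 1, 1, 1, 1],
})
X = df.drop("Y", axis=1)
y = df["Y"]

d1 = X.to_numpy()  # 2-D float array, shape (6, 2)
d2 = y.to_numpy()  # 1-D integer array, shape (6,)

# Same contents as the .values versions
print(np.array_equal(d1, X.values), np.array_equal(d2, y.values))  # True True
```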

But in the end, your final result would be exactly the same as simply saying:

z = df.dropna().values
>>> z
array([[0.040662, 1.041667, 1.      ],
       [0.139757, 1.760417, 2.      ],
       [0.144357, 1.190104, 1.      ],
       [0.145341, 1.047526, 1.      ],
       [0.145401, 1.011882, 1.      ],
       [0.148465, 1.00297 , 1.      ]])
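The 1. entries in z are floats rather than ints: a single NumPy array has one dtype, so when the DataFrame mixes float and integer columns, the integer Y column is upcast to float64. A short sketch, again assuming the sample data is built inline:

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: replaces testdata.csv)
df = pd.DataFrame({
    "t": [0.040662, 0.139757, 0.144357, 0.145341, 0.145401, 0.148465],
    "X": [1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970],
    "Y": [1, 2, 1, 1, 1, 1],
})

z = df.dropna().to_numpy()
print(z.dtype)   # float64: the common dtype of the float and int columns
print(z[:, 2])   # the Y column, now stored as floats
```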

See the pandas docs for the .values attribute, which simply returns a NumPy representation of the DataFrame.


3 Comments

This is very clean. But when I try to build z, I get ValueError: all the input arrays must have same number of dimensions
I think you'll have much less of a headache by just using z = df.dropna().values from the start, rather than go through your whole complicated processing.
You are awesome. Even if it doesn't work out for me, I have to mark it as the accepted answer. I am trying to reproduce this code and I don't know why it is not working :( https://medium.com/@jaems33/understanding-kalman-filters-with-python-2310e87b8f48 He put all the code at the end under "Putting it all together", and that is what I am trying to reproduce.
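The np.c_ ValueError in the first comment is consistent with the extra np.array([...]) wrappers: they give x_final and y_final different numbers of dimensions, and np.c_ cannot concatenate those. A sketch, assuming the goal is just the columns side by side: pass np.c_ the plain arrays, and it treats the 1-D y as a column. The sample data is built inline as an assumption.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: replaces testdata.csv)
df = pd.DataFrame({
    "t": [0.040662, 0.139757, 0.144357, 0.145341, 0.145401, 0.148465],
    "X": [1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970],
    "Y": [1, 2, 1, 1, 1, 1],
})
d1 = df.drop("Y", axis=1).to_numpy()  # shape (6, 2)
d2 = df["Y"].to_numpy()               # shape (6,)

# Wrapping again gives shapes (1, 6, 2) and (1, 6): different ndim, so np.c_ fails
failed = False
try:
    np.c_[np.array([d1]), np.array([d2])]
except ValueError as err:
    failed = True
    print("ValueError:", err)

# Without the wrappers, np.c_ appends d2 as a third column
z = np.c_[d1, d2]
print(z.shape)  # (6, 3), same contents as df.to_numpy()
```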

I don't think you even need all of those steps. Converting a pandas DataFrame to a 2D NumPy array takes a single call:

df.dropna(inplace=True)
df_numpy = df.drop("t", axis=1).values
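To get back the two arrays the question originally wrote by hand, you can slice columns out of that 2D array. A minimal sketch, assuming the sample data is built inline; note that because of the shared float dtype, a sliced Y column comes out as floats, so taking it straight from the Series keeps it integer.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline (assumption: replaces testdata.csv)
df = pd.DataFrame({
    "t": [0.040662, 0.139757, 0.144357, 0.145341, 0.145401, 0.148465],
    "X": [1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970],
    "Y": [1, 2, 1, 1, 1, 1],
})
df = df.dropna()

df_numpy = df.drop("t", axis=1).to_numpy()  # columns X and Y, shape (6, 2)
x_final = df_numpy[:, 0]                    # X column as a 1-D array
v_observations = df["Y"].to_numpy()         # Y column, keeps its integer dtype

print(x_final)
print(v_observations)  # [1 2 1 1 1 1]
```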
