
I have an ndarray like the one below. I write it to a DataFrame, save that as a pickle, read the pickle back, and then create a new array from it. Why does np.array_equal(my_array2, X_train) return False? I tried to debug this and wrote some code to understand the problem, but I'm having a hard time.

How should I change the code so that both arrays match?

import numpy as np
import pandas as pd

X_train = np.array([[" I I want to know how much s it thank you"],
       [" press any key to connect P Thank you Too <unk> I "]],
      dtype='<U97064')
X_train
#array([[' I I want to know how much s it thank you'],
#       [' press any key to connect P Thank you Too <unk> I ']],
#      dtype='<U97064')


X_train[0]
#array([' I I want to know how much s it thank you'], dtype='<U97064')


df = pd.DataFrame(X_train, columns = ['Column_A'])


df.to_pickle('df.pkl')
df2 = pd.read_pickle('df.pkl')

my_array2 = df2['Column_A'].to_numpy(dtype='<U97064')

np.array_equal(my_array2[0], X_train[0])
#False

np.array_equal(my_array2, X_train)
#False

The types of the two arrays match:

print (type(my_array2))
print (type(X_train))

#<class 'numpy.ndarray'>
#<class 'numpy.ndarray'>

but the types of the individual elements don't match:

#not sure why the type of the individual elements is different
print (type(my_array2[0]))
print (type(X_train[0]))
#<class 'numpy.str_'>
#<class 'numpy.ndarray'>

X_train.dtype
#dtype('<U97064')


type(X_train.dtype)
#numpy.dtype
  • I don't think the pickle has anything to do with it. pandas has changed your array in df. A DataFrame is 2-D, and a column is 1-D. Compare the shape of your X_train with that of df[column].to_numpy(). You can also save NumPy arrays without involving pandas. Commented Feb 17, 2022 at 21:10
  • could you show the complete code? Commented Feb 17, 2022 at 21:16
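
The shape mismatch described in the first comment can be checked directly. The following is a minimal sketch, assuming a small stand-in array in place of the real X_train and a temporary file path chosen only for illustration. It also shows one way to save and reload the array with np.save and np.load, without going through pandas.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for X_train (assumption): a 2-D string array of shape (2, 1).
X_train = np.array([["first sentence"], ["second sentence"]])

df = pd.DataFrame(X_train, columns=["Column_A"])
column = df["Column_A"].to_numpy(dtype=X_train.dtype)

print(X_train.shape)  # (2, 1): two rows, one column
print(column.shape)   # (2,): a single column comes back 1-D

# np.array_equal is False whenever the shapes differ,
# even if the strings themselves are identical.
shapes_differ_result = np.array_equal(column, X_train)

# Round trip with np.save / np.load; the shape is preserved.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X_train.npy")  # hypothetical file name
    np.save(path, X_train)
    restored = np.load(path)

round_trip_result = np.array_equal(restored, X_train)
```

Here shapes_differ_result should be False and round_trip_result should be True, which points at the DataFrame column selection rather than the pickle step.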

1 Answer

In your code, X_train[0] is itself an array while my_array2[0] is a string.

print(X_train[0])
>>array([' I I want to know how much s it thank you'], dtype='<U97064')
print(my_array2[0])
>>' I I want to know how much s it thank you'

If you want my_array2 to be a NumPy array of shape (2, 1), the same shape as X_train, add .reshape(2, 1):

my_array2 = df2['Column_A'].to_numpy(dtype='<U97064').reshape(2, 1)

print(np.array_equal(my_array2[0], X_train[0]))
>>True

print(np.array_equal(my_array2, X_train))
>>True

2 Comments

Do I need to replace the 2 in .reshape(2,1) with the number of elements in X_train?
print(X_train.shape) will show you the shape of X_train.
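
To avoid hard-coding the row count, one option is to let NumPy infer it. This is a minimal sketch, assuming a three-row placeholder array in place of the real X_train. It uses reshape(-1, 1), where -1 means "work out this dimension from the number of elements", and also shows reshaping to X_train.shape and selecting the column with a list so the result stays 2-D.

```python
import numpy as np
import pandas as pd

# Placeholder data (assumption): any (n, 1) string array behaves the same way.
X_train = np.array([["a"], ["b"], ["c"]])
df2 = pd.DataFrame(X_train, columns=["Column_A"])

# Selecting one column by name gives a 1-D array of shape (n,).
flat = df2["Column_A"].to_numpy(dtype=X_train.dtype)

# Option 1: -1 lets NumPy compute the number of rows.
by_inference = flat.reshape(-1, 1)

# Option 2: reuse the shape of the original array.
by_shape = flat.reshape(X_train.shape)

# Option 3: select with a list of columns, which keeps the result 2-D.
two_d = df2[["Column_A"]].to_numpy(dtype=X_train.dtype)

print(by_inference.shape, by_shape.shape, two_d.shape)
print(np.array_equal(by_inference, X_train))
```

All three results should have shape (3, 1) here and compare equal to X_train.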
