Saving and reading back a NumPy array

Question

I have an ndarray like the one below. I write it to a DataFrame, save the DataFrame as a pickle, read the pickle back, and then create a new array from it. Why does np.array_equal(my_array2, X_train) return False? I tried to debug this and wrote some code to understand the problem, but I am having a hard time.

How should I change the code so that both arrays match?

import numpy as np
import pandas as pd

X_train = np.array([[" I I want to know how much s it thank you"],
       [" press any key to connect P Thank you Too <unk> I "]],
      dtype='<U97064')
X_train
#array([[' I I want to know how much s it thank you'],
#       [' press any key to connect P Thank you Too <unk> I ']],
#      dtype='<U97064')

X_train[0]
#array([' I I want to know how much s it thank you'], dtype='<U97064')


df = pd.DataFrame(X_train, columns = ['Column_A'])


df.to_pickle('df.pkl')
df2 = pd.read_pickle('df.pkl')

my_array2= df2['Column_A'].to_numpy(dtype='<U97064')

np.array_equal(my_array2[0],X_train[0])
#False

np.array_equal(my_array2,X_train)
#False

The types of the two arrays match:

print (type(my_array2))
print (type(X_train))

#<class 'numpy.ndarray'>
#<class 'numpy.ndarray'>

But the types of the individual elements don't match:

#not sure why the type of the individual elements is different
print (type(my_array2[0]))
print (type(X_train[0]))
#<class 'numpy.str_'>
#<class 'numpy.ndarray'>

X_train.dtype
#dtype('<U97064')


type(X_train.dtype)
#numpy.dtype

I don't think the pickle has anything to do with it. pandas has changed your array in df. A DataFrame is 2-D, and a column is 1-D. Compare the shape of your X_train with that of df[column].to_numpy(). You can also save NumPy arrays without involving pandas.
– hpaulj, Commented Feb 17, 2022 at 21:10
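
The two suggestions in this comment can be sketched as follows: first compare the shapes, then round-trip the array with np.save and np.load instead of going through a DataFrame. The temporary file name X_train.npy and the default (shorter) string dtype are assumptions made for illustration, not part of the original code.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same strings as in the question; the dtype is left to NumPy here (assumption).
X_train = np.array([[" I I want to know how much s it thank you"],
                    [" press any key to connect P Thank you Too <unk> I "]])

# A single DataFrame column comes back as a 1-D array.
df = pd.DataFrame(X_train, columns=["Column_A"])
col_shape = df["Column_A"].to_numpy().shape
print(X_train.shape)  # 2-D: one row per string, one column
print(col_shape)      # 1-D: one entry per string

# Saving with NumPy alone keeps both shape and dtype.
path = os.path.join(tempfile.mkdtemp(), "X_train.npy")  # hypothetical file name
np.save(path, X_train)
restored = np.load(path)
print(np.array_equal(restored, X_train))
```

Because the .npy format stores the shape alongside the data, no reshape is needed after loading.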

mango · Accepted Answer · 2022-02-17 21:56:41Z

2

In your code, X_train[0] is itself an array while my_array2[0] is a string.

print(X_train[0])
>>[' I I want to know how much s it thank you']
print(my_array2[0])
>> I I want to know how much s it thank you

If you want my_array2 to be a NumPy array of shape (2, 1), the same shape as X_train, add .reshape(2,1):

my_array2= df2['Column_A'].to_numpy(dtype='<U97064').reshape(2,1)

print(np.array_equal(my_array2[0],X_train[0]))
>>True

print(np.array_equal(my_array2,X_train))
>>True

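An alternative that is not part of this answer, sketched under the assumption that the goal is only to get the original 2-D shape back: selecting the column with a list (df2[["Column_A"]]) returns a one-column DataFrame rather than a Series, so to_numpy already yields shape (n, 1). The temporary pickle path and the shorter default dtype are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same strings as in the question; dtype left to NumPy (assumption).
X_train = np.array([[" I I want to know how much s it thank you"],
                    [" press any key to connect P Thank you Too <unk> I "]])

df = pd.DataFrame(X_train, columns=["Column_A"])

# Round-trip through a pickle in a temporary directory (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "df.pkl")
df.to_pickle(path)
df2 = pd.read_pickle(path)

# Double brackets select a DataFrame (2-D), not a Series (1-D).
my_array2 = df2[["Column_A"]].to_numpy(dtype=X_train.dtype)

print(my_array2.shape)
print(np.array_equal(my_array2, X_train))
```

Passing dtype=X_train.dtype keeps the string dtype identical to the original without hard-coding it.
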

2 Comments

user2543622 Over a year ago

Do I need to replace the 2 in .reshape(2,1) with the number of elements in X_train?

mango Over a year ago

print(X_train.shape) will show you the shape of X_train.
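
To avoid hard-coding the row count, one option is to let NumPy infer it with -1, or to reuse X_train.shape directly. This is a minimal sketch, assuming a made-up three-row array; the variable names flat, inferred, and same are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for X_train, shape (3, 1).
X_train = np.array([["a"], ["bb"], ["ccc"]])

df = pd.DataFrame(X_train, columns=["Column_A"])
flat = df["Column_A"].to_numpy(dtype=X_train.dtype)  # 1-D, shape (3,)

# -1 tells reshape to work out the number of rows from the data.
inferred = flat.reshape(-1, 1)

# Reusing the original shape also works when X_train is still available.
same = flat.reshape(X_train.shape)

print(inferred.shape)
print(np.array_equal(inferred, X_train))
print(np.array_equal(same, X_train))
```

With either form, the 2 in .reshape(2,1) no longer has to match the number of rows by hand.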
