
I have two arrays from the MNIST dataset. The first array's shape is (60000, 28, 28) and the second's is (60000,).

Is it possible to combine these and make a new array that is (60000,28,28,1)? I've tried reshaping, resizing, inserting, concatenating and a bunch of other methods to no avail!

Would really appreciate some help! TIA!

2 Comments
  • The new size must have the same number of elements as the original. Step back and experiment with much smaller arrays, ones you can actually examine in full, for example np.arange(24).reshape(2, 3, 4). What would it even mean to add a size (2,) array to that? (See the sketch after these comments.)
  • This was helpful in seeing how they are structured. Things get more complicated when adding another dimension! Thanks!
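
Following the comment's suggestion, here is a minimal sketch (plain NumPy with small toy shapes, not the real MNIST arrays) showing why a 1-D array cannot simply be joined onto a 3-D one:

import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # small stand-in for the (60000, 28, 28) images
b = np.array([7, 9])                 # small stand-in for the (60000,) labels

print(a.shape)  # (2, 3, 4)
print(b.shape)  # (2,)

# Concatenation fails: the arrays have different numbers of dimensions,
# so there is no axis along which they can be joined.
try:
    np.concatenate((a, b))
except ValueError as e:
    print(e)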

3 Answers


It seems like you might have misunderstood how numpy arrays work or how they should be used.

Each dimension (except for the innermost one) of an array is essentially an array of arrays. So for your example with shape (60000, 28, 28), you have an array of 60000 arrays, each of which is an array of 28 arrays. Each of those final arrays is then an array of 28 objects of some sort (integers in the MNIST dataset, I think).
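
To make the nesting concrete, here is a minimal sketch (a zero-filled placeholder array, not the real MNIST data) showing how each level of indexing peels off one dimension:

import numpy as np

images = np.zeros((60000, 28, 28), dtype=np.uint8)  # placeholder for the MNIST images

print(images.shape)        # (60000, 28, 28)
print(images[0].shape)     # (28, 28)  -> one image
print(images[0][0].shape)  # (28,)     -> one row of pixels
print(images[0][0][0])     # 0         -> a single pixel value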

You can convert this into a (60000, 28, 28, 1) array by using NumPy's expand_dims function like so:

import numpy
new_array = numpy.expand_dims(original_array, axis=-1)

However, this only turns each innermost element into an array of one object; it does not include the label array in any way.

From what I can read in your question, it seems you want to map each label of the MNIST dataset to its corresponding image. You could do this by making each element of the outermost dimension a tuple of (image <28x28 numpy array>, label <int>), but this would lose the numpy functionality of the array. The best course of action is probably to keep the two arrays as they are and use the index of an image to look up its label.
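
A minimal sketch of that index-based lookup (zero-filled placeholders stand in for the two arrays from the question, and the names train_images / train_labels are just illustrative):

import numpy as np

train_images = np.zeros((60000, 28, 28), dtype=np.uint8)  # placeholder images
train_labels = np.zeros((60000,), dtype=np.uint8)         # placeholder labels

# The i-th label belongs to the i-th image; no merging is needed.
i = 42
image = train_images[i]   # shape (28, 28): the i-th image
label = train_labels[i]   # a single integer: the label for that same image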


3 Comments

Yup, that's exactly what I'm trying to do (attach the image label to the corresponding 28x28 grayscale pixel grid)! The course I'm following, though, has all of this data in a TensorFlow tensor. I thought I could create an ndarray and then convert it to a tensor. Here are some course notes that might help explain: Each observation is 28x28x1 pixels, therefore it is a tensor of rank 3. We must flatten the images using the method 'Flatten', which simply takes our 28x28x1 tensor and orders it into a (None,) or (28x28x1,) = (784,) vector.
You can convert numpy arrays to tensors using TensorFlow's convert_to_tensor, but this won't include the labels in any way. For many small machine learning tasks you can usually just pass numpy arrays directly to the training functions. If you really want to combine the labels and the features together, you should read up on TensorFlow Datasets (tfds).
Found it. So you can take two numpy ndarrays of different dimensions and combine them into a TensorSliceDataset like so: mnist_train = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).

I think this is not possible. To concatenate two arrays, they must have the same number of dimensions, and their sizes must match along every axis except the one you concatenate on.

You can imagine the (60,000, 28, 28) array as a cube. The face looking at you is 28 x 28, and there are 60,000 such same-size slices stacked behind it. If you want to concatenate a new array onto it, that array must also be 3-D, and its sizes must match the cube on the two axes you are not concatenating along. Otherwise, it won't get concatenated.

To combine a (60,000, 28, 28) array with another array, the second array must match two of the sizes 60,000, 28, 28 on the corresponding axes. Suppose the second one has shape (60,000, 28, 14). Then you can concatenate along the last axis and get the result:

z = np.concatenate((array1, array2), axis=2)
z.shape

Output:

(60000, 28, 42)

Alternatively, if the second array is (30,000, 28, 28):

z = np.concatenate((array1, array2), axis=0)
z.shape

Output:

(90000, 28, 28)
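
As a self-contained check, both cases can be reproduced with zero-filled dummy arrays standing in for real data (the shapes are the ones discussed above):

import numpy as np

array1 = np.zeros((60000, 28, 28), dtype=np.uint8)

# Case 1: second array matches on axes 0 and 1, so concatenate along axis 2
array2 = np.zeros((60000, 28, 14), dtype=np.uint8)
print(np.concatenate((array1, array2), axis=2).shape)  # (60000, 28, 42)

# Case 2: second array matches on axes 1 and 2, so concatenate along axis 0
array3 = np.zeros((30000, 28, 28), dtype=np.uint8)
print(np.concatenate((array1, array3), axis=0).shape)  # (90000, 28, 28)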

1 Comment

Thanks for the reply! Appreciate everyone assisting!

So you can take two numpy ndarrays of different dimensions and combine them into a TensorSliceDataset like so:

mnist_train = tf.data.Dataset.from_tensor_slices((train_images, train_labels))

This was the original intention, but I thought it required combining the two ndarrays before creating a tensor.
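
For completeness, a minimal usage sketch (loading MNIST via tf.keras.datasets and the batch size of 32 are my assumptions, not part of the original answer):

import tensorflow as tf

# Load MNIST as two numpy arrays: images (60000, 28, 28) and labels (60000,)
(train_images, train_labels), _ = tf.keras.datasets.mnist.load_data()

# Pair each image with its label; the shapes only need to match on axis 0
mnist_train = tf.data.Dataset.from_tensor_slices((train_images, train_labels))

# Each element of the dataset is an (image, label) pair
for image, label in mnist_train.batch(32).take(1):
    print(image.shape, label.shape)  # (32, 28, 28) (32,)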

