Adding an additional dimension to ndarray

Question

I have an ndarray defined in the following way:

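import numpy as np
# One image_size x image_size float32 slot per image file (contents are uninitialized)
dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                     dtype=np.float32)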

This array represents a collection of images of size image_size * image_size. So I can write dataset[0] and get a 2D array corresponding to the image with index 0.

Now I would like to store one additional field for each image in this array. For instance, for the image at index 0 I would like to store the number 123, and for the image at index 321 the number 50000.

What is the simplest way to add this additional data field to the existing ndarray? What is the appropriate way to access data in the new array after adding this additional dimension?

I don't think adding a 4th dimension to your array is the right approach. Perhaps a dictionary would be better? The images could be one value and the number a second value, each stored under a different key.
– Davis, Commented Feb 27, 2021 at 0:40
Do you mean an additional Python dictionary that would have a key of image_data and, as its value, the number I want to add? The problem with this is that the dataset above is already used in the existing code. That is why it would be problematic to change the data structure.
– sergejsrk, Commented Feb 27, 2021 at 0:57
For instance, in the existing code I use np.random.shuffle on that array. If some of the information is stored in a map, the correspondence between each image and its property will be lost.
– sergejsrk, Commented Feb 27, 2021 at 1:07
I see your dilemma. I suggested this approach because, as @Bobby Ocean alludes, adding an extra dimension to your array would make the data much bigger: a 100x100x1 (10,000-element) array would turn into a 100x100x2 (20,000-element) array just to store 1 extra number. Now scale that up to 4 dimensions with larger images...
– Davis, Commented Feb 27, 2021 at 1:43
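To put rough numbers on this, here is a small sketch comparing memory footprints. The sizes (1,000 images of 28x28 float32 pixels) are hypothetical and chosen only for illustration; `ndarray.nbytes` reports the bytes held by each array's elements.

```python
import numpy as np

n_images, image_size = 1000, 28  # hypothetical sizes, for illustration only

# Images only: shape (N, S, S), 4 bytes per float32 element
images = np.zeros((n_images, image_size, image_size), dtype=np.float32)

# Extra trailing axis of length 2: every pixel gains a second slot,
# even though only one number per image is actually needed
padded = np.zeros((n_images, image_size, image_size, 2), dtype=np.float32)

# Separate 1-D array holding one integer per image
labels = np.zeros(n_images, dtype=np.int64)

print(images.nbytes)  # 3136000 (1000 * 28 * 28 * 4)
print(padded.nbytes)  # 6272000 (twice the images alone)
print(labels.nbytes)  # 8000 (1000 * 8)
```

Under these assumptions the extra axis doubles the storage, while a separate per-image array adds only a few kilobytes.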

hpaulj · Accepted Answer · 2021-02-27 04:36:22Z


If you shuffle an index array instead of the dataset itself, you can keep track of the original 'identifiers':

idx = np.arange(len(image_files))
np.random.shuffle(idx)
shuffle_set = dataset[idx]

illustration:

In [20]: x = np.arange(12).reshape(6,2)
    ...: idx = np.arange(6)
    ...: np.random.shuffle(idx) 
In [21]: x
Out[21]: 
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
In [22]: x[idx]             # shuffled
Out[22]: 
array([[ 4,  5],
       [ 0,  1],
       [ 2,  3],
       [ 6,  7],
       [10, 11],
       [ 8,  9]])
In [23]: idx1=np.argsort(idx)
In [24]: idx
Out[24]: array([2, 0, 1, 3, 5, 4])
In [25]: idx1
Out[25]: array([1, 2, 0, 3, 5, 4])
In [26]: Out[22][idx1]       # recover original order
Out[26]: 
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
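Applied to the question's setup, the same permutation can index both the images and a parallel array of per-image numbers, so the pairing survives the shuffle. This is a minimal sketch assuming toy sizes in place of `len(image_files)` and `image_size`, plus a hypothetical `labels` array; it uses NumPy's Generator API (`np.random.default_rng`) with a fixed seed for repeatability, though `np.random.shuffle(idx)` as in the answer works the same way.

```python
import numpy as np

n_images, image_size = 6, 4  # toy sizes standing in for len(image_files), image_size

# Fill each image with its own index so the rows are easy to tell apart
dataset = np.repeat(np.arange(n_images, dtype=np.float32), image_size * image_size)
dataset = dataset.reshape(n_images, image_size, image_size)
labels = np.arange(n_images) * 100  # hypothetical per-image numbers

# Shuffle an index array, then apply the same permutation to both arrays
rng = np.random.default_rng(0)
idx = np.arange(n_images)
rng.shuffle(idx)
shuffled_images = dataset[idx]
shuffled_labels = labels[idx]

# The inverse permutation (argsort of idx) restores the original order
inverse = np.argsort(idx)
restored_images = shuffled_images[inverse]
```

After the shuffle, `shuffled_labels[k]` still belongs to `shuffled_images[k]`, because both were taken from position `idx[k]` of the original arrays.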

answered Feb 27, 2021 at 2:08 by hpaulj (edited Feb 27, 2021 at 4:36)


Thanks, @hpaulj. That is exactly what I needed. That solved my problem.
– sergejsrk

Anonymous · Accepted Answer · 2021-02-27 11:15:23Z

NumPy arrays are fundamentally tensors: every axis has a single, fixed length, so the shape cannot vary from one element to the next. Take for example,

import numpy as np

x = np.array([[[1,2],[3,4]],
              [[5,6],[7,8]]
             ])
print(x.shape)  # two 2x2 blocks, so the shape is (2, 2, 2)

If I want to associate x[0] with the number 5 and x[1] with the number 7, then that would look something like this (if it were possible):

x = np.array([[[1,2],[3,4]],5,
              [[5,6],[7,8]],7
             ])

But such a thing is impossible, since it would "in some sense" have a shape like (2,((2,2),1)), or something else that is ambiguous. Such an object is not a NumPy array or a tensor, because it doesn't have fixed axis sizes, and all NumPy arrays must have fixed axis sizes. Hence, if you wish to store the new information, the simplest way to do it is to create another array:

x = np.array([[[1,2],[3,4]],
              [[5,6],[7,8]],
             ])
y = np.array([5,7])

Now x[0] corresponds to y[0] and x[1] corresponds to y[1]. x has shape (2,2,2) and y has shape (2,).
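For the question's case, this means keeping a 1-D array alongside `dataset`, indexed by the same image position. This is a sketch with hypothetical sizes (400 images of 28x28) and the two example numbers from the question; the name `extra` is illustrative only, and `np.zeros` is used instead of an uninitialized `np.ndarray` so the values are defined.

```python
import numpy as np

n_images, image_size = 400, 28  # hypothetical sizes

# Images as in the question, one image_size x image_size slice per image
dataset = np.zeros((n_images, image_size, image_size), dtype=np.float32)

# One extra number per image, stored in a separate array of shape (n_images,)
extra = np.zeros(n_images, dtype=np.int64)
extra[0] = 123
extra[321] = 50000

# The same index retrieves both pieces of data for one image
image_321 = dataset[321]  # 2-D array of shape (28, 28)
value_321 = extra[321]    # 50000
```

Because both arrays share the first axis, any reordering (such as the index-array shuffle in the other answer) can be applied to both with the same index array.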
