
I am working on a CNN project, and I need to preprocess the labels first.

Each image file is a spectrogram, and each one has a label of 250 values stored in an array. The label gives the sequence of pitch values present in that spectrogram. For example, one label file looks like this:

[ 0  0  0  0  0  0  0  0  0  0  0 57 57 57 57 57 57 57 57 58 58 57 57 57
  0  0  0  0  0 56 57 57 56 56 56 56 56 56 56 56 56 57 57 58 59 61 62 62
 63 64 64 63 64 64 64 64  0  0  0  0 64 64 64 64 63 63 63 63 63 64 63 64
 64 64 65 66 66 66 66 66 65 65 66 66 66 66 65  0  0  0  0 65 65 65 66 66
 66 66 66 65 65 65  0  0  0  0 64 64 64 64 64 64 64 64 64 64 64 64 64 64
 63  0  0  0  0  0  0  0  0  0  0  0  0  0 60 60 60 60 61 61 62 62 62 62
 62 62 62 61  0  0  0 62 62 62 62 62 62 62 62 62 62 62 62 60  0 62 61 60
 61 61 61 61 61 61 61 61 61 60  0  0  0  0  0 61 60 60 60 61 61 61 61 61
 61  0  0  0  0  0  0 59 59 59 59 58 58 59 59 59 59  0  0  0  0  0  0  0
 59 59 58 58 59 59 59 59 59 59  0  0  0  0 58 57 57 57 57 57 57 57 57 57
 57 57 58 57  0  0  0  0  0  0]

After going through all the label files, I found 51 unique values across them. I stored these values in an array.

y_train = # y_test also contains these values
[ 0 30 31 32 33 34 35 36 37 38 
 39 40 41 42 43 44 45 46 47 48 
 49 50 51 52 53 54 55 56 57 58 
 59 60 61 62 63 64 65 66 67 68 
 69 70 71 72 73 74 76 77 81 83 
 85]

I need to run the to_categorical method to one-hot encode the labels, which fixes the number of classes (in my case, 51), before I can train the CNN. You can see to_categorical docs here.

I have done that, but the resulting number of classes is 86, not 51. I assume this is because my labels are already integers, so the method treats them as 86 classes covering every integer from 0 to 85, while in reality I have only 51 unique values within that range, with gaps between them (see y_train).

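import numpy as np
from tensorflow.keras.utils import to_categorical  # assumed import path

# convert to arrays first. y_train and y_test are the labels for the images in X_train and X_test.
y_train = np.array(y_train) # labels for X_train images
y_test = np.array(y_test) # labels for X_test images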

# do to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# shape result
y_train:  (638, 250, 86) # 638 = total data, 250 = 1 data length, 86 = num_class
y_test:  (161, 250, 86) # 161 = total data, 250 = 1 data length, 86 = num_class
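The 86 can be reproduced without Keras. When no class count is passed, to_categorical infers it as the largest label plus one, not as the number of distinct labels. The sketch below imitates that with plain NumPy on a small made-up label array (the values are illustrative, not the real data):

```python
import numpy as np

# Toy labels with gaps, standing in for the pitch labels in the question.
labels = np.array([[0, 57, 58], [85, 30, 0]])

# Assumption: imitate the default of inferring the class count from the
# largest label rather than from the number of distinct labels.
num_classes = labels.max() + 1
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)            # (2, 3, 86)
print(len(np.unique(labels)))   # 5
```

Only 5 of the 86 columns are ever used here, which is the same mismatch as 51 versus 86 in the real data.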

Then I came up with the idea of mapping each unique value to a new consecutive integer, so that to_categorical sees only 51 classes, for example:

0 -> 0
30 -> 1
31 -> 2
32 -> 3
...
85 -> 50

Is there a way in Python to build that kind of mapping from the y_train array? And if there is, can I map the results back to the original values once the computation is finished? Thank you.

1 Answer

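Yes, you can build a dictionary that maps each new class index to its original value, like below (here y_train is the array of 51 unique values from the question):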

map_dict = {}

for i, value in enumerate(y_train):
    map_dict[i] = value
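The dictionary above goes from new index to original value. To encode the full label arrays you also need the opposite direction. One possible sketch, assuming y_train is the full integer label array (the toy values below are made up), uses np.unique with return_inverse=True to get the sorted unique labels and the re-indexed labels in one call:

```python
import numpy as np

# Toy label array standing in for y_train (samples x sequence length).
y_train = np.array([[0, 57, 57, 85], [30, 0, 58, 57]])

# classes: sorted unique labels; encoded: position of each label in classes.
classes, encoded = np.unique(y_train, return_inverse=True)
encoded = encoded.reshape(y_train.shape)  # the inverse may come back flattened

# Forward (value -> index) and reverse (index -> value) dictionaries.
value_to_index = {int(v): i for i, v in enumerate(classes)}
index_to_value = {i: int(v) for i, v in enumerate(classes)}

print(classes.tolist())   # [0, 30, 57, 58, 85]
print(encoded.tolist())   # [[0, 2, 2, 4], [1, 0, 3, 2]]
```

The encoded array now holds consecutive integers starting at 0, so one-hot encoding it yields exactly one column per distinct label. For y_test, the same classes array should be reused (for example with np.searchsorted(classes, y_test)), which assumes every test label also occurs in the training labels.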

Your new categories would be the keys of map_dict, which you can get like below

list(map_dict.keys())

Later on, whenever you need to look up the original value for a category k, just index into map_dict:

map_dict[k]
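For a whole batch of predictions, the lookup can be done in one step instead of key by key. The sketch below assumes the model outputs one score per class for each of the time steps and that classes holds the sorted unique labels used for encoding; both arrays here are made-up stand-ins:

```python
import numpy as np

# Assumed: sorted unique labels used when encoding (toy values).
classes = np.array([0, 30, 57, 58, 85])

# Stand-in for model output: (samples, sequence length, num classes) scores.
probs = np.zeros((1, 3, 5))
probs[0, 0, 2] = 1.0
probs[0, 1, 4] = 1.0
probs[0, 2, 0] = 1.0

pred_index = probs.argmax(axis=-1)   # most likely class index per time step
pred_pitch = classes[pred_index]     # index -> original label via array indexing

print(pred_pitch.tolist())  # [[57, 85, 0]]
```

Indexing the classes array with the predicted indices plays the same role as map_dict[k], applied to every element at once.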

To print both the keys and values in the dictionary, do the following:

for key, value in map_dict.items():
    print(key, ' --->', value)

2 Comments

Thank you! One question: can I also print the dict with its keys and values?
@DionisiusPratama Yes sure, I will edit my answer to show you how to print