I was able to recreate your issue using the code below -
import numpy as np
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.preprocessing.text import Tokenizer
label_tokenizer = Tokenizer()
# Fit on a text
fit_text = "Tensorflow warriors are awesome people"
label_tokenizer.fit_on_texts(fit_text)
# Training Labels
train_labels = "Tensorflow warriors are great people"
training_label_list = np.array(label_tokenizer.texts_to_sequences(train_labels))
# Print the results
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
Output -
2.2.0
[list([9]) list([1]) list([10]) list([5]) list([3]) list([2]) list([11])
list([7]) list([3]) list([6]) list([]) list([6]) list([4]) list([2])
list([2]) list([12]) list([3]) list([2]) list([5]) list([]) list([4])
list([2]) list([1]) list([]) list([4]) list([2]) list([1]) list([])
list([]) list([2]) list([1]) list([4]) list([9]) list([]) list([8])
list([1]) list([3]) list([8]) list([7]) list([1])]
<class 'numpy.ndarray'>
<class 'list'>
Solution -
The sequences returned by texts_to_sequences have different lengths, so wrapping them in np.array produces an object array of lists, which model.fit() cannot handle. You have two options (a minimal standalone sketch follows this list) -
1. Replacing np.array with np.hstack will fix your problem. Your model.fit() should work fine then.
2. Else, if you are looking for the expected output as in your question, training_label_list = label_tokenizer.texts_to_sequences(train_labels) will give you a list of lists. You can use np.array([np.array(i) for i in training_label_list]) to convert it to an array of arrays. Note that this only yields a regular 2-D array (usable by model.fit()) when all the inner lists have the same number of elements.
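Here is a minimal standalone sketch of the difference, using a hypothetical toy list of ragged sequences rather than the tokenizer output (dtype=object is passed explicitly so the example also runs on recent NumPy versions) -
import numpy as np
# Toy ragged sequences, similar in shape to what texts_to_sequences returns
sequences = [[9], [1, 10], [], [5]]
# np.array keeps the ragged structure: an object array whose elements are lists
ragged = np.array(sequences, dtype=object)
print(ragged)        # [list([9]) list([1, 10]) list([]) list([5])]
print(ragged.dtype)  # object - model.fit() cannot consume this
# np.hstack flattens everything into a single 1-D numeric array
flat = np.hstack(sequences)
print(flat)          # [ 9.  1. 10.  5.]
print(flat.dtype)    # float64 (the empty list promotes the result to float)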
np.hstack code - Code for point number 1 of the solution.
import numpy as np
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.preprocessing.text import Tokenizer
label_tokenizer = Tokenizer()
# Fit on a text
fit_text = "Tensorflow warriors are awesome people"
label_tokenizer.fit_on_texts(fit_text)
# Training Labels
train_labels = "Tensorflow warriors are great people"
training_label_list = np.hstack(label_tokenizer.texts_to_sequences(train_labels))
# Print the results
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
Output -
2.2.0
[ 9. 1. 10. 4. 2. 3. 11. 7. 2. 5. 5. 6. 3. 3. 12. 2. 3. 4.
6. 3. 1. 3. 1. 6. 9. 8. 1. 2. 8. 7. 1.]
<class 'numpy.ndarray'>
<class 'numpy.float64'>
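Note that the hstack result has dtype float64 because the empty sequences become empty float arrays and promote the whole concatenation. If your model expects integer labels, you can cast the array afterwards (a small optional step, not part of the original code) -
training_label_list = training_label_list.astype(np.int64)
print(training_label_list.dtype)  # int64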
Expected output as in your question - Code for point number 2 of the solution.
import numpy as np
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.preprocessing.text import Tokenizer
label_tokenizer = Tokenizer()
# Fit on a text
fit_text = "Tensorflow warriors are awesome people"
label_tokenizer.fit_on_texts(fit_text)
# Training Labels
train_labels = "Tensorflow warriors are great people"
training_label_list = label_tokenizer.texts_to_sequences(train_labels)
# Print
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
# To convert elements to array
training_label_list = np.array([np.array(i) for i in training_label_list])
# Print
print(training_label_list)
print(type(training_label_list))
print(type(training_label_list[0]))
Output -
2.2.0
[[9], [1], [10], [4], [2], [3], [11], [7], [2], [5], [], [5], [6], [3], [3], [12], [2], [3], [4], [], [6], [3], [1], [], [], [3], [1], [6], [9], [], [8], [1], [2], [8], [7], [1]]
<class 'list'>
<class 'list'>
[array([9]) array([1]) array([10]) array([4]) array([2]) array([3])
array([11]) array([7]) array([2]) array([5]) array([], dtype=float64)
array([5]) array([6]) array([3]) array([3]) array([12]) array([2])
array([3]) array([4]) array([], dtype=float64) array([6]) array([3])
array([1]) array([], dtype=float64) array([], dtype=float64) array([3])
array([1]) array([6]) array([9]) array([], dtype=float64) array([8])
array([1]) array([2]) array([8]) array([7]) array([1])]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
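For comparison, when every inner list has the same length, the same np.array([np.array(i) for i in ...]) call produces a regular 2-D integer array that model.fit() can consume. A toy illustration (hypothetical sequences, not the tokenizer output above) -
import numpy as np
# Toy sequences where every label maps to exactly one token
sequences = [[9], [1], [10], [4]]
labels = np.array([np.array(i) for i in sequences])
print(labels.shape)  # (4, 1)
print(labels.dtype)  # int64 (platform dependent)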
Hope this answers your question. Happy Learning.
Update 2/6/2020 - Anirudh_k07, as per our discussion, I had a look at your program, and you are getting the below error in model.fit() after using np.hstack for the labels.
ValueError: Data cardinality is ambiguous:
x sizes: 41063
y sizes: 41429
Please provide data which shares the same first dimension.
You are getting this error because a few of the labels have special characters like - and /. Thus, on performing np.hstack(label_tokenizer.texts_to_sequences(train_labels)), those labels produce additional rows. You can print the list of unique train_labels by using print(set(train_labels)).
Here is the gist of what I am trying to say -
# These labels have special characters
train_labels = ['Bio-PesticidesandBio-Fertilizers','Old/SenileOrchardRejuvenation']
training_label_seq = np.hstack(label_tokenizer.texts_to_sequences(train_labels))
print("Two labels are converted to Five :",training_label_seq)
# These labels are fine
train_labels = ['SoilHealthCard', 'PostHarvestPreservation', 'FertilizerUseandAvailability']
training_label_seq = np.hstack(label_tokenizer.texts_to_sequences(train_labels))
print("Three labels are remain three :",training_label_seq)
Output -
Two labels are converted to Five : [17 18 19 51 52]
Three labels remain three : [20 36 5]
So kindly do proper preprocessing to eliminate these special characters from train_labels, and then use np.hstack(label_tokenizer.texts_to_sequences(train_labels)) on the labels. Your model.fit() should work fine after that. A minimal cleaning sketch is shown below.
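A minimal sketch of such preprocessing, assuming you simply want to strip the special characters so each label stays a single token (clean_label is a hypothetical helper; adapt the pattern to your data) -
import re
def clean_label(label):
    # Remove characters such as '-' and '/' so the tokenizer keeps one token per label
    return re.sub(r"[^A-Za-z0-9]", "", label)
train_labels = ['Bio-PesticidesandBio-Fertilizers', 'Old/SenileOrchardRejuvenation']
cleaned_labels = [clean_label(label) for label in train_labels]
print(cleaned_labels)
# ['BioPesticidesandBioFertilizers', 'OldSenileOrchardRejuvenation']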
Hope this answers your question. Happy Learning.