
I'm using scikit-learn's t-SNE to interrogate around 1000 scatterplot images, but fit_transform appears to require a 2D NumPy array. I'm new to Python.

My code:

from os import listdir
from os.path import isfile, join

import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

mypath = '/Path/to/files/scatterplots/'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

# Load every image into a 1D object array (one image per slot)
photos = np.empty(len(onlyfiles), dtype=object)
for n in range(len(onlyfiles)):
    photos[n] = cv2.imread(join(mypath, onlyfiles[n]))

# Show the images in a 2x2 grid without axis ticks
fig, axes = plt.subplots(2, 2, figsize=(10, 10), subplot_kw={'xticks': (), 'yticks': ()})
for ax, img in zip(axes.ravel(), photos):
    ax.imshow(img)

Output: (image of the four scatterplots displayed in a 2×2 grid)

Problem code

tsne = TSNE(random_state=50)
digits_tsne = tsne.fit_transform (photos.data)

Error

ValueError Traceback (most recent call last) in

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/manifold/t_sne.py in fit_transform(self, X, y)
    892             Embedding of the training data in low-dimensional space.
    893         """
--> 894         embedding = self._fit(X)

(3 further lines of error output within t_sne.py not shown)

I believe the fit_transform method requires a 2D NumPy array, e.g.

'target': array([0, 1, 2, 3])

where 0-3 refer to the different data (parasites) behind each of scatterplots 1-4.

Request: How do I combine the target array with the image NumPy array so that fit_transform can see it and process it?

  • What's the shape of photos.data? Commented Feb 6, 2019 at 20:22
  • Thanks. print(photos.shape) outputs (4,). To explain, this is just an example data set. Commented Feb 6, 2019 at 20:56
  • In order to fit the algorithm you will need to feed it data; in your case, the pixel values of each image. Try running fit_transform() with a single image. I'm guessing that image should have a 2D shape. Commented Feb 6, 2019 at 21:00
  • Thanks, I just tried and got the same error. Yes, the image has a 2D shape. Commented Feb 6, 2019 at 21:08
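To make the shape question in these comments concrete, here is a small sketch that uses hypothetical all-zero arrays instead of reading real files (an assumption; no images are loaded). It mimics the question's object array: photos has shape (4,) and its entries are Python objects, while each entry is a (height, width, 3) array, because cv2.imread loads colour images with three channels by default. Neither is the 2D (n_samples, n_features) matrix of pixel values that fit_transform expects, and photos.data is only the raw memory buffer of the object array.

```python
import numpy as np

# Hypothetical stand-ins for four colour images as cv2.imread would return them:
# each one is a (height, width, 3) uint8 array (here 64 x 64 pixels, all zeros)
images = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]

# Mimic the question's container: a 1D object array with one image per slot
photos = np.empty(len(images), dtype=object)
for n, img in enumerate(images):
    photos[n] = img

print(photos.shape)     # (4,): a single dimension whose entries are Python objects
print(photos[0].shape)  # (64, 64, 3): each image is 3D, not 2D
```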

1 Answer

Please check the t-SNE documentation for fit_transform:

X : array, shape (n_samples, n_features)

For your case to work, you need to flatten each image into a 1D array and stack those arrays into a matrix with one row per image.
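A minimal sketch of that flattening step, assuming four hypothetical images of identical size built with np.full rather than loaded from disk:

```python
import numpy as np

# Four hypothetical colour images of identical size: 64 rows x 48 columns x 3 channels.
# Image i is filled with the value i so the rows are easy to tell apart.
images = [np.full((64, 48, 3), i, dtype=np.uint8) for i in range(4)]

# ravel() turns each (64, 48, 3) image into a 1D vector of 64 * 48 * 3 = 9216 values
rows = [img.ravel() for img in images]

# vstack() stacks the vectors as rows: shape (n_samples, n_features)
X = np.vstack(rows)
print(X.shape)  # (4, 9216)
```

np.vstack only works when every flattened vector has the same length, so images of differing sizes would need to be resized to a common size first (for example with cv2.resize); that preprocessing step is an assumption and not part of the original answer.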

In code, the following snippet should produce a 2-dimensional t-SNE embedding:

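# Flatten each image into a 1D vector of pixel values (all images must share one size)
arr = [cv2.imread(join(mypath, f)).ravel() for f in onlyfiles]
# Stack the vectors into a matrix of shape (n_samples, n_features)
X = np.vstack(arr)
X_embedded = TSNE(n_components=2).fit_transform(X)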
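A self-contained sketch of the whole pipeline, with synthetic data standing in for real images (the 20 samples, the 300 features and the four label groups are all hypothetical). It also touches on the target question: t-SNE is unsupervised and fit_transform ignores its y argument, so the labels are not merged into X. They stay in a separate array aligned row by row with X and are only used afterwards, e.g. to colour the plot. The perplexity is lowered because recent scikit-learn versions reject a perplexity that is not smaller than the number of samples; with only four images the default of 30 would be too large.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical data: 20 flattened "images" of 300 pixel values each,
# drawn from 4 groups standing in for the four parasites (labels 0-3)
rng = np.random.default_rng(50)
target = np.repeat(np.arange(4), 5)                # one label per image
centres = rng.normal(size=(4, 300)) * 10           # one centre per group
X = centres[target] + rng.normal(size=(20, 300))   # shape (n_samples, n_features)

# Only X goes into fit_transform; perplexity must stay below n_samples
X_embedded = TSNE(n_components=2, perplexity=5, random_state=50).fit_transform(X)
print(X_embedded.shape)  # (20, 2)

# The labels are used afterwards, e.g. to colour the points with matplotlib:
# plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=target)
```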

2 Comments

Thank you, this is pretty exciting. I'll report back here shortly.
Thank you again. The only correction needed was "X = np.vstack(arr)". The associated manuscript will take around a year to be published, but I will certainly acknowledge @SergeyBushmanov as an important source of help.
