Numpy array format for fit_transform(image) TSNE method

Question

I'm using scikit-learn's t-SNE to interrogate around 1000 scatterplot images, but the fit_transform method appears to require a 2D numpy array. I'm new to Python.

My code,

from os import listdir
from os.path import isfile, join

import cv2
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

mypath = '/Path/to/files/scatterplots/'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

# Object array with one image per slot, so photos.shape is (n_files,)
photos = np.empty(len(onlyfiles), dtype=object)
for n in range(len(onlyfiles)):
    photos[n] = cv2.imread(join(mypath, onlyfiles[n]))

# Preview the first four images in a 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(10, 10), subplot_kw={'xticks': (), 'yticks': ()})
for ax, img in zip(axes.ravel(), photos):
    ax.imshow(img)

output

Problem code

tsne = TSNE(random_state=50)
digits_tsne = tsne.fit_transform(photos.data)

Error

ValueError Traceback (most recent call last) in

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/manifold/t_sne.py in fit_transform(self, X, y)

892             Embedding of the training data in low-dimensional space.
893         """

--> 894 embedding = self._fit(X)

3 further lines of error output within t_sne.py

I believe the fit_transform method requires a 2D numpy array, e.g.

'target': array([0, 1, 2, 3])

where 0-3 refer to the different data (parasites) behind each of the scatterplots 1-4.

Request: How do I combine the target array with the image numpy array so that fit_transform can see it and process it?

Thanks. print(photos.shape) outputs (4,). Just to explain, this is only an example data set.
– M__, Commented Feb 6, 2019 at 20:56
To fit the algorithm you need to feed it data; in your case, the pixel values of each image. Try running fit_transform() on a single image. I'm guessing that image should have a 2D shape.
– pazitos10, Commented Feb 6, 2019 at 21:00
Thanks, I just tried that and got the same error. Yes, the image has a 2D shape.
– M__, Commented Feb 6, 2019 at 21:08

Sergey Bushmanov · Accepted Answer · 2019-02-07 09:52:15Z


Please check the documentation for t-SNE, which specifies the expected input:

X : array, shape (n_samples, n_features)
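To make that shape requirement concrete, here is a minimal sketch using synthetic arrays in place of the real scatterplot files (the 8x8 image size and the random pixel values are assumptions for illustration only). It contrasts the object array built in the question with a matrix that has one row per image.

```python
import numpy as np

# Synthetic stand-ins for four images as cv2.imread would return them:
# (height, width, channels) arrays of uint8. Sizes are illustrative only.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(4)]

# The container used in the question: a 1D object array, one image per slot.
photos = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    photos[i] = img
print(photos.shape)  # (4,): a single axis, so there is no n_features axis

# The layout the docstring describes: one row per image, one column per pixel value.
X = np.stack([img.ravel() for img in images])
print(X.shape)  # (4, 192), because 8 * 8 * 3 = 192
```

The object array only records that there are four items, which is why photos.shape prints (4,) in the comments above.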

For this to work, you need to flatten each image into a 1D array and stack those arrays into a matrix with one row per image.

Codewise, the following snippet should do the job of producing a 2-dimensional t-SNE embedding:

# Flatten each image to 1D; all images must share the same shape for stacking to work
arr = [cv2.imread(join(mypath, f)).ravel() for f in onlyfiles]
X = np.vstack(arr)  # shape (n_samples, n_features)
tsne = TSNE(n_components=2).fit_transform(X)
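On the question of combining the target array with the images: as far as I can tell, the labels do not need to go into the matrix at all, since fit_transform ignores its y argument. They can be kept in a separate array in the same order as the rows of X and paired with the embedding afterwards, for example to colour a scatter plot. Below is a rough end-to-end sketch on synthetic data. The image size, the label values and the perplexity setting are assumptions for illustration. Recent scikit-learn releases require perplexity to be smaller than the number of samples, so a four-image test set would need a value well below the default of 30.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-ins for twelve same-sized images; real ones would come from
# cv2.imread and might need resizing to a common shape first.
images = [rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8) for _ in range(12)]

# Hypothetical parasite IDs 0-3, one per image, in the same order as the images.
labels = np.arange(len(images)) % 4

# One flattened image per row: shape (n_samples, n_features).
X = np.vstack([img.ravel() for img in images]).astype(np.float32)

# perplexity is set below n_samples (12 here); the value 5 is an arbitrary choice.
tsne = TSNE(n_components=2, perplexity=5, random_state=50)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (12, 2)

# Rows of the embedding line up with the labels, so they can be paired directly.
for label, (x, y) in zip(labels, embedding):
    print(f"parasite {label}: ({x:.2f}, {y:.2f})")
```

With real data, the same pairing lets you pass the labels as the colour argument when plotting the two embedding columns.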

edited Feb 7, 2019 at 9:52

answered Feb 7, 2019 at 5:46



2 Comments

M__ Over a year ago

Thank you, this is pretty exciting ... I'll feed back here shortly.

M__ Over a year ago

Thank you again. The only difference is "X = np.vstack(arr)". The associated manuscript will take around 1 year for the paper to be published, but I will certainly acknowledge @SergeyBushmanov as an important source of help.
