I've been going through an online tutorial and ended up with this code:

from sklearn.decomposition import PCA
from sklearn import datasets
import matplotlib.pyplot as plt
import time

digits=datasets.load_digits()

randomized_pca = PCA(n_components=2,svd_solver='randomized')

# a NumPy array with shape (1797, 2): one row per digit image
reduced_data_rpca = randomized_pca.fit_transform(digits.data)

# make a scatter plot

colors = ['black', 'blue', 'purple', 'yellow', 'pink', 'red', 'lime', 'cyan', 
'orange', 'gray']

start=time.time()

#   Time Taken for this loop = 9.5 seconds

# for i in range(len(reduced_data_rpca)):
#         x = reduced_data_rpca[i][0]
#         y = reduced_data_rpca[i][1]
#         plt.scatter(x,y,c=colors[digits.target[i]])

# Alternative way: time taken = 0.2 seconds

# plots all the points (x,y) with color[i] in ith iteration

for i in range(len(colors)):
    # Selects the x and y values of all points whose label (0-9) equals i.
    # (Am I correct? Does this mean it iterates over the whole array again
    # to check for equality?)
    x = reduced_data_rpca[:, 0][digits.target == i]  
    y = reduced_data_rpca[:, 1][digits.target == i]
    plt.scatter(x, y, c=colors[i])

end=time.time()

print("Time taken",end-start," Secs")

My question: although the commented-out loop and the active loop perform the same operation, I cannot understand how the second loop works or why it performs so much better than the first.

1 Answer

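Your first loop (commented out) iterates over an array of about 1800 rows in Python and calls plt.scatter once per point. The second one uses NumPy's indexing for the "inner loop", so it only has to run a regular for loop over your 10 colors and call plt.scatter 10 times. NumPy's array operations run in compiled code, which is much faster than looping over individual elements in Python.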
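One way to see where the time goes is to count how often each loop would call the plotting function. This is only a sketch: fake_scatter is a hypothetical stand-in for plt.scatter that just counts calls, and points and labels are made-up random data of the same size as the digits set, not the real reduced_data_rpca and digits.target.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1797                                # number of samples in load_digits()
points = rng.normal(size=(n_points, 2))        # made-up stand-in for reduced_data_rpca
labels = rng.integers(0, 10, size=n_points)    # made-up stand-in for digits.target

calls = {"per_point": 0, "per_label": 0}

def fake_scatter(x, y, key):
    """Hypothetical stand-in for plt.scatter: counts the call, returns how many points it got."""
    calls[key] += 1
    return np.size(x)

# First loop: one call per point
drawn_per_point = 0
for i in range(len(points)):
    drawn_per_point += fake_scatter(points[i][0], points[i][1], "per_point")

# Second loop: one call per label; NumPy selects the matching rows
drawn_per_label = 0
for i in range(10):
    mask = labels == i
    drawn_per_label += fake_scatter(points[:, 0][mask], points[:, 1][mask], "per_label")

print(calls)                                   # {'per_point': 1797, 'per_label': 10}
print(drawn_per_point == drawn_per_label)      # True
```

Both loops hand over the same points, but the second does it in 10 calls instead of 1797. With the real plt.scatter, each call also creates a separate plot object, which is likely where most of the extra time goes.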

But what does digits.target == i do? It seems to me like it is not picking out a boolean array from reduced_data_rpca but doing a comparison between a dictionary and the array index over and over. Isn't the result of that comparison always False?
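Trying the comparison on a small array is a quick way to check. In this minimal sketch, points and labels are hypothetical stand-ins for reduced_data_rpca and digits.target. Comparing a NumPy array with a scalar is done element-wise, so the result is a boolean array with one entry per label, and indexing with it keeps only the entries where it is True.

```python
import numpy as np

# Hypothetical stand-ins for reduced_data_rpca and digits.target
points = np.array([[0.0, 1.0],
                   [2.0, 3.0],
                   [4.0, 5.0],
                   [6.0, 7.0]])
labels = np.array([0, 1, 0, 1])

# Element-wise comparison: one True/False per label, not a single bool
mask = labels == 0
print(mask)        # [ True False  True False]

# Boolean indexing keeps only the positions where mask is True
x = points[:, 0][mask]
y = points[:, 1][mask]
print(x, y)        # [0. 4.] [1. 5.]
```

So the whole label array is compared on each of the 10 iterations, but that comparison happens inside NumPy rather than in a Python-level loop.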

Also see: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html

2 Comments

Actually, digits.target (the class labels of the digits in reduced_data_rpca) contains about 1800 values in the range 0-9, and reduced_data_rpca contains 2 columns. What I am thinking is that it goes over this array and checks whether each digits.target value equals the variable i. Am I wrong?
You are right, I think! It made sense once I tried it out.
