I want to plot a histogram of the pairwise distances in the raw digits data, which has 784 dimensions. Here is my code:

import sys,os
from math import *
import random
from numpy import *
import matplotlib.pyplot as plt
import datasets

waitForEnter=False
def exampleDistance(x1, x2):
    dist = 0.
    for i,v1 in x1.iteritems():
        v2 = 0.
        if x2.has_key(i): v2 = x2[i]
        dist += (v1 - v2) * (v1 - v2)
    for i,v2 in x2.iteritems():
        if not x1.has_key(i):
            dist += v2 * v2
    return sqrt(dist)

def computeDistances(data):
    #N = len(data)
    #D = len(data[0])
    N, D = data.shape
    dist = []
    for n in range(N):
        for m in range(n):
            dist.append( exampleDistance(data[n],data[m])  / sqrt(D))
    return dist
Dims = [784]
#Cols = ['#FF0000', '#880000', '#000000', '#000088', '#0000FF']
Cols = ['#FF0000']
Bins = arange(0, 1, 0.02)


plt.xlabel('distance / sqrt(dimensionality)')
plt.ylabel('# of pairs of points at that distance')
#plt.title('dimensionality versus uniform point distances')
plt.title('dimensionality versus digits data point distances')

for i,d in enumerate(Dims):
    distances = computeDistances(datasets.DigitData.X)
    print "D=%d, average distance=%g" % (d, mean(distances) * sqrt(d))
    plt.hist(distances,
             Bins,
             histtype='step',
             color=Cols[i])
    if waitForEnter:
        plt.legend(['%d dims' % d for d in Dims])
        plt.show(False)
        x = raw_input('Press enter to continue...')


plt.legend(['%d dims' % d for d in Dims])
plt.savefig('fig.pdf')
plt.show()

But it fails with this error:

Traceback (most recent call last):
  File "HW3.py", line 56, in <module>
    distances = computeDistances(datasets.DigitData.X)
  File "HW3.py", line 39, in computeDistances
    dist.append( exampleDistance(data[n],data[m])  / sqrt(D))
  File "HW3.py", line 23, in exampleDistance
    for i,v1 in x1.iteritems():
AttributeError: 'numpy.ndarray' object has no attribute 'iteritems'

For reference, here is the digit dataset class:

class DigitData:
    Xall,Yall = loadDigitData('data/1vs2.all')
    N,D = Xall.shape
    N0 = int(float(N) * 0.5)
    X = Xall[0:N0,:]
    Y = Yall[0:N0]
    Xte = Xall[N0:,:]
    Yte = Yall[N0:]

How can I fix this? As a Python beginner, I'm very confused by plotting.

1 Answer

I would recommend reading this earlier post; if you still can't find your answer, leave a comment here. I hope that helps.

3 Comments

Thanks for answering. I know iteritems() is for dictionaries, not arrays. However, the function exampleDistance(x1, x2) was given to us by our professor, and she asked us to use it to calculate distances. I think I have to keep the changes to exampleDistance(x1, x2) to a minimum, but I don't know how to do that.
Can you try replacing iteritems with items in your function, i.e. change the loop to for i,v1 in x1.items()?
Assuming you have a dictionary with keys, try for i, v1 in enumerate(x1.iteritems(),1): . This should do the trick. Here 1 sets the index to start from 1.
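
The traceback shows that each row of datasets.DigitData.X is a numpy.ndarray, which has neither iteritems() nor items(), so one possible way forward is a dense counterpart of exampleDistance. The sketch below is not the course's code: denseDistance and rowToDict are hypothetical names, and it assumes both rows have the same length. It is written to run under both Python 2 and Python 3.

```python
from math import sqrt

def denseDistance(x1, x2):
    # Hypothetical dense counterpart of exampleDistance.
    # Assumes x1 and x2 are equal-length sequences of numbers,
    # e.g. plain lists or rows of a 2-D NumPy array.
    dist = 0.
    for v1, v2 in zip(x1, x2):
        dist += (v1 - v2) * (v1 - v2)
    return sqrt(dist)

def rowToDict(row):
    # Hypothetical helper: turn a dense row into the
    # {feature index: value} form that a dict-based distance expects.
    return dict(enumerate(row))

# Small check on a 3-4-5 right triangle.
print(denseDistance([0., 3.], [4., 0.]))  # prints 5.0
```

In computeDistances, calling denseDistance(data[n], data[m]) in place of exampleDistance(data[n], data[m]) would probably be the smallest change. Alternatively, under Python 2, passing rowToDict(data[n]) and rowToDict(data[m]) should let the original dict-based exampleDistance run unchanged, at some cost in speed.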