I am working with the Iris dataset. I have two sets of data: (1) a training set and (2) a test set. I want to calculate the Euclidean distance between every test-set row and every training-set row. However, I only want to include the first 4 values of each row.

A working example would be:

dist = np.linalg.norm(inner1test[0][0:4]-inner1train[0][0:4])
print(dist)
***output: 3.034243***

The problem is that I have 120 training set points and 30 test set points, so I would have to do 2700 operations manually. I therefore thought about iterating with a for-loop. Unfortunately, every one of my attempts has failed.

This is my best attempt, which raises the error message shown below it:

for i in inner1test:
    for number in inner1train: 
        dist = np.linalg.norm(inner1test[i][0:4]-inner1train[number][0:4])
        print(dist)

(IndexError: arrays used as indices must be of integer (or boolean) type)

What would be the best solution to iterate through this array?

PS: I will also provide a screenshot for better visualization.

2 Answers

From what I see, inner1test is a tuple of lists, so the i value will not be an index but the actual list.
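To illustrate the point, here is a minimal sketch with made-up data. The array `test_rows` is a hypothetical stand-in for inner1test, assuming four feature columns plus a label column. It shows that the loop variable is a whole row, and that using a row as an index raises the IndexError from the question:

```python
import numpy as np

# Hypothetical stand-in for inner1test: 4 feature columns + 1 label column.
test_rows = np.array([[5.1, 3.5, 1.4, 0.2, 0.0],
                      [6.2, 2.9, 4.3, 1.3, 1.0]])

# Iterating over a 2-D array yields its rows, not integer positions.
first = next(iter(test_rows))
loop_yields_row = isinstance(first, np.ndarray)

# Indexing with a float row is what triggers the IndexError.
try:
    test_rows[first]
    raised = False
except IndexError:
    raised = True

print(loop_yields_row, raised)
```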

You should use enumerate, which yields two values on each iteration: the index and the actual data.

for i, value in enumerate(inner1test):
    for j, number in enumerate(inner1train):
        # index with the integer positions i and j, not with the rows themselves
        dist = np.linalg.norm(inner1test[i][0:4] - inner1train[j][0:4])
        print(dist)

Also, if your lists start to get bigger, consider using a generator. It runs your calculations one iteration at a time and returns only one value at a time, instead of returning a big chunk of results that would occupy a lot of memory.

eg:

def my_calculation(inner1test, inner1train):
    for i, value in enumerate(inner1test):
        for j, number in enumerate(inner1train):
            # index with the integer positions i and j, not with the rows themselves
            dist = np.linalg.norm(inner1test[i][0:4] - inner1train[j][0:4])
            yield dist

for dist in my_calculation(inner1test, inner1train):
    print(dist)

You might also want to look into Python list comprehensions, which are sometimes a more elegant way to handle for loops over lists.
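As a rough sketch of what a list comprehension could look like here: the two small arrays below are made-up placeholders that reuse the names inner1test and inner1train, assuming each row has four feature values followed by a label.

```python
import numpy as np

# Made-up placeholder data: 4 feature values + 1 label per row.
inner1test = np.array([[0.0, 0.0, 0.0, 0.0, 9.0],
                       [1.0, 1.0, 1.0, 1.0, 9.0]])
inner1train = np.array([[3.0, 4.0, 0.0, 0.0, 9.0],
                        [1.0, 1.0, 1.0, 1.0, 9.0],
                        [0.0, 0.0, 0.0, 2.0, 9.0]])

# One distance per (test row, train row) pair, in the same order
# as the nested for loops would produce them.
distances = [
    np.linalg.norm(test_row[0:4] - train_row[0:4])
    for test_row in inner1test
    for train_row in inner1train
]

print(len(distances))
```

The result is a flat list with one entry per pair, so its length is the number of test rows times the number of train rows.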

[EDIT]

Here's a probably easier solution anyway. It needs no indices and iterates directly over the rows, which also works for a NumPy array:

for testvalue in inner1test:
    for testtrain in inner1train:
        dist = np.linalg.norm(testvalue[0:4]-testtrain[0:4])

[/EDIT]

4 Comments

Unfortunately, it still shows the same error, even if I enumerate it. inner1test is a numpy.ndarray. I saw a post that elaborated on that error, and I think it might apply to this specific case, but I am not sure how to implement it. stackoverflow.com/questions/17393989/… Do you think this might be the solution?
Just added an easier example which does not need any indices
Thanks. However, the edit is not really iterating over the two arrays. It must iterate, as the output should be the full array of 2700 distances. Or am I mistaken and did not implement your solution the correct way?
Hmmm... You should run for i in inner1test: print(i), and the same for inner1train, to make sure your variables are iterable. If they are not, please post the results.

This was the final solution with the correct output for me:

distanceslist = list()

for testvalue in inner1test:
    for testtrain in inner1train:
        dist = np.linalg.norm(testvalue[0:4]-testtrain[0:4])
        distances = (dist, testtrain[0:4])
        distanceslist.append(distances)
        
distanceslist
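If the nested loops ever become slow, one possible alternative is to let NumPy broadcasting compute all pairwise distances at once. This is only a sketch and not part of the solution above; the arrays are made-up placeholders reusing the names inner1test and inner1train, and `dist_matrix` is a hypothetical name.

```python
import numpy as np

# Made-up placeholder data: 4 feature values + 1 label per row.
inner1test = np.array([[0.0, 0.0, 0.0, 0.0, 9.0],
                       [1.0, 1.0, 1.0, 1.0, 9.0]])
inner1train = np.array([[3.0, 4.0, 0.0, 0.0, 9.0],
                        [1.0, 1.0, 1.0, 1.0, 9.0],
                        [0.0, 0.0, 0.0, 2.0, 9.0]])

# Keep only the first four columns of every row.
test_features = inner1test[:, 0:4]
train_features = inner1train[:, 0:4]

# Shapes (n_test, 1, 4) and (1, n_train, 4) broadcast to (n_test, n_train, 4).
diff = test_features[:, None, :] - train_features[None, :, :]

# Norm over the feature axis gives one distance per (test, train) pair.
dist_matrix = np.linalg.norm(diff, axis=2)

print(dist_matrix.shape)
```

Here `dist_matrix[i, j]` holds the distance between test row i and train row j, so the matrix contains the same values as the loop, arranged in a grid instead of a flat list.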

1 Comment

Happy to know you sorted it out
