looping through an array to find euclidean distance in python

Question

This is what I have thus far:

Stats2003 = np.loadtxt('/DataFiles/2003.txt') 
Stats2004 = np.loadtxt('/DataFiles/2004.txt') 
Stats2005 = np.loadtxt('/DataFiles/2005.txt') 
Stats2006 = np.loadtxt('/DataFiles/2006.txt')
Stats2007 = np.loadtxt('/DataFiles/2007.txt') 
Stats2008 = np.loadtxt('/DataFiles/2008.txt')
Stats2009 = np.loadtxt('/DataFiles/2009.txt') 
Stats2010 = np.loadtxt('/DataFiles/2010.txt') 
Stats2011 = np.loadtxt('/DataFiles/2011.txt') 
Stats2012 = np.loadtxt('/DataFiles/2012.txt') 

Stats = Stats2003, Stats2004, Stats2005, Stats2006, Stats2007, Stats2008, Stats2009, Stats2010, Stats2011, Stats2012

I am trying to calculate the Euclidean distance between each of these arrays and every other array, but am having difficulty doing so.

I can get the output I want by calculating each distance by hand, like:

dist1 = np.linalg.norm(Stats2003-Stats2004)
dist2 = np.linalg.norm(Stats2003-Stats2005)
dist11 = np.linalg.norm(Stats2004-Stats2005)

etc., but I would like to make these calculations with a loop.

I am displaying the calculations into a table using Prettytable.

Can anyone point me in the right direction? I haven't found any previous solutions that have worked.

The table has column headers for each year (2003, 2004, etc. through 2012) and row headers laid out the same way. Each cell then shows the Euclidean distance between a given year and any other year. cl.ly/image/07120Z171z1z
– Cetus, Commented Feb 18, 2013 at 1:43

John Vinyard · Accepted Answer · 2013-02-18 02:11:46Z

2

Look at scipy.spatial.distance.cdist.

From the documentation:

Computes distance between each pair of the two collections of inputs.

So you could do something like the following:

import numpy as np
from scipy.spatial.distance import cdist
# start year to stop year
years = range(2003,2013)
# this will yield an n_years X n_features array
features = np.array([np.loadtxt('/DataFiles/%s.txt' % year) for year in years])
# compute the euclidean distance from each year to every other year
distance_matrix = cdist(features,features,metric = 'euclidean')

If you know the start year, and you aren't missing data for any years, then it's easy to determine which two years are being compared at coordinate (m,n) in the distance matrix.
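
The index arithmetic behind that last point can be sketched roughly as follows. This is only an illustration: the random vectors stand in for the real data files, it assumes each year's data is a flat vector of the same length, and `year_to_index` and `distance_between` are hypothetical helper names, not part of SciPy.

```python
import numpy as np
from scipy.spatial.distance import cdist

years = list(range(2003, 2013))

# Assumption: synthetic stand-in for the files, one flat feature vector per year.
rng = np.random.default_rng(0)
features = np.array([rng.random(5) for _ in years])

# Row m / column n holds the distance between years[m] and years[n].
distance_matrix = cdist(features, features, metric='euclidean')

def year_to_index(year, start_year=2003):
    # Hypothetical helper: valid only when no years are missing.
    return year - start_year

def distance_between(year_a, year_b):
    # Hypothetical helper: look up one pair of years in the matrix.
    return distance_matrix[year_to_index(year_a), year_to_index(year_b)]

print(distance_between(2007, 2011))
```

The matrix is symmetric with zeros on the diagonal, so `distance_between(2007, 2011)` and `distance_between(2011, 2007)` return the same value.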

edited Feb 18, 2013 at 2:11

answered Feb 18, 2013 at 1:47

John Vinyard


wim · Accepted Answer · 2013-02-18 02:59:10Z

2

To do the loop you will need to keep data out of your variable names. A simple solution would be to use dictionaries instead. The loops are implicit in the dict comprehensions:

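import itertools as it
import numpy as np

years = range(2003, 2013)
# one array per year, keyed by the year itself
stats = {y: np.loadtxt('/DataFiles/{}.txt'.format(y)) for y in years}
# one distance per unordered pair of years
dists = {(y1, y2): np.linalg.norm(stats[y1] - stats[y2]) for (y1, y2) in it.combinations(years, 2)}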

Now access the stats for a particular year, e.g. 2007, with stats[2007], and the distances with tuples, e.g. dists[(2007, 2011)].
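
One thing to watch: `it.combinations` only yields each pair once, with the earlier year first, so a key like `(2011, 2007)` is not in the dictionary. A minimal sketch of a lookup that works in either order, and of building the rows for a year-by-year table, might look like this. The random arrays stand in for the real files, and `lookup` is a hypothetical helper name.

```python
import itertools as it
import numpy as np

years = range(2003, 2013)

# Assumption: synthetic stand-in for the files, same shape for every year.
rng = np.random.default_rng(0)
stats = {y: rng.random((3, 4)) for y in years}

dists = {(y1, y2): np.linalg.norm(stats[y1] - stats[y2])
         for (y1, y2) in it.combinations(years, 2)}

def lookup(y1, y2):
    # Hypothetical helper: distance for a pair of years in any order.
    if y1 == y2:
        return 0.0
    return dists[(min(y1, y2), max(y1, y2))]

# One row per year: the row header followed by the distance to every year.
rows = [[y1] + [round(lookup(y1, y2), 3) for y2 in years] for y1 in years]

for row in rows:
    print(row)
```

Each entry of `rows` has the row's year first and then one distance per column year, so it could be handed to a table library such as PrettyTable one row at a time.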

edited Feb 18, 2013 at 2:59

answered Feb 18, 2013 at 1:49

wim


1 Comment

wim Over a year ago

I made a typo in the norm, fixed now.
