3

This is what I have thus far:

Stats2003 = np.loadtxt('/DataFiles/2003.txt') 
Stats2004 = np.loadtxt('/DataFiles/2004.txt') 
Stats2005 = np.loadtxt('/DataFiles/2005.txt') 
Stats2006 = np.loadtxt('/DataFiles/2006.txt')
Stats2007 = np.loadtxt('/DataFiles/2007.txt') 
Stats2008 = np.loadtxt('/DataFiles/2008.txt')
Stats2009 = np.loadtxt('/DataFiles/2009.txt') 
Stats2010 = np.loadtxt('/DataFiles/2010.txt') 
Stats2011 = np.loadtxt('/DataFiles/2011.txt') 
Stats2012 = np.loadtxt('/DataFiles/2012.txt') 

Stats = Stats2003, Stats2004, Stats2005, Stats2006, Stats2007, Stats2008, Stats2009, Stats2010, Stats2011, Stats2012

I am trying to calculate the Euclidean distance between each of these arrays and every other array, but I am having difficulty doing so.

I can get the output I want by calculating each distance by hand, like this:

dist1 = np.linalg.norm(Stats2003-Stats2004)
dist2 = np.linalg.norm(Stats2003-Stats2005)
dist11 = np.linalg.norm(Stats2004-Stats2005)

and so on, but I would like to make these calculations with a loop.

I am displaying the results in a table using PrettyTable.

Can anyone point me in the right direction? I haven't found any previous solutions that have worked.

2
  • What's the final table look like? Commented Feb 18, 2013 at 1:38
  • The table has a column header for each year (2003, 2004, ..., 2012) and row headers laid out the same way. Each cell then shows the Euclidean distance between a given year and any other year. cl.ly/image/07120Z171z1z Commented Feb 18, 2013 at 1:43

2 Answers

2

Look at scipy.spatial.distance.cdist.

From the documentation:

Computes distance between each pair of the two collections of inputs.

So you could do something like the following:

import numpy as np
from scipy.spatial.distance import cdist
# start year to stop year
years = range(2003,2013)
# this will yield an n_years X n_features array; cdist needs 2-D input, so each file is flattened to one row
features = np.array([np.loadtxt('/DataFiles/%s.txt' % year).ravel() for year in years])
# compute the euclidean distance from each year to every other year
distance_matrix = cdist(features,features,metric = 'euclidean')

If you know the start year, and you aren't missing data for any years, then it's easy to determine which two years are being compared at coordinate (m,n) in the distance matrix.
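As a minimal sketch of that lookup, assuming the years are contiguous and start at 2003: the random arrays below are stand-ins for the loaded files, and `distance_between` is a hypothetical helper, not part of SciPy. The distance matrix is built with plain NumPy broadcasting here so the example runs without the data files; it should match what `cdist` returns for the same input.

```python
import numpy as np

years = list(range(2003, 2013))
start_year = years[0]

# Stand-in data (assumption): one flattened feature vector per year,
# used here in place of the np.loadtxt results
rng = np.random.default_rng(0)
features = rng.random((len(years), 5))

# Pairwise Euclidean distances via broadcasting
diff = features[:, None, :] - features[None, :, :]
distance_matrix = np.sqrt((diff ** 2).sum(axis=-1))

def distance_between(year_a, year_b):
    """Hypothetical helper: a year's row/column index is its offset from the first year."""
    return distance_matrix[year_a - start_year, year_b - start_year]
```

With this layout, `distance_between(2003, 2004)` reads entry (0, 1) of the matrix, and the diagonal holds the zero distance of each year to itself.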

2

To do the loop you will need to keep data out of your variable names. A simple solution would be to use dictionaries instead. The loops are implicit in the dict comprehensions:

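import itertools as it
import numpy as np

years = range(2003, 2013)
# load each year's array, keyed by year
stats = {y: np.loadtxt('/DataFiles/{}.txt'.format(y)) for y in years}
# Euclidean distance for every unordered pair of years
dists = {(y1, y2): np.linalg.norm(stats[y1] - stats[y2]) for (y1, y2) in it.combinations(years, 2)}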

Now access the stats for a particular year, e.g. 2007, with stats[2007], and distances with tuples, e.g. dists[(2007, 2011)]. Each pair is stored once, with the earlier year first.
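Since the question asks for a full year-by-year table, here is a rough sketch of how the pair dictionary could be expanded into rows. It assumes random stand-in arrays in place of the real files, and `lookup` is a hypothetical helper that handles the reversed ordering and the diagonal, neither of which is stored in `dists`.

```python
import itertools as it
import numpy as np

years = list(range(2003, 2013))

# Stand-in data (assumption): random arrays in place of the np.loadtxt results
rng = np.random.default_rng(0)
stats = {y: rng.random((4, 3)) for y in years}

dists = {(y1, y2): np.linalg.norm(stats[y1] - stats[y2])
         for y1, y2 in it.combinations(years, 2)}

def lookup(y1, y2):
    """Hypothetical helper: distance for either ordering of the two years."""
    if y1 == y2:
        return 0.0
    return dists[(min(y1, y2), max(y1, y2))]

# One row per year: the row label followed by the distance to every year
header = ["Year"] + years
rows = [[y1] + [round(lookup(y1, y2), 3) for y2 in years] for y1 in years]
```

The `header` and `rows` lists can then be handed to whatever table library is in use, for example one row at a time.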

1 Comment

I made a typo in the norm, fixed now.
