Refactoring a small Scala function

Question

I have this function to compute the distance between two n-dimensional points using Pythagoras' theorem.

def computeDistance(neighbour: Point) = math.sqrt(coordinates.zip(neighbour.coordinates).map {
    case (c1: Int, c2: Int) => math.pow(c1 - c2, 2)
}.sum)

The Point class (simplified) looks like:

class Point(val coordinates: List[Int])

I'm struggling to refactor the method so it's a little easier to read, can anybody help please?

I think it's not worth it. You could create implicit classes and implement map and flatMap in your Point class but, in my opinion, it's too much work for too little gain. Now it's clean implementation of mathematical equation. — goral
– goral, Commented Aug 23, 2014 at 19:34
Your formula for N-dimensional Euclidean distance is wrong. See @penland365's answer (or that link) for the correct formula. — DaoWen
– DaoWen, Commented Aug 23, 2014 at 22:11
I agree, I only tested it in two dimensions, which is why I didn't realise. Cheers! — Todd Davies
– Todd Davies, Commented Aug 24, 2014 at 7:52

samthebest · Accepted Answer · 2014-08-24 11:11:12Z

3

Here's another way that makes the following three assumptions:

The length of the list is the number of dimensions for the point
Each List is correctly ordered, i.e. List(x, y) or List(x, y, z). We do not know how to handle List(x, z, y)
All lists are of equal length

def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
  .flatMap(i => List(pow(i._2 - i._1, 2)))
  .fold(0.0)(_ + _)
)

The obvious disadvantage here is that we don't have any safety around list length. The quick fix for this is to simply have the function return an Option[Double] like so:

def computeDistance(other: Point): Option[Double] = {
  if(other.coordinates.length != coordinates.length) {
    return None
  }
  return Some(sqrt(coordinates.zip(other.coordinates)
    .flatMap(i => List(pow(i._2 - i._1, 2)))
    .fold(0.0)(_ + _)
   ))

I'd be curious if there is a type safe way to ensure equal list length.

EDIT

It was politely pointed out to me that flatMap(x => List(foo(x))) is equivalent to map(foo) , which I forgot to refactor when I was originally playing w/ this. Slightly cleaner version w/ Map instead of flatMap :

def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
  .map(i => pow(i._2 - i._1, 2))
  .fold(0.0)(_ + _)
)

edited Aug 24, 2014 at 11:11

samthebest

31.7k25 gold badges106 silver badges153 bronze badges

answered Aug 23, 2014 at 20:40

penland365

5554 silver badges14 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Todd Davies Over a year ago

I use the \/ class in scalaz to try and handle lists with non-uniform lengths. Take a look: github.com/Todd-Davies/NearestNeighbour

DaoWen Over a year ago

+1 for using the correct formula for Euclidean Distance (the OP got it wrong in his post—you always square the individual deltas no matter the number of dimensions).

The Archetypal Paul Over a year ago

Isn't .fold(0.0){(z, j) => z + j} the same as .sum? Or .reduce(_ + _) if you prefer

sscarduzio Over a year ago

@Paul Your objection is valid, however the fold is preferred because it looks like a big eyed guy staring at a butt: fold(0.0)(_+_)

The Archetypal Paul Over a year ago

@sscarduzio, Ah. Must have missed that bit of guidance in the style guide.

Rex Kerr · Accepted Answer · 2014-08-23 19:39:40Z

1

Most of your problem is that you're trying to do math with really long variable names. It's almost always painful. There's a reason why mathematicians use single letters. And assign temporary variables.

Try this:

class Point(val coordinates: List[Int]) { def c = coordinates }
import math._
def d(p: Point) = {
  val delta = for ((a,b) <- (c zip p.c)) yield pow(a-b, dims)
  sqrt(delta.sum)
}

answered Aug 23, 2014 at 19:39

Rex Kerr

168k27 gold badges325 silver badges411 bronze badges

1 Comment

Todd Davies Over a year ago

I like the for comprehension. Thanks.

elm · Accepted Answer · 2014-08-25 11:49:59Z

Consider type aliases and case classes, like this,

type Coord = List[Int]

case class Point(val c: Coord) {
  def distTo(p: Point) = {
    val z = (c zip p.c).par
    val pw = z.aggregate(0.0) ( (a,v) => a + math.pow( v._1-v._2, 2 ), _ + _ )
    math.sqrt(pw)
  }
}

so that for any two points, for instance,

val p = Point( (1 to 5).toList )
val q = Point( (2 to 6).toList )

we have that

p distTo q
res: Double = 2.23606797749979

Note method distTo uses aggregate on a parallelised collection of tuples, and combines the partial results by the last argument (summation). For high dimensional points this may prove more efficient than the sequential counterpart.

For simplicity of use, consider also implicit classes, as suggested in a comment above,

implicit class RichPoint(val c: Coord) extends AnyVal {
  def distTo(d: Coord) = Point(c) distTo Point(d)
}

Hence

List(1,2,3,4,5) distTo List(2,3,4,5,6)
res: Double = 2.23606797749979

Collectives™ on Stack Overflow

Refactoring a small Scala function

3 Answers 3

5 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

5 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related