1

This is my pseudo code, obviously this is not idiomatic scala as I use a lot of mutable variables and loops.

My goal is to transform this into idiomatic scala in a functional way.

The objective of the algorithm is that given a list of strings, do a N^2 comparison on itself to find matches using edit distance. Because edit distance checks are example, I want to avoid doing edit distance checks on string pairs that have been checked.

def pairwisecomp(block: Iterable[String]): List[(String, String)] = {
    var matches = new ListBuffer[(String, String)]

    // pseudo code; not intended to be valid scala
    for (var i = 0; i < block.size; i++) {
       val str1 = block[i]
       // iterate from i because no need to go backwards to compare
       for (var j = i+1; j < block.size; j++) {
           val str2 = block[j]
           // check edit distance
           // if less than 2, append into the matches
           if (editDistance(str1, str2) < 2) {
              matches.append((str1, str2))
           }
       }
    }


    return matches.toList
  }

4 Answers 4

3

I assume it can be shortened though, I think the code blow is doing exactly the same as you wrote.

block.flatMap{ str1 =>
  block.zipWithIndex.filter{ case (elem,index) => index > i}.map{ case (str2,index) =>
    ((str1,str2),editDistance(str1,str2) < 2)
  }.filter{_.2}.map{_1}
}.toList

EDIT: It maybe simpler in for-comprehensions.

val b = block.zipWithIndex
(for {
  (str1,outerIndex) <- b
  (str2,innerIndex) <- b if innerIndex > outerIndex && editDistance(str1,str2) < 2
} yield (str1,str2)).toList
Sign up to request clarification or add additional context in comments.

3 Comments

block.zipWithIndex.filter{ case (elem,index) => index > i} so for nested loops this is the right away to start iterating at a certain index?
Don't you need to do block.view.zipWithIndex instead?
actually, scala for is just an syntax sugar of foreach,flatMap,map and filter method. in another way to say, the code can be written as for-comprehensions.and I guess block.view.zipWithIndex is slower because the loop looks all the values in it.
1

Without a working editDistance() to test with it's hard to know, but I think this is close to what you're after.

def pairwisecomp(block: Seq[String]): List[List[String]] = {
  block.combinations(2).filter(editDistance(str1, str2) < 2).toList
}

Note that I changed the block collection type so that I can use the combinations() method. There are other ways to achieve this.

Comments

1

'tails' is your friend:

   def tryIt(): Unit = {
    def editDistance(v1: (String, Int), v2: (String, Int)): Int = {
      val (_, d1) = v1
      val (_, d2) = v2
      Math.abs(d1-d2)
    }
    val src = Seq (("C", 3), ("D", 4), ("A", 1), ("B", 2), ("E", 5), ("F", 6))
    val res = for {
      (h::tl) <- src.tails
      t <- tl if editDistance(h ,t) < 2
    } yield (h,t)
    res.foreach {case (r1, r2) => println(s"$r1, $r2")}
  }

(h::tl) is a series of each element (h) in the input plus everything following it (tl). From that you compare each element 't' in the tail sequence tl. If 'h' and 't' are close enough, yield that pair.

Comments

1

I think you can use a for-comprehension for this:

def pairwisecompII(block: Vector[String]): Seq[(String, String)] = {
  for {
    i ← block.indices
    str1 = block(i)
    j ← i + 1 until block.size
    str2 = block(j)
    if editDistance(str1, str2) < 2
  } yield (str1, str2)
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.