3

assume I have a list of tuples...

val l: List[(String, String, Date)] 

...this list gets sorted by date...

val sorted = l.sortWith((a, b) => a._3 < b._3 )

And now I want to split this sorted list into multiple list. The split should happen between tuples where the date difference is greater then 3 days. What would be a good and performant way to do that?

Thanks and regards!

EDIT:

Here is an example: Input (already sorted):

List(("a1", "b1", "2016-01-30"), ("a2", "b2", "2016-02-01"), ("a3", "b3", "2016-02-20"), ("a4", "b4", "2016-02-23"), ("a5", "b5", "2016-02-25"))

Expected output:

List(List(("a1", "b1", "2016-01-30"), ("a2", "b2", "2016-02-01")), List(("a3", "b3", "2016-02-20"), ("a4", "b4", "2016-02-23"), ("a5", "b5", "2016-02-25")))

4
  • 1
    Could you give expected output for different cases? Commented Jul 14, 2016 at 6:56
  • 2
    Use sortBy(_._3) instead of sortWith Commented Jul 14, 2016 at 7:41
  • val sorted = List(l.sortBy(_._3 )) Commented Jul 14, 2016 at 7:55
  • 1
    Possible duplicate of split list in scala based on diff between neighbouring elements Commented Jul 14, 2016 at 7:58

4 Answers 4

3

Gee, if this is a party I might as well throw my 2-cents around.

sorted.init.foldRight(List(List(sorted.last))){ (tup,acc) => 
  if (acc.head.head._3 - tup._3 > /*test for time-gap here*/) 
    List(tup)::acc  // gap too big, start new sub-List
  else
    (tup::acc.head)::acc.tail  // prepend to current sub-List
}
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for this very compact and elegant solution!
0

I have simplified the types a bit, the code just illustrates the concept. The performance is probably better without the toList conversion.

type MyTuple = (String, Int)

val sorted: List[MyTuple] = List(
    ("a", 1),
    ("a", 2),
    ("a", 3),
    ("b", 7),
    ("b", 9),
    ("c", 13),
    ("c", 15),
    ("c", 16)
)

def test(a: MyTuple, b: MyTuple): Boolean = b._2 - a._2 > 3

// We prepend the head so that the first sliding pair will have distance 0
val lists = (sorted.head :: sorted)
  .sliding(2)
  .map { case List(a, b) => (b, test(a, b)) }
  .toList

def split(list: List[(MyTuple, Boolean)]): List[List[MyTuple]] = list match {
  case Nil => Nil
  case head :: tail => {
    val (l1, l2) = tail.span(!_._2)
    (head :: l1).map(_._1) :: split(l2)
  }
}

val splitLists = split(lists).map(_.map(_._1))
println(splitLists.mkString("\n"))

Output:

List(a, a, a)
List(b, b)
List(c, c, c)

Comments

0

For convenience, I've substitued Ints for Dates, but the prinicpal is the same.

   val data = List(
      ("a","a", 10),
      ("a","b", 30),
      ("a","b", 11),
      ("a","b", 33),
      ("s","c", 37),
      ("a","c", 26),
      ("a","d", 22),
      ("m","a", 18),
      ("t","a", 15)
    )
    val sortedData = data.sortWith ((a,b)=> a._3 < b._3)
    println(s"$sortedData")
    val splitPass = sortedData.foldLeft(List[(String, String,Int)](), List[List[(String, String,Int)]](), sortedData.head._3){
      case ((cl, acc, ld),nt) =>
        if (nt._3-ld>3)
          (List(nt), cl.reverse ::acc, nt._3)
        else
          (nt:: cl, acc, nt._3)
    }
    val (fl, fa, _) = splitPass
    val res = (if (fl.isEmpty) fa else fl :: fa).reverse
    println(s"$res")

This gives the sorted list:

List((a,a,10), (a,b,11), (t,a,15), (m,a,18), (a,d,22), (a,c,26), (a,b,30), (a,b,33), (s,c,37))

and the result list:

List(List((a,a,10), (a,b,11)), List((t,a,15), (m,a,18)), List((a,d,22)), List((a,c,26)), List((a,b,30), (a,b,33)), List((s,c,37)))

What this does is a single pass through the sorted list, building an accumulator consisting of (List of items in the current group, List of completed groups, Int[Date] of last added item). This is seeded with two empty lists and the Date of the first item in the list.

For each item in the source list, if it's close to the previous one, it gets added to the Current group, if it's far from the previous item, the current group is closed and added to the completed gorups list, and the current item becoes the first item in a new list, and the current item's date becomes the reference date for the next check. If you wanted to break where the date was different to far from the earliest in the current group, it is easy to change the else branch to pass ld instead of nt._3.

At the end, you need to add any uncompleted group to the final collection of groups.

The two '.reverse's are necessary because the lists are, in the typcial functional style, built backwards (becuase it's cheaper that way) and reversed at completion.

2 Comments

Using foldRight instead of foldLeft would obviate the need for reversing anything.
@Jubobs, not really, it's just moving where the reverse happens. override def foldRight[B](z: B)(op: (A, B) => B): B = reverse.foldLeft(z)((right, left) => op(left, right)) (from github.com/scala/scala/blob/v2.11.8/src/library/scala/…)
0

Here is a very clean linear-time (single-pass) approach:

type Date = Int  // For simplicity in the example

val sorted: List[(String,String,Date)] = List(("a1", "b1", 1),
                                              ("a1", "b1", 2),
                                              ("a1", "b1", 6),
                                              ("a1", "b1", 8),
                                              ("a1", "b1", 10),
                                              ("a1", "b1", 15),
                                              ("a1", "b1", 16))

val result = sorted.sliding(2).foldLeft(Vector(Vector(sorted.head))) { 
  case (z, List(t1, t2)) => 
    if (t2._3 - t1._3 > 3) z :+ Vector(t2)
    else                   z.init :+ (z.last :+ t2)
}

result.foreach(println)
// Vector((a1,b1,1), (a1,b1,2))
// Vector((a1,b1,6), (a1,b1,8), (a1,b1,10))
// Vector((a1,b1,15), (a1,b1,16))

You can handle the special cases where sorted.length() < 2 separately.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.