3

We have a list of Strings with BEGIN and END marks as parts of this list. Can we filter out elements in-between of BEGIN-END in functional programming style? I came out only with this regular (flag) approach in scala.

val list1 =
  """992
  1010
  1005
  1112
  BEGIN
  1086
  1244
  1107
  1121
  END
  1223
  1312
  1319
  1306
  1469""".lines.toList

var flag = false
val filteredList = list1.filter{
  def f(x: String): Boolean = {
    if (x.contains("BEGIN")) {
      flag = true;
      return false
    } else if (x.contains("END")) {
      flag = false
    }
    flag
  }
  f
}

Is that possible to avoid defining the flag variable? How do they solve this in pure functional languages?

4
  • You can put the var flag = false inside the filter block. It’s only half as bad then. Commented Jan 23, 2011 at 18:21
  • Is it possible to have multiple BEGIN...END sequences in this list? Commented Jan 23, 2011 at 18:25
  • "filter out" was really bad wording if you want the elements between BEGIN and END. Commented Jan 24, 2011 at 2:57
  • Got it. I meant the opposite. Get the elements between BEGIN and END. Commented Jan 24, 2011 at 17:15

6 Answers 6

7

You can use drop/tail, dropWhile, takeWhile functions:

val filteredList = list1.map(_.trim).dropWhile("BEGIN" !=).tail.takeWhile("END" !=)

EDIT

As mentioned in comments tail will throw exception if list is empty, so if you prefer to stay on the safe side use drop(1) instead of tail:

val filteredList = list1.map(_.trim).dropWhile("BEGIN" !=).drop(1).takeWhile("END" !=)

And here is my version of algorithm that handles several BEGIN and END sections (some crazy stuff from me - a little state machine :)

var filteredList1 = list1.map(_.trim).foldLeft(List(None): List[Option[List[String]]]) {
  case (None :: rest, "BEGIN") => Some(Nil) :: rest
  case (Some(list) :: rest, "END") => None :: Some(list) :: rest
  case (Some(current) :: rest, num) => Some(num :: current) :: rest
  case (result, _) => result
}.flatten.reverse map (_.reverse)

it returns List[List[String]]

Sign up to request clarification or add additional context in comments.

5 Comments

Rats! Beat me to it as I was typing :)
Any reason for favouring drop(1) over tail?
nope, you are right - tail looks much better (adding change to the answer). Thanks!
drop(1) does not fail on an empty list; tail does. So which is correct depends on the desired behavior if the input is bad.
@Rex That's the answer I was aiming for :) It all depends on whether or not a BEGIN is guaranteed to be in the input list.
3

To start with, each string in your list contains the spaces from the start of the line.

This is the biggest problem in your code, and there's two ways to fix it.

Either trim the lines...

val list1 =
  """992
  1010
  ...
  1306
  1469""".lines.map(_.trim).toList

... or you can precede each line with a | and use stripMargin.

Then it's just a small matter of applying takeWhile/dropWhile

list1.takeWhile("BEGIN" !=) ++ list1.dropWhile("END"!=).tail

or more efficiently:

val (begin,middle) = list1.span("BEGIN" !=)
val end = middle.dropWhile("END" !=).tail
begin ++ end

EDIT

I had the solution back to front, that would drop (filter out) values between BEGIN and END. To retain them:

list1.dropWhile("BEGIN" !=).tail.takeWhile("END"!=)

EDIT 2

Rising to the challenge here... I'll allow for multiple BEGIN/END blocks, but also consider that the input might be badly malformed. What if there was a BEGIN without a corresponding END? Perhaps there are two BEGINs in a row, or the list runs out before there's an END.

Defining some rules:

  • An END without a corresponding BEGIN is ignored
  • BEGIN/END blocks don't nest
  • A BEGIN encountered while already in a block starts a new block
  • If the list runs out while in a block, then an implicit END is assumed

Without further ado, first create an iterator that identifies every "BEGIN" in the input:

val blocksStarts =
  Iterator.iterate(list1)(_.dropWhile("BEGIN" !=).drop(1)).drop(1).takeWhile(Nil !=)

//This iterator tries to continue forever,
//returning Nils once the sequences are exhausted
//For this reason, we must use drop(1) instead of tail

Giving an iterator of lists, each starting at a "BEGIN"

To then take elements from each of these lists until the corresponding "END" is reached, or another "BEGIN", or the list is exhausted:

val blocks = blockStarts map {
  _.takeWhile(x => x != "BEGIN" && x != "END")
} toList

The final toList is because it's still an Iterator at that point. You now have a List of Lists, each corresponding to a batch of elements in a "Block", as defined by the previous rules.

3 Comments

I know the word "filter out" is unclear, but I think you're doing the exact opposite of dmitry's code, and the exact opposite of Easy Angel's code -- you're removing the stuff between BEGIN and END and they're retaining only the stuff between BEGIN and END
Good answer for my challenge. I felt both my solutions were very clumsy. I'll have to learn more ways to use Iterator.iterate. I did notice a bug, however, which is that the first element returned by Iterator.iterate will be list1, so you'll need to add a drop(1) before the takeWhile.
Indeed, now fixed. No more late night postings on SO for me then! (let's see just how long that resolution lasts...)
2

I'm extending others' answers a little to present a case where there are two BEGIN...END blocks in the list.

val list1 =
  """992
  1010
  1005
  1112
  BEGIN
  1086
  1244
  1107
  1121
  END
  1223
  1312
  BEGIN
  773
  990
  224
  END
  1319
  1306
  1469""".lines.map(_.trim).toList

We're going to use foldRight to pass a status accumulator between iterations. Note that we're using foldRight to make constructing the result list efficient, so we're going to encounter END before we encounter BEGIN.

case class StripStatus(list:List[String], retaincurrent:Boolean)

list1.foldRight(StripStatus(Nil,false)){ (curElem:String, curStatus:StripStatus) =>
   if (curElem == "END")
      StripStatus(curStatus.list,true)
   else if (curElem == "BEGIN")
      StripStatus(curStatus.list,false)
   else if (curStatus.retaincurrent)
      StripStatus(curElem::curStatus.list, true)
   else
      curStatus
}.list

We could just as easily use foldLeft and reverse the result list at the end:

list1.foldLeft(StripStatus(Nil,false)){ (curStatus:StripStatus, curElem:String) =>
   if (curElem == "BEGIN")
      StripStatus(curStatus.list,true)
   else if (curElem == "END")
      StripStatus(curStatus.list,false)
   else if (curStatus.retaincurrent)
      StripStatus(curElem::curStatus.list, true)
   else
      curStatus
}.list.reverse

1 Comment

I've got the idea - to pass the result list and flag iterating through the collection.
1

MMmm. Here's my take:

def getInside(l: List[String]) = {
    def concat(in: List[String], out: List[String]): List[String] = in ::: off(out)

    def off(l: List[String]): List[String] = 
        if (l.isEmpty) Nil 
        else on(l dropWhile ("BEGIN" !=) drop 1)

    def on(l: List[String]): List[String] = 
        if (l.isEmpty) Nil
        else (concat _).tupled(l span ("END" !=))

    off(l)
}

2 Comments

Note that this is not tail recursive.
@Ken True, granted, but how many pairs of BEGIN END are there in the list, for this to be a problem? I do like making things tail recursive, but it was rather difficult here. The mutual recursion is fine -- I even started writing a version with the trampoline code of the library, but the list concatenation...
0

I don't know Scala, but you can define a function that returns the index in the list of the next element that matches a substring, and returns the index where the substring was found and also the list of elements encountered until that substring was matched. A pseudocode header: findSubstr(list, startIndex). Then build the expression (more pseudocode):

beginIndex, preBeginElems = findSubstr(list, 0)
endIndex, inBetweenElems = findSubstr(list, beginIndex)
restElems = list[endIndex until the end]

If helpful, I could write this in Haskell... :)

EDIT: There are probably other ways to do it, too

Comments

0

Again, with the same goal of dealing with multiple BEGIN...END spans in the list.

def getBetweenBeginEnd(l:List[String]) = {
   def internal(l:List[String],accum:List[String]):List[String]={
      val (keep, keepChecking) = l.dropWhile("BEGIN" !=).drop(1).span("END" !=)
      if (keepChecking == Nil)
         accum:::keep
      else
         internal(keepChecking.tail,accum:::keep)
   }
   internal(l,Nil)
}

1 Comment

the drop(1) can't be replaced with tail here, because Nil.drop(1)==Nil, but Nil.tail throws an exception.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.