3

Suppose I am having a huge list having elements from 1 to 1 million.

val initialList = List(1,2,3,.....1 million)
and 
val myList = List(1,2,3)

Now when I apply an operation such as foldLeft on the myList giving initialList as the starting value such as

val output = myList.foldLeft(initialList)(_ :+ _)

// result ==>> List(1,2,3,.....1 million, 1 , 2 , 3)

Now my question comes here, both the lists being immutable the intermediate lists that were produced were

List(1,2,3,.....1 million, 1)
List(1,2,3,.....1 million, 1 , 2)
List(1,2,3,.....1 million, 1 , 2 , 3)

By the concept of immutability, every time a new list is being created and the old one being discarded. So isn't this operation a performance killer in scala as every time a new list of 1 million elements has to be copied to create a new list.

Please correct me if I am wrong as I am trying to understand the internal implementation of an immutable list. Thanks in advance.

1
  • A good option is typically to have a mutable builder for construction/complex operations, which then yields an immutable structure you can operate on. Commented Mar 9, 2018 at 10:12

2 Answers 2

8

Yup, this is performance killer, but this is a cost of having immutable structures (which are amazing, safe and makes programs much less buggy). That's why you should often avoid appending list if you can. There is many tricks that can avoid this issue (try to use accumulators).

For example:

Instead of:

val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = myList.foldLeft(initialList)(_ :+ _)

You can write:

val initialList = List(1,2,3,.....1 million)
val myList = List(1,2,3,...,100)
val output = List(initialList,myList).flatten

Flatten is implemented to copy first line only once instead of copying it for every single append.

P.S.

At least adding element to the front of list works fast (O(1)), cause sharing of old list is possible. Let's Look at this example: Linked list example You can see how memory sharing works for immutable linked lists. Computer only keeps one copy of (b,c,d) end. But if you want to append bar to the end of baz you cannot modify baz, cause you would destroy foo, bar and raz! That's why you have to copy first list.

Sign up to request clarification or add additional context in comments.

3 Comments

Every time a new list is created ,the intermediate lists becomes available for garbage collection for the jvm right ?
Yes, when you do this: List(1,2,3,.....1 million, 1) List(1,2,3,.....1 million, 1 , 2) List(1,2,3,.....1 million, 1 , 2 , 3) But if you do this from head: List(3,1,2,3,.....1 million) List(2,3,1,2,3,.....1 million) List(1,2,31,2,3,.....1 million) Then not a single list is collected by garbage collector. They will share memory.
Thank you so much. The image that you added clarified my doubt as this was exactly what I was looking for.
1

Appending to a List is not a good idea because List has linear cost for appending. So, if you can

  • either prepend to the List (List have constant time prepend)
  • or choose another collection that is efficient for appending. That would be a Queue

For the list of performance characteristic per operation on most scala collections, See:

https://docs.scala-lang.org/overviews/collections/performance-characteristics.html


Note that, depending on your requirement, you may also make your own smarter collection, such as chain iterable for example

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.