Given a CSV in the format below, what is the best way to load it into Scala as a Map[String, Array[String]], with the key being the unique values of Col2 and the value being an Array[String] of all co-occurring Col1 values?

a,1,
b,2,m
c,2,
d,1,
e,3,m
f,4,
g,2,
h,3,
I,1,
j,2,n
k,2,n
l,1,
m,5,
n,2,

I have tried the function below, but I get an error when trying to add to the Option type: += is not a member of Option[Array[String]].

In addition, I get an overloaded method value ++ with alternatives: error on the line case None => mapping ++ (linesplit(2) -> Array(linesplit(1))).

def parseCSV(): Map[String, Array[String]] = {
  var mapping = Map[String, Array[String]]()
  val lines = Source.fromFile("test.csv")
  for (line <- lines.getLines) {
    val linesplit = line.split(",")
    mapping.get(linesplit(2)) match {
      case Some(_) => mapping.get(linesplit(2)) += linesplit(1)
      case None => mapping ++ (linesplit(2) -> Array(linesplit(1)))
    }
  }
  mapping
}
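
For reference, here is a minimal illustration of what I believe the two errors come down to (a toy immutable Map m standing in for mapping):

val m = Map("2" -> Array("b"))
m.get("2")                   // Option[Array[String]]: Option has no += method
m ++ Map("3" -> Array("e"))  // builds and returns a new Map, m itself is not updated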

I am hoping for a Map[String, Array[String]] like the following:

(2 -> Array("b", "c", "g", "j", "k", "n"))
(3 -> Array("e", "h"))
(4 -> Array("f"))
(5 -> Array("m"))

3 Answers

You can do the following. First, read the file into a List[List[String]]:

val rows: List[List[String]] = using(io.Source.fromFile("test.csv")) { source =>
  source.getLines.toList map { line =>
    line.split(",").map(_.trim).toList
  }
}
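
Note that using is not a standard-library method in Scala 2.12; the snippet above assumes a small loan-pattern helper along these lines (shown only so the example is self-contained):

// Minimal loan-pattern helper: runs f on the resource and always closes it
def using[A <: java.io.Closeable, B](resource: A)(f: A => B): B =
  try f(resource) finally resource.close()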

Then filter out any malformed rows that don't contain at least two values (the Col1 letter and the Col2 number):

val filteredRows = rows.filter(row => row.size > 1)

And the last step is to groupBy the second value (the Col2 number) and map each group to its co-occurring Col1 values:

filteredRows.groupBy(row => row(1)).mapValues(_.map(_.head))
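
If you specifically need a Map[String, Array[String]] (and want to avoid mapValues, which is deprecated in Scala 2.13), one possible variant, assuming the same filteredRows as above, is:

filteredRows
  .groupBy(_(1))                                             // key: the Col2 value
  .map { case (k, group) => k -> group.map(_.head).toArray } // values: the Col1 letters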

This isn't complete, but it should give you an outline of how it might be done.

io.Source
  .fromFile("so.txt")    //open file
  .getLines()            //line by line
  .map(_.split(","))     //split on commas
  .toArray               //load into memory
  .groupMap(_(1))(_(0))  //Scala 2.13

//res0: Map[String,Array[String]] = Map(4 -> Array(f), 5 -> Array(m), 1 -> Array(a, d, I, l), 2 -> Array(b, c, g, j, k, n), 3 -> Array(e, h))

You'll notice that the file resource isn't closed, and it doesn't handle malformed input. I leave that for the diligent reader.
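
On Scala 2.13 one way to handle the resource is scala.util.Using, which closes the Source even if an exception is thrown. A sketch that also skips lines without a second column (same file name as above):

import scala.util.Using

val grouped: Map[String, Array[String]] =
  Using.resource(io.Source.fromFile("so.txt")) { source =>
    source.getLines()
      .map(_.split(","))
      .filter(_.length >= 2)   // ignore lines without a second column
      .toArray
      .groupMap(_(1))(_(0))    // key: Col2, values: Col1
  }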


To keep the structure of your original code, you can use a mutable Map and an ArrayBuffer, since both can be updated in place:

import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

def parseCSV(): Map[String, Array[String]] = {
  val mapping = mutable.Map[String, ArrayBuffer[String]]()
  val lines = Source.fromFile("test.csv")
  for (line <- lines.getLines) {
    val linesplit = line.split(",")
    val key = linesplit(1)    // the Col2 value, e.g. "2"
    val value = linesplit(0)  // the co-occurring Col1 value, e.g. "b"
    mapping.get(key) match {
      case Some(buffer) => buffer += value
      case None         => mapping(key) = ArrayBuffer(value)
    }
  }
  lines.close()
  mapping.map { case (k, v) => (k, v.toArray) }.toMap
}
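
As an aside, mutable.Map also has getOrElseUpdate, which would let the update step above drop the explicit Some/None match (using the same mapping, key and value names as in the code above):

// Equivalent update step written with getOrElseUpdate instead of pattern matching
mapping.getOrElseUpdate(key, ArrayBuffer[String]()) += value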
