
How can I parse this CSV file in Scala to extract Data objects containing (date, time, longitude, latitude)?

*M…….:Dy4.5
*N……….:14_540
*V…..:N
*S….:1.2.1
*yyyy/mm/dd;hh:mm:ss;long;lat
2016/05/09;12:50:19;-122.45006;38.47320
2016/05/09;13:04:10;-122.45011;38.47317

I already wrote this function, but it only reads the file; I don't know how to turn each line into a Data object:

import scala.io.Source

// Reads every line and splits it on ';' (header lines included)
def readData(fileName: String): Vector[Array[String]] = {
  for {
    line <- Source.fromFile(fileName).getLines().toVector
    values = line.split(";").map(_.trim)
  } yield values
}
I think you need to define 'values' as a variable or value. You can use a regular expression to find and save particular segments of text. (Commented Mar 30, 2017 at 16:37)
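Following the regular-expression suggestion, here is a minimal sketch, assuming every data row has exactly the date;time;longitude;latitude shape shown in the question. The object RegexParsing, the pattern Row and the helper parseWithRegex are hypothetical names; header lines starting with * simply fail to match and come back as None.

```scala
import scala.util.matching.Regex

object RegexParsing {
  case class Data(date: String, time: String, longitude: String, latitude: String)

  // date;time;longitude;latitude, e.g. 2016/05/09;12:50:19;-122.45006;38.47320
  // A Regex used as an extractor must match the whole string.
  val Row: Regex = """(\d{4}/\d{2}/\d{2});(\d{2}:\d{2}:\d{2});(-?\d+\.\d+);(-?\d+\.\d+)""".r

  // Some(Data) for a data row, None for header or malformed lines
  def parseWithRegex(line: String): Option[Data] = line.trim match {
    case Row(date, time, lon, lat) => Some(Data(date, time, lon, lat))
    case _                         => None
  }
}
```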

2 Answers


You can use Scala pattern matching for this, building on Anastasiia Kharchenko's answer:

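import scala.io.Source

// Data as defined in Anastasiia Kharchenko's answer
case class Data(date: String, time: String, longitude: String, latitude: String)

def readData(fileName: String): Vector[Data] = {
  for {
    line <- Source.fromFile(fileName).getLines().toVector
    // skip the "*..." header lines (including "*yyyy/mm/dd;hh:mm:ss;long;lat") and blank lines
    if line.trim.nonEmpty && !line.startsWith("*")
    data <- parseCsvLine(line)
  } yield data
}

def parseCsvLine(line: String): Option[Data] = {
  line.split(";").toVector.map(_.trim) match {
    case Vector(date, time, longitude, latitude) => Some(Data(date, time, longitude, latitude))
    case _ =>
      println(s"WARNING UNKNOWN DATA FORMAT FOR LINE: $line")
      None
  }
}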
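Source.fromFile opens a file handle that is never closed in the code above. A minimal sketch of a resource-safe variant, assuming Scala 2.13 or later (where scala.util.Using is available); the object name SafeReading is hypothetical, and the lines are forced into a Vector before the file is closed.

```scala
import scala.io.Source
import scala.util.Using

object SafeReading {
  case class Data(date: String, time: String, longitude: String, latitude: String)

  def parseCsvLine(line: String): Option[Data] =
    line.split(";").toVector.map(_.trim) match {
      case Vector(date, time, longitude, latitude) => Some(Data(date, time, longitude, latitude))
      case _                                       => None
    }

  // Using.resource closes the Source even if parsing throws
  def readData(fileName: String): Vector[Data] =
    Using.resource(Source.fromFile(fileName)) { source =>
      source.getLines()
        .filterNot(line => line.trim.isEmpty || line.startsWith("*")) // skip header and blank lines
        .flatMap(line => parseCsvLine(line))
        .toVector // materialize before the file is closed
    }
}
```

On Scala 2.12 and earlier, the same effect can be had with a try/finally that calls source.close().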

3 Comments

Your solution works perfectly, thanks. Can you please help me convert the date from a String to a Date? I tried this, but I always get a NullPointerException: val format = new java.text.SimpleDateFormat("yyyy/MM/dd") Some(Data(format.parse(date),....
First I would make sure everything was parsed correctly; then I would add a method to the case class for the data (continuing the example above): case class Data(date: String, time: String, longitude: String, latitude: String) { def getDate(): java.util.Date = { val format = new java.text.SimpleDateFormat("yyyy/MM/dd"); format.parse(date) } }. I would not convert the date in the parseCsvLine function, as that function should focus only on parsing a CSV line.
I used java.time.format.DateTimeFormatter and it resolved my issue, since it's not practical to redefine a getter for each attribute. Thanks a lot for your help!
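A minimal sketch of the java.time approach mentioned above, assuming the separate date and time columns should be combined into a single LocalDateTime. The object TimeParsing and the helper timestamp are hypothetical; the patterns follow the file's 2016/05/09 and 12:50:19 columns. Unlike SimpleDateFormat, DateTimeFormatter is immutable and thread-safe, so one instance can be shared.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime}
import java.time.format.DateTimeFormatter

object TimeParsing {
  case class Data(date: String, time: String, longitude: String, latitude: String)

  // Patterns for the "yyyy/mm/dd" and "hh:mm:ss" columns of the file
  private val dateFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")
  private val timeFormat = DateTimeFormatter.ofPattern("HH:mm:ss")

  // Combine the date and time strings of one row into a LocalDateTime;
  // throws DateTimeParseException if either string does not match its pattern
  def timestamp(d: Data): LocalDateTime =
    LocalDateTime.of(LocalDate.parse(d.date, dateFormat), LocalTime.parse(d.time, timeFormat))
}
```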

Assuming you have a case class Data:

case class Data(date: String, time: String, longitude: String, latitude: String)

(date and time are Strings just to simplify the example).

The code below gives you a Vector of Data objects:

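import scala.io.Source

def readData(fileName: String): Vector[Data] = {
  for {
    line <- Source.fromFile(fileName).getLines().toVector
    if line.trim.nonEmpty && !line.startsWith("*") // skip header and blank lines
    values = line.split(";").map(_.trim)           // the file is ';'-separated
    data = Data(values(0), values(1), values(2), values(3))
  } yield data
}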
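Since every field is kept as a String here only for simplicity, a minimal sketch of a more typed record follows, assuming longitude and latitude should be Doubles and that rows with non-numeric coordinates should be dropped. The names TypedParsing, Position and parseLine are hypothetical, and String.toDoubleOption requires Scala 2.13 or later.

```scala
object TypedParsing {
  // Hypothetical typed record: coordinates become Doubles instead of Strings
  case class Position(date: String, time: String, longitude: Double, latitude: Double)

  // None if the line does not have four fields or a coordinate is not a number,
  // so the "*yyyy/mm/dd;hh:mm:ss;long;lat" header line is rejected as well
  def parseLine(line: String): Option[Position] =
    line.split(";").map(_.trim) match {
      case Array(date, time, lon, lat) =>
        for {
          longitude <- lon.toDoubleOption
          latitude  <- lat.toDoubleOption
        } yield Position(date, time, longitude, latitude)
      case _ => None
    }
}
```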

1 Comment

When I try your solution I get the error: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
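A likely cause of this exception: the sample rows are separated by ';', so splitting a row on ',' returns a one-element array, and values(1) is then out of bounds (header lines such as *M…….:Dy4.5 also have only one field). A small sketch illustrating this with a row from the question; the object name SplitDemo is hypothetical.

```scala
object SplitDemo {
  val line = "2016/05/09;12:50:19;-122.45006;38.47320"

  // No ',' in the line: the whole line ends up in a single element
  val byComma: Array[String] = line.split(",")

  // Splitting on the actual separator yields the four expected fields
  val bySemicolon: Array[String] = line.split(";")
}
```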
