
I am writing a small CSV parser in Scala for a project. The project's stipulations do not allow the use of regex or unofficial external libraries.

I have most of it working; however, one case is giving me trouble: when the data within a CSV cell contains a comma.

I am parsing a dataset that is stored in a CSV file. The parsing function takes an array of strings as its only argument; this array is one row of the CSV after calling split(",") on it. The CSV is formatted so that if a data cell contains a comma, the entire cell is enclosed in double quotes (""). The issue is that split(",") also splits in the middle of a quoted cell, because it doesn't know to ignore that comma. I need a way to ignore that comma, essentially "fixing" the parameter array.
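One regex-free way to repair the array that split(",") has already produced is to walk the fragments left to right and, while a quote is open, glue the next fragment back on with the comma that split removed. This is a minimal sketch, assuming `"` characters only appear in pairs around a cell; `RejoinSplit` and `rejoinQuoted` are hypothetical names, not an existing API.

```scala
import scala.collection.mutable.ArrayBuffer

object RejoinSplit {
  // Hypothetical helper: repairs the array produced by line.split(",")
  // by gluing back fragments of a quoted cell that contained a comma.
  // Assumes '"' characters come in pairs.
  def rejoinQuoted(parts: Array[String]): Array[String] = {
    val cells = ArrayBuffer.empty[String]
    val current = new StringBuilder // the cell being rebuilt
    var insideQuotes = false

    for (part <- parts) {
      // Still inside a quoted cell: restore the comma that split removed.
      if (insideQuotes) current.append(',')
      current.append(part)
      // A fragment with an odd number of quotes opens or closes a quoted cell.
      if (part.count(_ == '"') % 2 == 1) insideQuotes = !insideQuotes
      if (!insideQuotes) {
        cells += current.toString
        current.clear()
      }
    }
    // An unclosed quote leaves text in the buffer; keep it rather than lose it.
    if (insideQuotes) cells += current.toString
    cells.toArray
  }

  def main(args: Array[String]): Unit = {
    val line = "\"Hello, everyone\", green, blue, 5, 3.2"
    val fixed = rejoinQuoted(line.split(",")).map(_.trim)
    println(fixed.mkString("|")) // "Hello, everyone"|green|blue|5|3.2
    println(fixed.length)        // 5
  }
}
```

Empty cells survive the repair (for example, `"a,,b".split(",")` stays `a`, empty, `b`), but trailing empty cells are already gone, because `String.split(",")` discards them before this function ever sees the array.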

Example

Say a line inside the CSV looks like this before any splitting occurs:

"Hello, everyone", green, blue, 5, 3.2

After split(",") we end up with

|"Hello|everyone"|green|blue|5|3.2| // (Array length 6)

whereas I need it to be

|"Hello, everyone"|green|blue|5|3.2| // (Array length 5)
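If the parser may receive the raw line instead of the pre-split array, a character-by-character scan avoids the problem altogether, still without regex. This is a minimal sketch; `ScanLine` and `splitCsvLine` are hypothetical names. A comma ends a cell only when no double quote is currently open. Quote characters are kept in the cell so the result matches the desired array above, and each cell is trimmed.

```scala
import scala.collection.mutable.ArrayBuffer

object ScanLine {
  // Hypothetical helper: splits one raw CSV line on commas that are not
  // inside double quotes, one character at a time (no split, no regex).
  def splitCsvLine(line: String): Array[String] = {
    val cells = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var insideQuotes = false

    for (ch <- line) {
      if (ch == '"') {
        insideQuotes = !insideQuotes // entering or leaving a quoted cell
        current.append(ch)           // keep the quote in the output cell
      } else if (ch == ',' && !insideQuotes) {
        cells += current.toString.trim // an unquoted comma ends the cell
        current.clear()
      } else {
        current.append(ch)
      }
    }
    cells += current.toString.trim // the last cell has no trailing comma
    cells.toArray
  }

  def main(args: Array[String]): Unit = {
    val cells = splitCsvLine("\"Hello, everyone\", green, blue, 5, 3.2")
    println(cells.mkString("|")) // "Hello, everyone"|green|blue|5|3.2
  }
}
```

Unlike `String.split(",")`, which discards trailing empty strings, this scan keeps a trailing empty cell, so `"a,b,"` yields three cells.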

Is there an obvious method for doing this that I am missing?

  • @jwvh I added a more extensive example that should help clarify Commented Sep 14, 2021 at 1:01
  • @jwvh do you know how CSV files work? green and blue do not have commas within their data, but "Hello, everyone" does, so it is surrounded by quotes (""). I need to skip over the comma inside the quotes when using the split(",") method on the line "Hello, everyone", green, blue. Commented Sep 14, 2021 at 1:11
  • for (line <- Source.fromFile(file).getLines().drop(1)) { val column = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1).map(_.trim) } Commented Sep 14, 2021 at 4:43
  • The above code is for the case where your CSV file contains a header; that is why drop(1) is used. Source is scala.io.Source and file is java.io Commented Sep 14, 2021 at 7:24
  • @md samual I said that I can’t use regex Commented Sep 14, 2021 at 12:14
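The file-reading loop from the comments can be made regex-free by swapping the regex split for a character scan. This is a minimal sketch, assuming Scala 2.13 or later (for `scala.util.Using`), UTF-8 input, and a header row to skip as in the comment. `ReadCsv`, `readRows` and `splitCsvLine` are hypothetical names. Unlike calling `Source.fromFile(file).getLines()` on its own, `Using` also closes the file afterwards.

```scala
import java.io.File
import scala.collection.mutable.ArrayBuffer
import scala.io.Source
import scala.util.{Try, Using}

object ReadCsv {
  // Hypothetical helper: regex-free split of one line on commas that are
  // outside double quotes; quotes are kept, cells are trimmed.
  private def splitCsvLine(line: String): Vector[String] = {
    val cells = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var insideQuotes = false
    for (ch <- line) {
      if (ch == ',' && !insideQuotes) {
        cells += current.toString.trim
        current.clear()
      } else {
        if (ch == '"') insideQuotes = !insideQuotes
        current.append(ch)
      }
    }
    cells += current.toString.trim
    cells.toVector
  }

  // Reads all rows after the header line. Using closes the file even if
  // reading fails; toVector forces the lazy iterator before that happens.
  def readRows(file: File): Try[Vector[Vector[String]]] =
    Using(Source.fromFile(file, "UTF-8")) { src =>
      src.getLines().drop(1).map(splitCsvLine).toVector
    }
}
```

Because rows come from `getLines()`, a quoted cell that contains a line break would still be split across two rows; this sketch does not handle that case.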

1 Answer

""""Hello, everyone",green,blue,5,3.2"""  //raw CSV
  .split(",")                             //split at all commas  
  .foldRight((List[String](), false)){    //recombine quoted strings
    case (str,(lst, false))  => (str :: lst,      str.count(_ == '"')%2 > 0)
    case (str,(hd::tl,true)) => (s"$str,$hd"::tl, str.count(_ == '"')%2 < 1)
  }._1
  .mkString("|") //reformat just to demonstrate success
//res0: String = "Hello, everyone"|green|blue|5|3.2

Note: This assumes that quote marks, ", come in pairs. If the raw CSV data has a single quote mark, or any odd number of them, then the results are likely incorrect.
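The odd-quote case can at least be detected. Tracing the fold, each step sets the flag to "previous flag XOR this fragment has an odd number of quotes", so the flag left over after the fold is true exactly when the whole line has an odd number of `"` characters. The sketch below keeps the fold above and returns `None` in that case. `CheckedSplit` and `splitQuoted` are hypothetical names. Quotes that are even in number but misplaced are not detected.

```scala
object CheckedSplit {
  // The fold from the answer, plus a check of the leftover flag.
  // Returns None when the line has an odd number of '"' characters.
  def splitQuoted(line: String): Option[List[String]] = {
    val (cells, unbalanced) =
      line.split(",").foldRight((List[String](), false)) {
        case (str, (lst, false))     => (str :: lst, str.count(_ == '"') % 2 > 0)
        case (str, (hd :: tl, true)) => (s"$str,$hd" :: tl, str.count(_ == '"') % 2 < 1)
        // Cannot occur: the flag only becomes true after a cell is added.
        case (str, (Nil, true))      => (List(str), true)
      }
    if (unbalanced) None else Some(cells)
  }

  def main(args: Array[String]): Unit = {
    println(splitQuoted("\"Hello, everyone\",green,blue,5,3.2"))
    println(splitQuoted("\"Hello, everyone,green,blue")) // None
  }
}
```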
