I have a CSV file that is really nasty to parse. It has a column whose values contain double quotes and commas, followed by JSON in other columns. Example:

+---------+------------------+-------------------------------+-------------------------------------------------+
| column1 | column2          | jsonColumn1                   | jsonColumn2                                     |
+---------+------------------+-------------------------------+-------------------------------------------------+
| 201     | "1", "ABC", "92" | [{"Key1": 200, "Value1": 21}, | [{"date": "9999-09-26T08:50:06Z",               |
|         |                  |  {"Key2": 200, "Value2": 4}]  |  "fakenumber": "1-877-488-2364-",               |
|         |                  |                               |  "fakedata": "4.20", "fakedata2": "102332.06"}] |
+---------+------------------+-------------------------------+-------------------------------------------------+

I need to extract it using Scala. How do I make the parser ignore the commas inside column2, and how do I append a selected key/value pair as a new column for each row? I want it to look like this:

+---------+------------------+----------------------+----------------------+----------------+----------------------+
| column1 | column2          | jsonColumn1          | jsonColumn2          | jsonColumn1Key | jsonColumnDate       |
+---------+------------------+----------------------+----------------------+----------------+----------------------+
| 201     | "1", "ABC", "92" | keep original record | keep original record | 200            | 9999-09-26T08:50:06Z |
+---------+------------------+----------------------+----------------------+----------------+----------------------+

What I've done so far is import the data, create the top-level schema (before parsing), and then plan to use StructField to define an inner schema for the columns that contain JSON.

import org.apache.spark.sql.types._

// Read every column as a plain string first; the JSON columns
// get parsed in a second pass.
val csvSchema = new StructType()
  .add("column1", StringType, true)
  .add("column2", StringType, true)
  .add("jsonColumn1", StringType, true)
  .add("jsonColumn2", StringType, true)

The first issue I run into is column2. How do I work around this? For parsing the JSON inside the CSV I was going to emulate the solution here: split JSON value from CSV file and create new column based on json key in Spark/Scala

EDIT

// Setting the escape character equal to the quote character lets the
// embedded "" pairs inside column2 survive the CSV parse.
val csvfile = sqlContext.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("quote", "\"")
  .option("escape", "\"")
  .load("file.csv")

display(csvfile)
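
Once the CSV parses, my plan is to use from_json to lift the selected values into new columns, along the lines of the linked answer. A sketch, assuming Spark 2.2+ (for from_json with an ArrayType schema), the json1Schema/json2Schema guesses above, and that the values I want always sit in the first array element:

import org.apache.spark.sql.functions.{col, from_json}

// Parse the JSON strings and pull the wanted fields out as new
// columns, keeping the original records intact.
val withKeys = csvfile
  .withColumn("jsonColumn1Key",
    from_json(col("jsonColumn1"), json1Schema).getItem(0).getField("Key1"))
  .withColumn("jsonColumnDate",
    from_json(col("jsonColumn2"), json2Schema).getItem(0).getField("date"))

display(withKeys)

Since jsonColumn1 spills across physical lines in my sample, the read may also need .option("multiLine", "true") (available from Spark 2.2).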
Comments:

• What does a typical line from the csv file look like? (Oct 11, 2018 at 21:18)
• And are you using a csv library to read it, or trying to write your own parser? (Oct 11, 2018 at 21:19)
• The first box I showed is what the typical line looks like. This is a simplified version of it. (Oct 11, 2018 at 21:37)
• And I am using the CSV library reader. I'm open to Scala or Pyspark. (Oct 11, 2018 at 21:38)
• Please post your solution as an answer to this question for others to see. @ssjtam (Oct 12, 2018 at 18:15)
