I have a schema that looks like this:
root
|-- FirstName: string (nullable = true)
|-- Age: integer (nullable = true)
I want to change this schema and write this data to a file so it prints out like this:
["Alice", 22],
["Bob", 21],
["Charlie", 23]
As you can see, each line (apart from the trailing comma) is still valid JSON.
It seems like a DataFrame is always written out as a set of named columns. If I do something like:
df.write.json("/path")
Then I always get JSON objects like this:
{"FirstName":"Alice","Age":22}
{"FirstName":"Bob","Age":21}
I think one way to do this would be to convert the DataFrame to an RDD (or a Dataset of strings) and manually construct each line the way I want it, but that doesn't feel like the right approach.
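In case it helps to spell that idea out, I mean something roughly like this (an untested sketch; it assumes FirstName never contains characters that need JSON escaping and that neither column is null):

import spark.implicits._

// Build each output line by hand from the Row values, then write plain text.
val lines = df.map { row =>
  val name = row.getAs[String]("FirstName")
  val age = row.getAs[Int]("Age")
  s"""["$name",$age]"""
}
lines.write.text("./manualOutput")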
Here's what I tried instead:
import org.apache.spark.sql.functions.{array, col}

val df2 = df.withColumn("NewColumn", array(col("FirstName"), col("Age")))
  .select("NewColumn")
df2.write.json("./output.json")
Unfortunately, this gave me output like this:
{"NewColumn":["Alice",22]}
{"NewColumn":["Bob",21]}
I then tried outputting as text like this:
import org.apache.spark.sql.functions.{col, concat, lit}

val df2 = df.withColumn("NewColumn", concat(
    lit("[\""),
    col("FirstName"),
    lit("\",\""),
    col("Age"),
    lit("\"]")))
  .select(col("NewColumn"))
df2.write.text("./myFile.txt")
This time, it looks like this:
["Alice","22"]
["Bob","21"]
This is closer (although Age is now a quoted string rather than a number), but surely I don't have to concatenate string literals together to get output in this format?
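For completeness, the most compact version of this string-building idea I can think of uses printf-style formatting (a sketch only, assuming format_string from org.apache.spark.sql.functions, a FirstName that never needs JSON escaping, and no null values):

import org.apache.spark.sql.functions.{col, format_string}

// %d keeps Age as a bare number instead of a quoted string.
val df2 = df.select(format_string("[\"%s\",%d]", col("FirstName"), col("Age")).as("line"))
df2.write.text("./myFile.txt")

That gets the numbers unquoted, but it's still hand-rolled JSON, so it won't escape quotes or backslashes inside FirstName — which is really why I'm hoping there's a built-in way to do this.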