
I have a DataFrame with a string column whose values are JSON documents. I want to convert this column into multiple columns based on the JSON structure. I can do this when I have the JSON schema, but in this case I don't have it.

Example:

Original Dataframe:

---------------------
|        json_string|
---------------------
|{"a":2,"b":"hello"}|
|   {"a":1,"b":"hi"}|
---------------------

After Conversion/Parse

--------------
|  a |     b |
--------------
|  2 |  hello|
|  1 |     hi|
--------------

I am using Apache Spark 2.1.1.


1 Answer


If you do not have a predefined schema, the other option is to convert the column to an RDD[String] or Dataset[String] and load it as JSON. Spark then infers the schema from the data.

Here is how you can do it:

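import spark.implicits._ // needed for .toDS

// Convert the single string column to RDD[String]
val rdd = originalDF.rdd.map(_.getString(0))

// Optionally wrap it in a Dataset[String]
val ds = rdd.toDS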

Now load it as JSON:

val df = spark.read.json(rdd) // or spark.read.json(ds) on Spark 2.2.0 and later

df.show(false)

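On Spark 2.2.0 and later, prefer json(ds): json(rdd) is deprecated as of 2.2.0. The json(Dataset[String]) overload was only added in 2.2.0, so on 2.1.1 use json(rdd).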

@deprecated("Use json(Dataset[String]) instead.", "2.2.0")
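
For Spark 2.2.0 and later, the Dataset[String] route can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation: the object name JsonColumnDemo, the helper parseJsonColumn, and the local SparkSession settings are assumptions made for illustration, and the sample rows are the ones from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonColumnDemo {
  // Hypothetical helper (not part of Spark): reads one string column of
  // JSON documents and returns a DataFrame with an inferred schema.
  def parseJsonColumn(spark: SparkSession, df: DataFrame, column: String): DataFrame = {
    import spark.implicits._ // supplies the Encoder[String] used by .as[String]
    val jsonDs = df.select(column).as[String]
    spark.read.json(jsonDs) // Dataset[String] overload, available from Spark 2.2.0
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-column-demo")
      .getOrCreate()
    import spark.implicits._

    val originalDF = Seq(
      """{"a":2,"b":"hello"}""",
      """{"a":1,"b":"hi"}"""
    ).toDF("json_string")

    parseJsonColumn(spark, originalDF, "json_string").show(false)
    spark.stop()
  }
}
```

Running main should print a two-column table with columns a and b. On Spark 2.1.1 the same helper would need the RDD[String] overload instead.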

Output:

+---+-----+
|a  |b    |
+---+-----+
|2  |hello|
|1  |hi   |
+---+-----+
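
If the original DataFrame has other columns that should be kept, one possible variation is to use the inferred schema with from_json instead of replacing the DataFrame. The sketch below assumes Spark 2.2.0 or later (it uses the Dataset[String] overload for inference). The object name, the helper expandJsonColumn, and the temporary column name "parsed" are hypothetical. It also assumes that the JSON keys do not clash with existing column names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}

object JsonColumnKeepOthers {
  // Hypothetical helper: expands a JSON string column into top-level columns
  // while keeping every other column of the input DataFrame.
  def expandJsonColumn(spark: SparkSession, df: DataFrame, column: String): DataFrame = {
    import spark.implicits._ // supplies the Encoder[String] used by .as[String]

    // One extra pass over the data to infer the schema of the JSON strings
    val schema = spark.read.json(df.select(column).as[String]).schema

    df.withColumn("parsed", from_json(col(column), schema)) // struct column
      .select(col("*"), col("parsed.*"))                    // flatten the struct
      .drop("parsed", column)                               // remove helper and raw columns
  }
}
```

This costs an additional scan of the JSON column to infer the schema, so it may be worth caching the input when the data is large.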