I have a JSON file in the following format:

{"_t":1480647647,"_p":"[email protected]","_n":"app_loaded","device_type":"desktop"}
{"_t":1480647676,"_p":"[email protected]","_n":"app_loaded","device_type":"desktop"}
{"_t":1483161958,"_p":"[email protected]","_n":"app_loaded","device_type":"desktop"}
{"_t":1483162393,"_p":"[email protected]","_n":"app_loaded","device_type":"desktop"}
{"_t":1483499947,"_p":"[email protected]","_n":"app_loaded","device_type":"desktop"}
{"_t":1505361824,"_p":"[email protected]","_n":"added_to_team","account":"1234"}
{"_t":1505362047,"_p":"[email protected]","_n":"added_to_team","account":"1234"}
{"_t":1505362372,"_p":"[email protected]","_n":"added_to_team","account":"1234"}
{"_t":1505362854,"_p":"[email protected]","_n":"added_to_team","account":"1234"}
{"_t":1505366071,"_p":"[email protected]","_n":"added_to_team","account":"1234"}

I'm using Apache Spark in my Java application to read this JSON file and save it in Parquet format.

If I don't define a schema, the file parses without problems. Here is my code:

Dataset<Row> dataset = spark.read().json(pathToFile);
dataset.show(100);

And here is the console output:

+-------------+------------------+----------+-------+-------+-----------+
|           _n|                _p|        _t|account|channel|device_type|
+-------------+------------------+----------+-------+-------+-----------+
|   app_loaded| [email protected]|1480647647|   null|   null|    desktop|
|   app_loaded| [email protected]|1480647676|   null|   null|    desktop|
|   app_loaded| [email protected]|1483161958|   null|   null|    desktop|
|   app_loaded| [email protected]|1483162393|   null|   null|    desktop|
|   app_loaded| [email protected]|1483499947|   null|   null|    desktop|
|added_to_team|   [email protected]|1505361824|   1234|   null|       null|
|added_to_team|    [email protected]|1505362047|   1234|   null|       null|
...

When I use a schema definition like this:

StructType schema = new StructType();
schema.add("_n", StringType, true);
schema.add("_p", StringType, true);
schema.add("_t", TimestampType, true);
schema.add("account", StringType, true);
schema.add("channel", StringType, true);
schema.add("device_type", StringType, true);
// Read data from file
Dataset<Row> dataset = spark.read().schema(schema).json(pathToFile);
dataset.show(100);

I get this console output:

++
||
++
||
||
||
||
...

What's wrong with the schema definition?

1 Answer

StructType is immutable: each add call returns a new StructType and leaves the original untouched, so your code simply discards every addition. If you print the schema

schema.printTreeString();

you'll see it doesn't contain any field:

root

You should chain the calls and assign the final result:

StructType schema = new StructType()
  .add("_n", StringType, true)
  .add("_p", StringType, true)
  .add("_t", TimestampType, true)
  .add("account", StringType, true)
  .add("channel", StringType, true)
  .add("device_type", StringType, true);
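
To make the failure mode concrete, here is a minimal sketch that runs without Spark. `ImmutableSchema` is a hypothetical stand-in written for this illustration, not Spark's `StructType`; it only mimics the one relevant behaviour, namely that `add` returns a new object instead of modifying the receiver.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ImmutableSchemaDemo {

    // Hypothetical stand-in for an immutable schema type (NOT Spark's StructType).
    static final class ImmutableSchema {
        private final List<String> fields;

        ImmutableSchema() {
            this(Collections.emptyList());
        }

        private ImmutableSchema(List<String> fields) {
            this.fields = Collections.unmodifiableList(fields);
        }

        // Returns a NEW schema containing the extra field; this instance is unchanged.
        ImmutableSchema add(String name) {
            List<String> copy = new ArrayList<>(fields);
            copy.add(name);
            return new ImmutableSchema(copy);
        }

        int size() {
            return fields.size();
        }
    }

    // Mirrors the question: the return value of every add call is thrown away.
    static ImmutableSchema discardedAdds() {
        ImmutableSchema schema = new ImmutableSchema();
        schema.add("_n");
        schema.add("_p");
        return schema; // still the original, empty schema
    }

    // Mirrors the fix: each add is chained onto the result of the previous one.
    static ImmutableSchema chainedAdds() {
        return new ImmutableSchema().add("_n").add("_p");
    }

    // Alternative fix: keep separate statements but reassign the variable each time.
    static ImmutableSchema reassignedAdds() {
        ImmutableSchema schema = new ImmutableSchema();
        schema = schema.add("_n");
        schema = schema.add("_p");
        return schema;
    }

    public static void main(String[] args) {
        System.out.println(discardedAdds().size());  // prints 0
        System.out.println(chainedAdds().size());    // prints 2
        System.out.println(reassignedAdds().size()); // prints 2
    }
}
```

The same reassignment pattern (`schema = schema.add(...)`) should also work with the real `StructType` if you prefer one statement per field, since its `add` methods return a `StructType`.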