
I am posting this question after searching the web extensively without finding an answer. I have a JSONArray in the following format:

[ 
  {
    "firstName":"John",
    "lastName":"Doe",
    "deparment" : {
       "DeptCode":"10",
       "deptName" : "HR"
     }
  },
  {
    "firstName":"Mel",
    "lastName":"Gibson",
    "deparment" : {
       "DeptCode":"20",
       "deptName" : "IT"
     }
  }
]

The JSONArray is from the org.json.simple package. I am trying to convert it into a Java Spark DataFrame with the code below:

SparkConf conf = new SparkConf().setAppName("linecount").setMaster("local[*]");
SparkSession session = SparkSession.builder().config(conf).getOrCreate();       
Dataset<Row> dataset = session.read().json(array.toString());

But no luck; I am facing the error below. Also, I can see that in Scala we can convert it to a DataFrame using the DS method. Has someone tried this before?

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: [{"firstName":%22John%22,%22lastName%22:%22Doe%22%7D,%7B%22firstName%22:%22Mel%22,%22lastName%22:%22Gibson%22%7D%5D
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:615)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:349)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:333)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:279)
at com.vikas.rawat.AnotherMainClass.main(AnotherMainClass.java:34)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: [{"firstName":%22John%22,%22lastName%22:%22Doe%22%7D,%7B%22firstName%22:%22Mel%22,%22lastName%22:%22Gibson%22%7D%5D
at java.net.URI.checkPath(Unknown Source)
at java.net.URI.<init>(Unknown Source)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 14 more
5 Comments
  • What do you mean by "no luck"? Compilation errors? Runtime errors? Error messages? Stack traces? Please be specific. (For what it is worth, programming is not about luck ...) Commented Jan 13, 2022 at 9:00
  • @StephenC It seems that this is not the right way to do it. Also it's throwing the below error: Caused by: java.net.URISyntaxException: Relative path in absolute URI: [{"firstName":%22John%22,%22lastName%22:%22Doe%22%7D Commented Jan 13, 2022 at 9:12
  • Please EDIT your question to include the complete stacktrace as text. Commented Jan 13, 2022 at 9:34
  • I have included the stacktrace as well. Commented Jan 13, 2022 at 9:39
  • I don't know the answer ... but I know why json(array.toString()) doesn't work. The json method expects its argument to be a string that is a path to a file in the file system (see the sketch after these comments). Commented Jan 13, 2022 at 10:26
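
For illustration, a minimal sketch of the usage that overload actually expects, reusing array and session from the question. The path /tmp/employees.json is hypothetical, and this assumes Spark's JSON reader treats a root-level array on a single line as one row per element:

import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// json(String) interprets its argument as a file path, not as JSON content,
// so the array has to be written to disk before it can be read back.
Files.write(Paths.get("/tmp/employees.json"), array.toJSONString().getBytes());
Dataset<Row> dataset = session.read().json("/tmp/employees.json");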

2 Answers


You should create an RDD from the JSON string and pass it to the spark.read().json method.

SparkSession spark = SparkSession.builder().master("local").getOrCreate();

String s = "{\"root\":[ \n" +
                "  {\n" +
                "    \"firstName\":\"John\",\n" +
                "    \"lastName\":\"Doe\",\n" +
                "    \"deparment\" : {\n" +
                "       \"DeptCode\":\"10\",\n" +
                "       \"deptName\" : \"HR\"\n" +
                "     }\n" +
                "  },\n" +
                "  {\n" +
                "    \"firstName\":\"Mel\",\n" +
                "    \"lastName\":\"Gibson\",\n" +
                "    \"deparment\" : {\n" +
                "       \"DeptCode\":\"20\",\n" +
                "       \"deptName\" : \"IT\"\n" +
                " }\n" +
                "}\n" +
                "]}";
JSONObject json = (JSONObject) JSONValue.parse(s);
JSONArray msgsArray = (JSONArray) json.get("root");

scala.collection.Seq<String> seq = scala.collection.JavaConverters.asScalaIteratorConverter
                       (Arrays.asList(msgsArray.toString()).iterator()).asScala().toSeq();

RDD<String> jsonRDD = spark.sparkContext().
parallelize(seq, 4, scala.reflect.ClassTag$.MODULE$.apply(String.class));

spark.read().json(jsonRDD).show();


+---------+---------+--------+
|deparment|firstName|lastName|
+---------+---------+--------+
| {10, HR}|     John|     Doe|
| {20, IT}|      Mel|  Gibson|
+---------+---------+--------+

2 Comments

What if I flatten the department as well, so that DeptCode and deptName become the column names and the department column is omitted?
If it's flattened in the JSONArray itself, then you will get the 2 columns you mentioned. Otherwise you need to use the explode function.
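
For the flattening asked about in the comments, one option is a star expression over the struct column rather than explode, which applies to array columns, not structs. A sketch, where df stands for the DataFrame produced by the code above:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Promote the nested struct's fields to top-level columns; "deparment.*"
// expands to DeptCode and deptName, and the struct column itself is dropped.
Dataset<Row> df = spark.read().json(jsonRDD);
df.selectExpr("firstName", "lastName", "deparment.*").show();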

You can read JSON from a string into a Dataset, but there is a caveat: it has to be one JSON object per string.

From the Spark documentation:

// Alternatively, a DataFrame can be created for a JSON dataset represented by
// a Dataset[String] storing one JSON object per string
val otherPeopleDataset = spark.createDataset(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val otherPeople = spark.read.json(otherPeopleDataset)
otherPeople.show()
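
A rough Java translation of that snippet, assuming Spark 2.2+ where the json(Dataset<String>) overload and Encoders.STRING() are available:

import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().master("local").getOrCreate();

// Wrap one JSON object per string in a Dataset<String> and let Spark parse it.
Dataset<String> jsonDs = spark.createDataset(
        Collections.singletonList(
                "{\"name\":\"Yin\",\"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}"),
        Encoders.STRING());
Dataset<Row> otherPeople = spark.read().json(jsonDs);
otherPeople.show();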
