I am new to Spark and Scala. I am trying to create a DataFrame from a JSONArray. Below is my code:
import java.io.FileReader;

import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

public class JsonParse {
    public JSONArray actionItems() {
        JSONParser parser = new JSONParser();
        JSONArray results = null;
        try {
            // Parse the file, unwrap the "d" object, and return its "results" array
            JSONObject obj = (JSONObject) parser.parse(new FileReader("/data/home/actionitems.json"));
            JSONObject obj2 = (JSONObject) obj.get("d");
            results = (JSONArray) obj2.get("results");
            System.out.println(results);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return results;
    }
}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.hive.HiveContext

object driver {
  val parse = new JsonParse
  val conf = new SparkConf().setAppName("test")
  val sc = new SparkContext(conf)
  sc.setLogLevel("ERROR")
  val hiveContext = new HiveContext(sc)
  val sqlContext = new SQLContext(sc)

  def main(args: Array[String]): Unit = {
    // Wrap the parsed JSONArray's string form in a one-element RDD and let Spark infer the schema
    val actionItemsRDD = sc.parallelize(Seq(parse.actionItems.toString))
    val df: DataFrame = hiveContext.read.json(actionItemsRDD)
    df.show
    println("number of records: " + df.count)
  }
}
The Java class JsonParse reads the JSON from a file and returns the JSONArray to the Scala object driver. In driver, I convert the JSON string to an RDD and then create the DataFrame with hiveContext.read.json(actionItemsRDD). I build with Maven and there are no build errors.
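For reference, here is a minimal, self-contained sketch of the same string-to-RDD-to-DataFrame path I am attempting, with a hard-coded sample payload standing in for the parsed file (the payload and the object name are made up purely for illustration; the API is the Spark 1.6 one pinned in the pom below):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object MinimalJsonExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("json-sketch").setMaster("local[*]"))
        val hiveContext = new HiveContext(sc)

        // Hypothetical stand-in for the "results" array pulled out of actionitems.json
        val json = """[{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]"""

        // Same call pattern as the driver above: one JSON string wrapped in an RDD[String]
        val rdd = sc.parallelize(Seq(json))
        val df = hiveContext.read.json(rdd)

        df.show()
        println("number of records: " + df.count)
      }
    }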
However, when I run the jar, I get this error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.json(Lorg/apache/spark/rdd/RDD;)Lorg/apache/spark/sql/Dataset;
The exception is thrown at the hiveContext.read.json line. I've done this before without issues, and I am using the same dependencies as in my previous attempt (a small runtime version check is sketched after the pom). Below is my pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>json</groupId>
    <artifactId>test</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>${project.artifactId}</name>

    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <relocations>
                                <relocation>
                                    <pattern>org.apache.http</pattern>
                                    <shadedPattern>org.shaded.apache.http</shadedPattern>
                                </relocation>
                            </relocations>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <shadedArtifactAttached>true</shadedArtifactAttached>
                            <shadedClassifierName>shaded</shadedClassifierName>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>com.databricks</groupId>
            <artifactId>spark-csv_2.11</artifactId>
            <version>1.4.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>1.6.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.6.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>jcl-over-slf4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.10.6</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.jodd/jodd -->
        <dependency>
            <groupId>org.jodd</groupId>
            <artifactId>jodd</artifactId>
            <version>3.4.0</version>
            <type>pom</type>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.json/json -->
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
            <version>20170516</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.3</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpcore -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore</artifactId>
            <version>4.4.4</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/com.googlecode.json-simple/json-simple -->
        <dependency>
            <groupId>com.googlecode.json-simple</groupId>
            <artifactId>json-simple</artifactId>
            <version>1.1.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.threeten/threetenbp -->
        <dependency>
            <groupId>org.threeten</groupId>
            <artifactId>threetenbp</artifactId>
            <version>1.3.3</version>
        </dependency>
    </dependencies>
</project>
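As a side note, here is the runtime version check mentioned above: a minimal sketch (using the existing SparkContext from the driver object) that prints which Spark and Scala versions are actually on the classpath when the jar runs, so they can be compared against the 1.6.0 / 2.10.6 declared in the pom.

    // Prints the Spark version the job is really running against,
    // plus the Scala library version, for comparison with the pom above
    println("Spark runtime version: " + sc.version)
    println("Scala library version: " + scala.util.Properties.versionString)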
I'm not sure why this error is showing up, and I haven't been able to resolve it. Any help would be appreciated. Thank you!