I am trying to load a CSV file into a temp table so that I can query it, and I am having two issues. First: I tried loading the CSV into a DataFrame, but the CSV has some empty fields, and I didn't find a way to do it. Someone in another post suggested using:
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("cars.csv")
but it gives me an error saying "Failed to load class for data source: com.databricks.spark.csv".
Second: I then loaded the file as a text file, without the header row, as follows:
val sqlContext = new org.apache.spark.sql.SQLContext(sc);
import sqlContext.implicits._;
case class cars(id: Int, name: String, licence: String);
val carsDF = sc.textFile("../myTests/cars.csv").map(_.split(",")).map(p => cars( p(0).trim.toInt, p(1).trim, p(2).trim) ).toDF();
carsDF.registerTempTable("cars");
val dgp = sqlContext.sql("SELECT * FROM cars");
dgp.show()
This gives an error because one of the licence fields is empty. I tried to handle this when building the DataFrame, but it did not work. I could obviously go into the CSV file and fix it by adding a null, but I do not want to do this because there are a lot of fields and it could be problematic. I want to fix it programmatically, either when I create the DataFrame or in the case class.
If you have any other thoughts, please let me know as well.
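One possible cause, which I have not confirmed, is that `split(",")` discards trailing empty strings, so a row whose last field is empty yields only two elements and `p(2)` fails. Below is a minimal sketch of a line parser that keeps empty fields. The `Car` class with an optional licence and the `CarParser` helper are hypothetical names for illustration, not part of the code above:

```scala
// Hypothetical record type: licence is optional, so an empty field maps to None.
case class Car(id: Int, name: String, licence: Option[String])

// Hypothetical helper holding the per-line parsing logic.
object CarParser {
  def parse(line: String): Car = {
    // A limit of -1 keeps trailing empty strings, which plain split(",") drops.
    val p = line.split(",", -1).map(_.trim)
    // Treat a missing or blank third field as "no licence".
    val licence = if (p.length > 2 && p(2).nonEmpty) Some(p(2)) else None
    Car(p(0).toInt, p(1), licence)
  }
}
```

Assuming this is the cause, it could replace the two `map` calls, for example `sc.textFile("../myTests/cars.csv").map(CarParser.parse).toDF()`, and the `Option[String]` field should then appear as a nullable column in the temp table.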