I have a file with the following data:
$ cat products.csv
1,tv,sony,hd,699
2,tv,sony,uhd,799
3,tv,samsung,hd,599
4,tv,samsung,uhd,799
5,phone,iphone,x,999
6,phone,iphone,11,999
7,phone,samsung,10,899
8,phone,samsung,10note,999
9,phone,pixel,4,799
10,phone,pixel,3,699
I'm trying to load this into a Spark DataFrame. It gives me no errors, but every column is loaded as null.
scala> val productSchema = StructType((Array(StructField("productId",IntegerType,true),StructField("productType",IntegerType,true),StructField("company",IntegerType,true),StructField("model",IntegerType,true),StructField("price",IntegerType,true))))
productSchema: org.apache.spark.sql.types.StructType = StructType(StructField(productId,IntegerType,true), StructField(productType,IntegerType,true), StructField(company,IntegerType,true), StructField(model,IntegerType,true), StructField(price,IntegerType,true))
scala> val df = spark.read.format("csv").option("header", "false").schema(productSchema).load("/path/products_js/products.csv")
df: org.apache.spark.sql.DataFrame = [productId: int, productType: int ... 3 more fields]
scala> df.show
+---------+-----------+-------+-----+-----+
|productId|productType|company|model|price|
+---------+-----------+-------+-----+-----+
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
+---------+-----------+-------+-----+-----+
I then tried a different way to load the data, and it worked:
scala> val temp = spark.read.csv("/path/products_js/products.csv")
temp: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 3 more fields]
scala> temp.show
+---+-----+-------+------+---+
|_c0|  _c1|    _c2|   _c3|_c4|
+---+-----+-------+------+---+
|  1|   tv|   sony|    hd|699|
|  2|   tv|   sony|   uhd|799|
|  3|   tv|samsung|    hd|599|
|  4|   tv|samsung|   uhd|799|
|  5|phone| iphone|     x|999|
|  6|phone| iphone|    11|999|
|  7|phone|samsung|    10|899|
|  8|phone|samsung|10note|999|
|  9|phone|  pixel|     4|799|
| 10|phone|  pixel|     3|699|
+---+-----+-------+------+---+
The second approach loaded the data, but I cannot attach the schema to the DataFrame. What is the difference between these two methods of loading data, and why does the first approach load nulls? Can anyone help me?
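
For reference, here is a minimal sketch of what the first approach might look like if the text columns were declared as StringType instead of IntegerType. The likely cause of the nulls is that values such as "tv" or "sony" cannot be parsed as integers, and in the CSV reader's default PERMISSIVE mode a record that fails to parse is turned into nulls instead of raising an error (whether the whole row or only the bad fields become null appears to depend on the Spark version). The helper name loadProducts, the app name and the local master setting are assumptions made for illustration; in spark-shell the existing `spark` session would be reused.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Only productId and price hold integers in products.csv;
// productType, company and model hold text, so they are StringType here.
val productSchema = StructType(Array(
  StructField("productId", IntegerType, nullable = true),
  StructField("productType", StringType, nullable = true),
  StructField("company", StringType, nullable = true),
  StructField("model", StringType, nullable = true),
  StructField("price", IntegerType, nullable = true)
))

// Hypothetical helper: reads a headerless CSV file using the schema above.
def loadProducts(spark: SparkSession, path: String): DataFrame =
  spark.read
    .format("csv")
    .option("header", "false")
    // Optional: FAILFAST raises an exception on a malformed record
    // instead of silently producing nulls (the default mode is PERMISSIVE).
    .option("mode", "FAILFAST")
    .schema(productSchema)
    .load(path)

// Usage, assuming the same path as in the question:
// val spark = SparkSession.builder().appName("products").master("local[*]").getOrCreate()
// val df = loadProducts(spark, "/path/products_js/products.csv")
// df.show()
```

With FAILFAST enabled, a schema that does not match the data should surface as an exception at read time, which makes this kind of mismatch easier to spot than a table full of nulls.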