I have a file with the following data:

$ cat products.csv
1,tv,sony,hd,699
2,tv,sony,uhd,799
3,tv,samsung,hd,599
4,tv,samsung,uhd,799
5,phone,iphone,x,999
6,phone,iphone,11,999
7,phone,samsung,10,899
8,phone,samsung,10note,999
9,phone,pixel,4,799
10,phone,pixel,3,699

I'm trying to load this into a Spark DataFrame. It gives me no errors, but every value is loaded as null.

scala> val productSchema = StructType((Array(StructField("productId",IntegerType,true),StructField("productType",IntegerType,true),StructField("company",IntegerType,true),StructField("model",IntegerType,true),StructField("price",IntegerType,true))))
productSchema: org.apache.spark.sql.types.StructType = StructType(StructField(productId,IntegerType,true), StructField(productType,IntegerType,true), StructField(company,IntegerType,true), StructField(model,IntegerType,true), StructField(price,IntegerType,true))

scala> val df = spark.read.format("csv").option("header", "false").schema(productSchema).load("/path/products_js/products.csv")
df: org.apache.spark.sql.DataFrame = [productId: int, productType: int ... 3 more fields]

scala> df.show
+---------+-----------+-------+-----+-----+
|productId|productType|company|model|price|
+---------+-----------+-------+-----+-----+
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
|     null|       null|   null| null| null|
+---------+-----------+-------+-----+-----+

I then tried a different way to load the data, and it worked:

scala> val temp = spark.read.csv("/path/products_js/products.csv")
temp: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 3 more fields]

scala> temp.show
+---+-----+-------+------+---+
|_c0|  _c1|    _c2|   _c3|_c4|
+---+-----+-------+------+---+
|  1|   tv|   sony|    hd|699|
|  2|   tv|   sony|   uhd|799|
|  3|   tv|samsung|    hd|599|
|  4|   tv|samsung|   uhd|799|
|  5|phone| iphone|     x|999|
|  6|phone| iphone|    11|999|
|  7|phone|samsung|    10|899|
|  8|phone|samsung|10note|999|
|  9|phone|  pixel|     4|799|
| 10|phone|  pixel|     3|699|
+---+-----+-------+------+---+

The second approach loads the data, but I cannot apply the schema to the DataFrame. What is the difference between these two methods of loading data, and why does the first approach load nulls? Can anyone help?

1 Answer

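The first problem is that you declared the string columns as IntegerType. Values such as tv or sony cannot be parsed as integers, and in the default PERMISSIVE mode Spark returns nulls for records it cannot parse instead of raising an error. Your second approach works because no schema is supplied, so every column is read as a string. With the string columns declared as strings, the first approach works too: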

import org.apache.spark.sql.types.StructType

// Declare each column with the type its values actually have.
val productSchema = new StructType()
                        .add("productId", "int")
                        .add("productType", "string")
                        .add("company", "string")
                        .add("model", "string")
                        .add("price", "int")

// The file has no header row, so the column names come from the schema.
val df = spark.read.format("csv")
            .option("header", "false")
            .schema(productSchema)
            .load("test.csv")

df.show()

The result is:

+---------+-----------+-------+------+-----+
|productId|productType|company| model|price|
+---------+-----------+-------+------+-----+
|        1|         tv|   sony|    hd|  699|
|        2|         tv|   sony|   uhd|  799|
|        3|         tv|samsung|    hd|  599|
|        4|         tv|samsung|   uhd|  799|
|        5|      phone| iphone|     x|  999|
|        6|      phone| iphone|    11|  999|
|        7|      phone|samsung|    10|  899|
|        8|      phone|samsung|10note|  999|
|        9|      phone|  pixel|     4|  799|
|       10|      phone|  pixel|     3|  699|
+---------+-----------+-------+------+-----+
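
If you want the mismatch to show up as an error rather than as nulls, one option is to set the CSV reader's mode to FAILFAST. And if you prefer your second approach, you can name and cast the columns after loading. Below is a minimal sketch of both, assuming local mode and a temporary file holding two rows copied from products.csv; the names wrongSchema, strict and typed are only illustrative. In spark-shell, enter it with :paste so the chained calls are read as one expression.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType
import scala.util.Try

// In spark-shell, getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("csv-schema-check")
  .getOrCreate()

// Assumption: a small temp file stands in for products.csv.
val tmp = Files.createTempFile("products", ".csv")
Files.write(tmp, "1,tv,sony,hd,699\n2,tv,sony,uhd,799\n".getBytes("UTF-8"))
val path = tmp.toUri.toString

// The schema from the question: every column declared as int.
val wrongSchema = new StructType()
  .add("productId", "int")
  .add("productType", "int")
  .add("company", "int")
  .add("model", "int")
  .add("price", "int")

// FAILFAST makes the reader throw on a value it cannot parse instead of
// producing nulls. Reads are lazy, so the exception only surfaces when
// an action such as show() runs.
val strict = Try {
  spark.read
    .option("header", "false")
    .option("mode", "FAILFAST")
    .schema(wrongSchema)
    .csv(path)
    .show()
}
println(s"FAILFAST read succeeded: ${strict.isSuccess}")

// Alternative: load everything as strings, then name and cast the columns.
val typed = spark.read
  .csv(path)
  .toDF("productId", "productType", "company", "model", "price")
  .withColumn("productId", col("productId").cast("int"))
  .withColumn("price", col("price").cast("int"))

typed.printSchema()
```

With the all-int schema the FAILFAST read should fail, which confirms that the nulls come from type mismatches and not from the file or the path. The cast-based version leaves productType, company and model as strings.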