
I have to read certain files from S3, so I created a CSV containing the paths of those files on S3. I am reading the created CSV file using the code below:

val listofFilesRDD = sparkSession.read.textFile("s3://"+ file)

This works fine. Then I try to read each of those paths and create a DataFrame like this:

listofFilesRDD.foreach(iter => {
  val pathDF = sparkSession.read
    .schema(testSchema)
    .option("headers", true)
    .csv("s3://"+iter)

  pathDF.printSchema()
})

However, the above code throws a NullPointerException.

How can I fix this?

3 Answers


You can solve this by collecting the S3 file paths into an array on the driver, then iterating over that array and creating a DataFrame for each path:

val listofFilesRDD = sparkSession.read.textFile("s3://" + file)
// collect() brings the paths to the driver, so the reads below run in driver code
val listOfPaths = listofFilesRDD.collect()

listOfPaths.foreach { path =>
  val pathDF = sparkSession.read
    .schema(testSchema)
    .option("header", true) // note: the option name is "header", not "headers"
    .csv("s3://" + path)

  pathDF.printSchema()
}
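If all the files share the same schema, you can also avoid the loop entirely: Spark's csv reader accepts multiple paths in one call. A minimal sketch under that assumption, reusing the question's testSchema and listofFilesRDD:

// Collect the (small) list of paths on the driver and prefix them.
val paths = listofFilesRDD.collect().map("s3://" + _)

// A single read over all files, unioned into one DataFrame.
val allDF = sparkSession.read
  .schema(testSchema)
  .option("header", true)
  .csv(paths: _*)

allDF.printSchema()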


You cannot access an RDD (or the SparkSession that created it) from inside another RDD's operations; that's the rule! The closure you pass to foreach runs on the executors, where those driver-side objects are null, which is exactly what produces the NullPointerException. You have to restructure your logic so the nested reads happen on the driver.

You can find more about it here: NullPointerException in Scala Spark, appears to be caused be collection type?
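To make the rule concrete, here is a minimal sketch assuming the question's sparkSession and listofFilesRDD; the first variant fails, the second works:

// FAILS: this closure is serialized and run on the executors, where
// sparkSession is null, so the nested read throws a NullPointerException.
listofFilesRDD.foreach { path =>
  sparkSession.read.csv("s3://" + path)
}

// WORKS: collect() brings the paths back to the driver, so the loop
// and the nested reads run entirely in driver code.
listofFilesRDD.collect().foreach { path =>
  sparkSession.read.csv("s3://" + path)
}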

2

If anyone encounters this problem with a DataFrame, the same approach solves it:

import org.apache.spark.sql.{DataFrame, SparkSession}

def parameterjsonParser(queryDF: DataFrame, spark: SparkSession): Unit = {
  queryDF.show()
  // collect() brings the rows to the driver, so spark can be used safely below
  val rows = queryDF.collect()
  rows.foreach { row =>
    row.toSeq.foreach { col =>
      println(col)
      mainJsonParser(col.toString, spark)
    }
  }
}
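For context, a hypothetical call site (the JSON path below is made up for illustration; mainJsonParser is the poster's own function):

// Hypothetical usage: build the DataFrame on the driver, then pass it in.
val queryDF = spark.read.json("s3://some-bucket/params.json") // illustrative path only
parameterjsonParser(queryDF, spark)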

Thank you @Sandeep Purohit
