
I need to update a mutable list with the contents of a directory in HDFS. I have the following code, which works in spark-shell but not inside a script:

import org.apache.hadoop.fs._
import org.apache.spark.deploy.SparkHadoopUtil

val listOfFiles = scala.collection.mutable.ListBuffer[String]()

val hdfs_conf = SparkHadoopUtil.get.newConfiguration(sc.getConf)
val hdfs = FileSystem.get(hdfs_conf)
val sourcePath = new Path(filePath)

hdfs.globStatus(sourcePath).foreach { fileStatus =>
  val filePathName = fileStatus.getPath().toString()
  val fileName = fileStatus.getPath().getName()
  listOfFiles.append(fileName)
}

listOfFiles.tail

Any help? When I run it, it throws an exception saying that listOfFiles is empty.

  • What exception do you get when you write it in a scala file? Commented Jun 2, 2016 at 13:22
  • The exception is that listOfFiles is empty Commented Jun 2, 2016 at 13:23
  • Nothing wrong on the Scala side, I guess; maybe check the hdfs.globStatus(...) part again Commented Jun 2, 2016 at 13:34
  • Most probably your hdfs.globStatus(sourcePath) is not returning anything Commented Jun 2, 2016 at 13:36
  • Why is it that it works in spark-shell but not in a script? Commented Jun 2, 2016 at 13:36

1 Answer


You should avoid using a mutable collection here.

Try:

val listOfFiles = hdfs.globStatus(sourcePath).map { fileStatus =>
  fileStatus.getPath().getName()
}
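
Note that globStatus returns an Array[FileStatus], so listOfFiles here is an Array[String]; append .toList if you specifically need a List. Two further points, offered as assumptions since the full script isn't shown: globStatus can return null when a non-glob path does not exist, and calling .tail on an empty collection throws an exception mentioning "empty", which may be the error you are seeing. Also, spark-shell predefines sc, but a standalone application must create the SparkContext itself, which would explain the shell-vs-script difference. A minimal self-contained sketch (the object name ListHdfsFiles, the app name, and the argument handling are illustrative, not your actual code):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object ListHdfsFiles {
  def main(args: Array[String]): Unit = {
    // spark-shell creates `sc` for you; a submitted application must build it explicitly.
    val sc = new SparkContext(new SparkConf().setAppName("ListHdfsFiles"))

    val filePath = args(0) // e.g. a glob such as "/user/someuser/data/*"
    val hdfs = FileSystem.get(sc.hadoopConfiguration)

    // globStatus may return null for a non-glob path that does not exist,
    // so wrap the result in Option before mapping over it.
    val listOfFiles: List[String] =
      Option(hdfs.globStatus(new Path(filePath)))
        .map(_.map(_.getPath.getName).toList)
        .getOrElse(List.empty)

    if (listOfFiles.isEmpty) println(s"No files matched $filePath")
    else listOfFiles.foreach(println)

    sc.stop()
  }
}

Run it with spark-submit and pass the glob pattern as the first argument; if nothing matches, the program prints a message instead of failing on an empty collection.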

