I want to write text files into HDFS. The path the file has to be written to is generated dynamically. If the file path (including the file name) is new, the file should be created and the text written to it. If the file path (including the file name) already exists, the string must be appended to the existing file.

I used the following code. File creation works fine, but I cannot append text to existing files.

// Imports assume json4s for JValue/compact/render and Spark Streaming for Time.
import java.io.{BufferedWriter, OutputStreamWriter}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.streaming.Time
import org.json4s.JValue
import org.json4s.jackson.JsonMethods.{compact, render}

def writeJson(uri: String, json: JValue, time: Time): Unit = {
  val path = new Path(generateFilePath(json, time))
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  conf.set("dfs.replication", "1")
  conf.set("dfs.support.append", "true")
  conf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "false")

  val message = compact(render(json)) + "\n"
  try {
    val fileSystem = FileSystem.get(conf)
    if (fileSystem.exists(path)) {
      println("File exists.")
      // Open the existing file for append and write one JSON line.
      val outputStream = fileSystem.append(path)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Appended to file in path: " + path)
    } else {
      println("File does not exist.")
      // Create the file (overwrite = true) and write the first JSON line.
      val outputStream = fileSystem.create(path, true)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Created file in path: " + path)
    }
  } catch {
    case e: Exception =>
      e.printStackTrace()
  }
}

Hadoop version: 2.7.0

Whenever an append is attempted, the following error is thrown:

org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException)

1 Answer

I can see three possibilities:

  1. Probably the easiest is to use the external commands provided by the hdfs CLI that ships with your Hadoop cluster (see https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html), or even the WebHDFS REST API: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html. A sketch of the CLI approach is shown after this list.
  2. If you don't want to use hdfs commands, you can use the HDFS API provided by the hadoop-hdfs library: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
  3. Use Spark if you want a clean Scala solution, e.g. http://spark.apache.org/docs/latest/programming-guide.html or https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter3/save_the_rdd_to_files.html (see the second sketch below).
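
For option 1, here is a minimal sketch of shelling out to the CLI from Scala; the hdfs binary being on the PATH and the helper name appendLineViaCli are assumptions. hdfs dfs -appendToFile appends a local file to an HDFS file and should create the destination when it does not exist, which matches the create-or-append requirement:

import java.nio.file.Files
import scala.sys.process._

// Append one line to an HDFS file via `hdfs dfs -appendToFile`.
// Assumes the `hdfs` binary is on the PATH. The command creates the
// destination if it does not already exist, so no exists() check is needed.
def appendLineViaCli(line: String, hdfsPath: String): Int = {
  val tmp = Files.createTempFile("hdfs-append", ".txt")
  Files.write(tmp, (line + "\n").getBytes("UTF-8"))
  try Seq("hdfs", "dfs", "-appendToFile", tmp.toString, hdfsPath).!
  finally Files.delete(tmp)
}

For option 3, a minimal Spark sketch; the app name, host, and paths are placeholders. Note that saveAsTextFile cannot write to an existing path, so each batch has to go to a fresh, dynamically generated directory rather than appending to a single file:

import org.apache.spark.{SparkConf, SparkContext}

// Write a batch of JSON lines to a new HDFS directory. saveAsTextFile
// fails if the path already exists, so generate a fresh directory per
// batch instead of appending to a single file.
val sc = new SparkContext(new SparkConf().setAppName("write-json"))
sc.parallelize(Seq("""{"k":"v"}"""))
  .saveAsTextFile("hdfs://namenode:8020/data/events/batch-0001")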