I want to write text files into HDFS. The path the file has to be written to is generated dynamically. If the file path (including the file name) is new, the file should be created and the text written to it. If the file path (including the file name) already exists, the string must be appended to the existing file.

I used the following code. File creation works fine, but I cannot append text to existing files.

// Imports assume json4s for JValue/compact/render and Spark Streaming for Time.
import java.io.{BufferedWriter, OutputStreamWriter}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.streaming.Time
import org.json4s.JValue
import org.json4s.jackson.JsonMethods.{compact, render}

def writeJson(uri: String, json: JValue, time: Time): Unit = {
  val path = new Path(generateFilePath(json, time))
  val conf = new Configuration()
  conf.set("fs.defaultFS", uri)
  conf.set("dfs.replication", "1")
  conf.set("dfs.support.append", "true")
  conf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "false")

  val message = compact(render(json)) + "\n"
  try {
    val fileSystem = FileSystem.get(conf)
    if (fileSystem.exists(path)) {
      println("File exists.")
      // Open the existing file for append and write one JSON line.
      val outputStream = fileSystem.append(path)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Appended to file in path: " + path)
    } else {
      println("File does not exist.")
      // Create the file (overwrite = true) and write the first JSON line.
      val outputStream = fileSystem.create(path, true)
      val bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
      bufferedWriter.write(message)
      bufferedWriter.close()
      println("Created file in path: " + path)
    }
  } catch {
    case e: Exception =>
      e.printStackTrace()
  }
}

Hadoop version: 2.7.0

Whenever an append is attempted, the following error is thrown:

org.apache.hadoop.ipc.RemoteException(java.lang.ArrayIndexOutOfBoundsException)

1 Answer

I can see three possibilities:

  1. Probably the easiest is to use the external commands provided by the hdfs CLI that ships with your Hadoop cluster (see https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html), or even the WebHDFS REST API: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html. A sketch of the CLI approach is shown after this list.
  2. If you don't want to use hdfs commands, you can use the HDFS API provided by the hadoop-hdfs library: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.1
  3. Use Spark if you want a clean Scala solution, e.g. http://spark.apache.org/docs/latest/programming-guide.html or https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter3/save_the_rdd_to_files.html (see the second sketch below).
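
For option 1, here is a minimal sketch of shelling out to the CLI from Scala; the hdfs binary being on the PATH and the helper name appendLineViaCli are assumptions. hdfs dfs -appendToFile appends a local file to an HDFS file and should create the destination when it does not exist, which matches the create-or-append requirement:

import java.nio.file.Files
import scala.sys.process._

// Append one line to an HDFS file via `hdfs dfs -appendToFile`.
// Assumes the `hdfs` binary is on the PATH. The command creates the
// destination if it does not already exist, so no exists() check is needed.
def appendLineViaCli(line: String, hdfsPath: String): Int = {
  val tmp = Files.createTempFile("hdfs-append", ".txt")
  Files.write(tmp, (line + "\n").getBytes("UTF-8"))
  try Seq("hdfs", "dfs", "-appendToFile", tmp.toString, hdfsPath).!
  finally Files.delete(tmp)
}

For option 3, a minimal Spark sketch; the app name, host, and paths are placeholders. Note that saveAsTextFile cannot write to an existing path, so each batch has to go to a fresh, dynamically generated directory rather than appending to a single file:

import org.apache.spark.{SparkConf, SparkContext}

// Write a batch of JSON lines to a new HDFS directory. saveAsTextFile
// fails if the path already exists, so generate a fresh directory per
// batch instead of appending to a single file.
val sc = new SparkContext(new SparkConf().setAppName("write-json"))
sc.parallelize(Seq("""{"k":"v"}"""))
  .saveAsTextFile("hdfs://namenode:8020/data/events/batch-0001")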