
I am trying to run a command from Java code to merge two files. The command is:

hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000 | hadoop fs -put - /user/cloudera/mergedfile

The command runs perfectly in the Cloudera terminal, but when I run the same thing from Java code it shows the merged content on the console yet does not create mergedfile at the specified HDFS path. If mergedfile already exists, it still contains its earlier data rather than the newly merged data; if the file does not exist, no new file is created. By contrast, the same command run in the terminal creates a new file if none exists, or gives a "file exists" error otherwise.

My java code is as follows:

Process p;

try {
        p = Runtime.getRuntime().exec("hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000 | hadoop fs -put - /user/cloudera/mergedfile");
        BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()));

        String s;
        while ((s = br.readLine()) != null)
        {
            System.out.println(s);
        }
    }
catch (Exception e)
    {
        System.out.println(e.getMessage());
    }

My purpose is to replace the file if it already exists, or create a new one if it does not, from Java code.
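A likely cause of the behavior described above: `Runtime.exec` does not invoke a shell, so the `|` is passed to `hadoop fs -cat` as a literal argument instead of creating a pipeline, and the `hadoop fs -put` half never runs. A minimal sketch of the workaround is to hand the whole command line to a shell via `ProcessBuilder` (the helper class name `ShellPipeline` is illustrative; the Hadoop paths are the ones from the question and are shown only as a comment):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellPipeline {
    /**
     * Runs a command line through /bin/sh so that "|" is interpreted as a pipe,
     * prints the merged stdout/stderr, and returns the exit code.
     */
    public static int run(String pipeline) throws Exception {
        Process p = new ProcessBuilder("/bin/sh", "-c", pipeline)
                .redirectErrorStream(true)   // fold stderr into stdout for printing
                .start();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
        return p.waitFor();  // block until the whole pipeline (e.g. the put) finishes
    }

    public static void main(String[] args) throws Exception {
        // On a machine with Hadoop installed, the question's pipeline would be passed as-is:
        // run("hadoop fs -cat /user/cloudera/Index_1/part-r-00000 /user/cloudera/Index_2/part-r-00000"
        //         + " | hadoop fs -put - /user/cloudera/mergedfile");
        int exit = run("echo merged | tr a-z A-Z");  // portable stand-in pipeline
        System.out.println("exit=" + exit);
    }
}
```

Waiting on `waitFor()` also matters: without it the JVM can exit before `hadoop fs -put` finishes writing, which would explain seeing output on the console but no file on HDFS.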

  • Why didn't you use the HDFS API? Commented Nov 16, 2015 at 18:49

1 Answer


To run HDFS commands from Java you should use the HDFS Java API. Here is a code sample from javased.com showing how to use it to merge these files:

/** 
 * @param inputFiles a glob expression of the files to be merged
 * @param outputFile a destination file path
 * @param deleteSource delete source files after merging
 * @param deleteDestinationFileIfExist overwrite the destination if it already exists
 * @return the path of the merged file
 * @throws IOException
 */
private static Path mergeTextFiles(String inputFiles, String outputFile, boolean deleteSource, boolean deleteDestinationFileIfExist) throws IOException {
  JobConf conf = new JobConf(FileMerger.class);
  FileSystem fs = FileSystem.get(conf);
  Path inputPath = new Path(inputFiles);
  Path outputPath = new Path(outputFile);
  if (deleteDestinationFileIfExist) {
    if (fs.exists(outputPath)) {
      fs.delete(outputPath, false);
      sLogger.info("Warning: remove destination file since it already exists...");
    }
  } else {
    Preconditions.checkArgument(!fs.exists(outputPath), new IOException("Destination file already exists..."));
  }
  FileUtil.copyMerge(fs, inputPath, fs, outputPath, deleteSource, conf, FILE_CONTENT_DELIMITER);
  sLogger.info("Successfully merged " + inputPath.toString() + " to " + outputFile);
  return outputPath;
}

In this case you would need to copy the files you want to merge into one directory beforehand using the FileUtil class, then pass that directory's path as the inputFiles parameter:

JobConf conf = new JobConf(FileMerger.class);
FileSystem fs = FileSystem.get(conf);
String tmpDir = "/user/cloudera/tmp_dir";
Path[] paths = {new Path("/user/cloudera/Index_1/part-r-00000"), new Path("/user/cloudera/Index_2/part-r-00000")};
// FileUtil.copy with a Path[] source takes deleteSource/overwrite flags and returns boolean
FileUtil.copy(fs, paths, fs, new Path(tmpDir), false, true, conf);
mergeTextFiles(tmpDir, "/user/cloudera/mergedfile", false, true);
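If all you need is to concatenate the two part files, you can also skip the temporary directory and stream each source straight into one overwritten destination with the same API. A minimal sketch (assuming the hadoop-client jars are on the classpath; the class name `DirectMerge` and the 4096-byte buffer size are illustrative choices, not from the answer above):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class DirectMerge {
    /** Concatenates the given source files into dst, overwriting dst if it exists. */
    public static void merge(FileSystem fs, Path[] srcs, Path dst) throws IOException {
        try (FSDataOutputStream out = fs.create(dst, true)) {  // true = overwrite
            for (Path src : srcs) {
                try (FSDataInputStream in = fs.open(src)) {
                    IOUtils.copyBytes(in, out, 4096, false);   // false = keep out open
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up core-site.xml on the cluster
        FileSystem fs = FileSystem.get(conf);
        merge(fs,
              new Path[]{new Path("/user/cloudera/Index_1/part-r-00000"),
                         new Path("/user/cloudera/Index_2/part-r-00000")},
              new Path("/user/cloudera/mergedfile"));
    }
}
```

`fs.create(dst, true)` gives exactly the asked-for semantics: replace the file if it exists, create it if it does not.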

1 Comment

Thanks Rosales, I was looking into the same. I have never used the HDFS API in Java before! I will try this right now, thanks for the code.
