
I am a newbie to Hadoop. I just ran my Hadoop application in standalone mode and it worked just fine. I have now decided to move it to pseudo-distributed mode. I made the configuration changes as mentioned. Snippets of my XML files are shown below.

My core-site.xml looks as follows:

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-onur</value>
    <description>A base for other temporary directories.</description>
</property>

My hdfs-site.xml is:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

and my mapred-site.xml is:

<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>

I ran the start-dfs.sh and start-mapred.sh scripts, and they started fine:

root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-dfs.sh 
starting namenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-vissu-desktop.out
localhost: starting datanode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-vissu-desktop.out
localhost: starting secondarynamenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-mapred.sh 
starting jobtracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-vissu-desktop.out
localhost: starting tasktracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop# 

Now I tried to run my application, but got the following error:

root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2# hadoop jar ResultAgg_plainjar.jar ProcessInputFile /home/vissu/Raveesh/VotingConfiguration/sample.txt 
ARG 0 obtained = ProcessInputFile
12/07/17 17:43:33 INFO preprocessing.ProcessInputFile: Modified File Name is /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
Going to process map reduce jobs
12/07/17 17:43:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/17 17:43:34 ERROR preprocessing.ProcessInputFile: Input path does not exist: hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2#

The application initially takes a file from a path, modifies it, and creates sample.txt_modf, and this file has to be used by the MapReduce framework. When running in standalone mode I had given the absolute path, and hence it was fine. But I am unable to figure out what path I should specify in the Path API for Hadoop. If I give the file, it prepends hdfs://localhost/. So I am unsure of how to give the path in pseudo-distributed mode. Should I simply make sure that the modified file is created in that location?

My question is how to specify the path.

The snippet containing the path is:

        KeyValueTextInputFormat.addInputPath(conf,
                new Path(System.getProperty("user.dir")+File.separator+inputFileofhits.getName()));
        FileOutputFormat.setOutputPath(
                conf,
                new Path(ProcessInputFile.resultAggProps
                        .getProperty("OUTPUT_DIRECTORY")));
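For reference, a minimal sketch of why the hdfs://localhost/ prefix appears (the class name and file paths below are placeholders, not from the original application): once fs.default.name points at HDFS, a path string with no scheme is qualified against HDFS rather than the local disk.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PathResolution {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up core-site.xml
            FileSystem fs = FileSystem.get(conf);     // hdfs://localhost/ here

            // A schemeless path is qualified against the default filesystem:
            Path schemeless = new Path("/home/vissu/sample.txt_modf");
            System.out.println(schemeless.makeQualified(fs));
            // prints hdfs://localhost/home/vissu/sample.txt_modf

            // An explicit file:// scheme pins the path to the local
            // filesystem, regardless of fs.default.name:
            Path local = new Path("file:///home/vissu/sample.txt_modf");
        }
    }

In other words, System.getProperty("user.dir") produces a local directory string, but once the job runs against HDFS the client interprets that string as an HDFS location, which is why the input path is reported as missing.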

Thanks

1 Answer


Does this file exist in HDFS? It looks like you've provided a local path to the file (user directories in HDFS are usually rooted at /user rather than /home).

You can check whether the file exists in HDFS by typing:

#> hadoop fs -ls hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf

If this returns nothing, i.e. the file is not in HDFS, then you can copy it to HDFS using the hadoop fs command:

#> hadoop fs -put /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf hdfs://localhost/user/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf

Note here the path in HDFS is rooted at /user, not /home.


2 Comments

Hi, thanks, I hadn't checked that. How do I do this programmatically, so that whenever the modified file is generated it is put in the HDFS location?
You could add some logic into your driver code (where you create and submit your job from) to compare the local file's timestamp against the one in HDFS, and replace it if the local file is newer.
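
For illustration, a minimal sketch of that driver-side check using the standard FileSystem API (the local and HDFS paths are placeholders; adapt them to where sample.txt_modf is actually generated):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem hdfs = FileSystem.get(conf);       // hdfs://localhost/
            FileSystem local = FileSystem.getLocal(conf); // local disk

            Path src = new Path("/home/vissu/sample.txt_modf"); // placeholder local file
            Path dst = new Path("/user/vissu/sample.txt_modf"); // placeholder HDFS target

            // Copy if the file is missing in HDFS or the local copy is newer
            boolean copy = !hdfs.exists(dst)
                    || local.getFileStatus(src).getModificationTime()
                       > hdfs.getFileStatus(dst).getModificationTime();
            if (copy) {
                // delSrc=false keeps the local file; overwrite=true replaces the HDFS copy
                hdfs.copyFromLocalFile(false, true, src, dst);
            }
        }
    }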
