
I am a newbie to Hadoop. I just ran my Hadoop application in standalone mode and it worked just fine. I have now decided to move it to pseudo-distributed mode. I made the configuration changes as mentioned. Snippets of my XML files are shown below.

My core-site.xml looks as follows:

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-onur</value>
    <description>A base for other temporary directories.</description>
</property>

My hdfs-site.xml is:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

and my mapred-site.xml is:

<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>

I ran the start-dfs.sh and start-mapred.sh scripts, and they started fine:

root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-dfs.sh 
starting namenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-vissu-desktop.out
localhost: starting datanode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-vissu-desktop.out
localhost: starting secondarynamenode, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop# start-mapred.sh 
starting jobtracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-vissu-desktop.out
localhost: starting tasktracker, logging to /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-vissu-desktop.out
root@vissu-desktop:/home/vissu/Raveesh/Hadoop# 

Now I tried to run my application, but got the following error:

root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2# hadoop jar ResultAgg_plainjar.jar ProcessInputFile /home/vissu/Raveesh/VotingConfiguration/sample.txt 
ARG 0 obtained = ProcessInputFile
12/07/17 17:43:33 INFO preprocessing.ProcessInputFile: Modified File Name is /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
Going to process map reduce jobs
12/07/17 17:43:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/17 17:43:34 ERROR preprocessing.ProcessInputFile: Input path does not exist: hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf
root@vissu-desktop:/home/vissu/Raveesh/Hadoop/hadoop-0.20.2#

The application initially takes a file from a path, modifies it, and creates sample.txt_modf, and this file has to be used by the MapReduce framework. When running in standalone mode I had given the absolute path, and hence it was fine. But I am unable to figure out what path I should specify in the Path API for Hadoop. If I give the file, it prepends hdfs://localhost/. So I am unsure of how to give the path in pseudo-distributed mode. Should I simply make sure that the modified file is created in that location?

My question is how to specify the path.

The snippet containing the path is:

        KeyValueTextInputFormat.addInputPath(conf,
                new Path(System.getProperty("user.dir")+File.separator+inputFileofhits.getName()));
        FileOutputFormat.setOutputPath(
                conf,
                new Path(ProcessInputFile.resultAggProps
                        .getProperty("OUTPUT_DIRECTORY")));
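For reference, a minimal sketch of why the hdfs://localhost/ prefix appears (the class name and file paths below are placeholders, not from the original application): once fs.default.name points at HDFS, a path string with no scheme is qualified against HDFS rather than the local disk.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PathResolution {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up core-site.xml
            FileSystem fs = FileSystem.get(conf);     // hdfs://localhost/ here

            // A schemeless path is qualified against the default filesystem:
            Path schemeless = new Path("/home/vissu/sample.txt_modf");
            System.out.println(schemeless.makeQualified(fs));
            // prints hdfs://localhost/home/vissu/sample.txt_modf

            // An explicit file:// scheme pins the path to the local
            // filesystem, regardless of fs.default.name:
            Path local = new Path("file:///home/vissu/sample.txt_modf");
        }
    }

In other words, System.getProperty("user.dir") produces a local directory string, but once the job runs against HDFS the client interprets that string as an HDFS location, which is why the input path is reported as missing.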

Thanks

1 Answer


Does this file exist in HDFS? It looks like you've provided a local path to the file (user directories in HDFS are usually rooted at /user rather than /home).

You can check whether the file exists in HDFS by typing:

#> hadoop fs -ls hdfs://localhost/home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf

If this returns nothing, i.e. the file is not in HDFS, then you can copy it to HDFS using the hadoop fs command:

#> hadoop fs -put /home/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf hdfs://localhost/user/vissu/Raveesh/Hadoop/hadoop-0.20.2/sample.txt_modf

Note here the path in HDFS is rooted at /user, not /home.


2 Comments

Hi, thanks, I hadn't checked that. How do I do this programmatically, so that whenever the modified file is generated it is put in the HDFS location?
You could add some logic into your driver code (where you create and submit your job from) to compare the local file's timestamp against the one in HDFS, and replace it if the local file is newer.
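
For illustration, a minimal sketch of that driver-side check using the standard FileSystem API (the local and HDFS paths are placeholders; adapt them to where sample.txt_modf is actually generated):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem hdfs = FileSystem.get(conf);       // hdfs://localhost/
            FileSystem local = FileSystem.getLocal(conf); // local disk

            Path src = new Path("/home/vissu/sample.txt_modf"); // placeholder local file
            Path dst = new Path("/user/vissu/sample.txt_modf"); // placeholder HDFS target

            // Copy if the file is missing in HDFS or the local copy is newer
            boolean copy = !hdfs.exists(dst)
                    || local.getFileStatus(src).getModificationTime()
                       > hdfs.getFileStatus(dst).getModificationTime();
            if (copy) {
                // delSrc=false keeps the local file; overwrite=true replaces the HDFS copy
                hdfs.copyFromLocalFile(false, true, src, dst);
            }
        }
    }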
