When I type the following command into Cygwin:

bin/nutch index crawl/crawldb crawl/linkdb crawl/segment/* 

then the binary works fine. When I place the exact same line into my bash script:

#!/bin/bash/
bin/nutch index crawl/crawldb crawl/linkdb crawl/segment/*

I get an error saying some files don't exist. This may be specific to Nutch which is the program I'm running, but I think it has more to do with how I'm calling the command in the script. Any ideas about what's wrong and how to fix this? (yes I'm using tab completion)

EDIT:

Script:

#!/bin/bash
/home/Dan/apache-nutch-1.2/bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*

I run the command:

$ pwd
/home/Dan/apache-nutch-1.2
$ ./nutch.sh

The output I'm getting is:

Indexer: starting at 2010-11-29 15:15:44
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/crawl_fetch
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/crawl_parse
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/parse_data
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/parse_text
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
    at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)

Regards, ~DS

  • make sure /bin/bash is the correct path. Also, bin/nutch is a relative path. Commented Nov 29, 2010 at 19:58
  • I have tested other commands in the script using this bin/nutch directory and they have worked fine. I take this to mean that these are all okay. What is the difference between running a command in a script and on the command line? Is there any way to bridge the gap between the 2 completely? Commented Nov 29, 2010 at 20:01
  • Can you post the script and the output you are seeing? Commented Nov 29, 2010 at 20:06
  • Okay, I think the issue is that the command I'm running generates temporary directories. And when I call the command in a script it assumes they already exist. Is this true? Commented Nov 29, 2010 at 20:39
  • I haven't solved this yet. Still getting the error. Any ideas? Commented Nov 29, 2010 at 21:47

1 Answer

Two things:

  1. You've got a trailing slash after "bash" in the shebang at the start of the script -- remove it, it should just read #!/bin/bash. Also double check there is a bash in /bin.
  2. The script will try to execute nutch from the bin directory in your current folder. So if you're in $HOME, and assuming you've got a path $HOME/bin/nutch, then you'll be okay. But if you change to /tmp, it'll fail, as there's no such path as /tmp/bin/nutch. You're better off giving the full absolute path name to nutch in the first place.
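
One way to apply the second point is to have the script work out an absolute install directory itself and change into it before calling nutch, so that neither bin/nutch nor the relative crawl/... arguments depend on where the script is launched from. This is only a minimal sketch: NUTCH_HOME, abs_dir and run_index are names made up for illustration, and it assumes the script is saved in the Nutch install directory, as nutch.sh is in the question.

```bash
#!/bin/bash
# Sketch only: NUTCH_HOME, abs_dir and run_index are illustrative names,
# not part of Nutch.

# Print the absolute form of a directory path. The cd runs in a subshell,
# so the caller's working directory is left unchanged.
abs_dir() {
    (cd "$1" && pwd)
}

# Assumption: this script sits in the Nutch install directory, as nutch.sh
# does in the question. Export NUTCH_HOME beforehand to point elsewhere.
NUTCH_HOME="${NUTCH_HOME:-$(abs_dir "$(dirname "${BASH_SOURCE[0]:-$0}")")}"

run_index() {
    # The relative crawl/... arguments are resolved against NUTCH_HOME,
    # not against whatever directory the script was started from.
    cd "$NUTCH_HOME" || return 1
    "$NUTCH_HOME/bin/nutch" index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
}

if [ -x "$NUTCH_HOME/bin/nutch" ]; then
    run_index
else
    echo "nutch not found or not executable: $NUTCH_HOME/bin/nutch" >&2
fi
```

With this in place the script should behave the same whether it is started as ./nutch.sh from the install directory or by its full path from somewhere else.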

1 Comment

Actually, I'm running the script from home/.../apache-nutch-1.2/ which contains "/bin/nutch". I fixed the #!/bin/bash. Perhaps the fact that I'm calling the script from a relative address is causing the issue? How would I fix that?
