When I type the following command into Cygwin:
bin/nutch index crawl/crawldb crawl/linkdb crawl/segment/*
then the binary works fine. When I place the exact same line into my bash script:
#!/bin/bash/
bin/nutch index crawl/crawldb crawl/linkdb crawl/segment/*
I get an error saying some files don't exist. This may be specific to Nutch, which is the program I'm running, but I think it has more to do with how I'm calling the command in the script. Any ideas about what's wrong and how to fix it? (Yes, I'm using tab completion.)
EDIT:
Script:
#!/bin/bash
/home/Dan/apache-nutch-1.2/bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
I run the command:
$ pwd
/home/Dan/apache-nutch-1.2
$ ./nutch.sh
The output I'm getting is:
Indexer: starting at 2010-11-29 15:15:44
Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/crawl_fetch
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/crawl_parse
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/parse_data
Input path does not exist: file:/C:/cygwin/home/Dan/apache-nutch-1.2/
/parse_text
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)
Regards, ~DS
/bin/bash is the correct path. Also, bin/nutch is a relative path.
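To illustrate the point about relative paths: a relative path such as bin/nutch or crawl/segments/* is resolved against whatever directory the script is run from, not against the script's own location. A minimal sketch of one way to pin that down is shown here. It assumes the script lives in the Nutch install directory; NUTCH_HOME is a hypothetical variable name used only for illustration, and the nutch call itself is left commented out. This may not be the cause of the error above, but it removes one source of ambiguity.

```shell
#!/bin/bash
# Sketch: resolve relative paths against the script's own directory,
# so the script behaves the same no matter where it is invoked from.
# NUTCH_HOME is a hypothetical name; it is not defined by Nutch itself.
NUTCH_HOME="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Show which directory the relative paths will be resolved against.
echo "Running from: $NUTCH_HOME"

# Fix the working directory; stop if that fails.
cd "$NUTCH_HOME" || exit 1

# From here on, relative paths mean the same thing as when typed
# interactively from that directory, for example:
# bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
```

With the working directory fixed this way, the glob crawl/segments/* is expanded by bash relative to the install directory before nutch is started.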