I have the following script to scrape contact details from a list of URLs (urls.txt). When I run the following command directly from the terminal, I get the correct result:

perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' http://url.com 

However, when I call the same command from within a script, I get a "No such file or directory" error.

Here is a copy of my script:

#!/bin/bash

while read inputline
do
  //Read the url from urls.txt
  url="$(echo $inputline)"

  //execute saxon-lint to grab the contents of the XPATH from the url within urls.txt
  mydata=$("perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url ")

  //output the result in myfile.csv
  echo "$url,$mydata" >> myfile.csv

  //wait 4 seconds
  sleep 4

//move to the next url
done <urls.txt

I have tried changing perl to ./ but I get the same result.

Can anyone advise where I am going wrong with this, please?

The error that I am receiving is:

./script2.pl: line 6: ./saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' http://find.icaew.com/listings/view/listing_id/20669/avonhurst-chartered-accountants : No such file or directory

Thanks in advance

  • Try using absolute path. Commented Aug 26, 2016 at 11:48
  • Tried, but I get the same result. Commented Aug 26, 2016 at 11:49
  • Use \ to escape the /. Commented Aug 26, 2016 at 11:56
  • How is this perl within perl, as claimed in the title? This is perl within bash. Commented Aug 26, 2016 at 11:56
  • I am new to Linux, so you will have to excuse the terminology used. Commented Aug 26, 2016 at 12:06

2 Answers

6

Don't put double quotes inside the command substitution.

Not:

mydata=$("perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url ")
# .......^...........................................................................................^

But this:

mydata=$(perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' "$url")

With the double quotes, you're instructing bash to look for a program named "perl saxon-lint.pl --html etc etc" in the path, spaces and all, and clearly no such program exists.
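To see the effect in isolation, here is a minimal sketch that uses echo in place of the real saxon-lint.pl call (the hello world arguments and the variable names are just placeholders for illustration). The quoted form fails with exit status 127 because the shell treats the whole string as a single command name, while the unquoted form runs normally:

```shell
#!/bin/bash
# Quoted: the entire string "echo hello world" is one word, so the shell
# tries to run a command with that exact name (spaces included) and fails.
bad_status=0
bad=$("echo hello world" 2>/dev/null) || bad_status=$?

# Unquoted: "echo" is the command name; "hello" and "world" are arguments.
good_status=0
good=$(echo hello world) || good_status=$?

printf 'quoted:   status=%s output=[%s]\n' "$bad_status" "$bad"
printf 'unquoted: status=%s output=[%s]\n' "$good_status" "$good"
```

In this sketch the message that bash would print is "command not found". In the question the quoted string contains slashes, so bash treats it as a path rather than searching PATH, which is why the message there is "No such file or directory". The exit status is 127 in both cases.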

1 Comment

You probably intended to put the quotes outside the command substitution, but for variable assignments they are not strictly necessary.
1

You should accept @glennjackman's answer, as that is exactly the problem. This line:

mydata=$("perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url ")

is telling the shell to run this command:

"perl saxon-lint.pl --html --xpath 'string-join(//div[2]/div[2]/div[1]/div[2]/div[2])' $url "

... including the double quotes. If you type that with the double quotes at the shell prompt, you'll get the same "No such file or directory" error message that you're getting from your script.

A couple other notes on the script:

  url="$(echo $inputline)"

This is a roundabout way of making a second variable into a copy of the first. A simple url=$inputline would work as well, but you could also just use read url in the first place. Not sure why you need two variables.
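As a rough sketch of reading straight into url, assuming a throwaway temporary file with two made-up example.com URLs standing in for urls.txt (the count and last variables exist only to show what was read):

```shell
#!/bin/bash
# Hypothetical sample input standing in for urls.txt.
tmpfile=$(mktemp)
printf '%s\n' 'http://example.com/a' 'http://example.com/b' > "$tmpfile"

count=0
last=
# IFS= keeps leading/trailing whitespace; -r stops read from
# interpreting backslashes. The line goes directly into $url.
while IFS= read -r url
do
  count=$((count + 1))
  last=$url
  printf 'line %d: %s\n' "$count" "$url"
done < "$tmpfile"

rm -f "$tmpfile"
```

The IFS= and -r additions are not required for plain URLs, but they make read hand back each line unmodified, which is usually what you want when processing a file line by line.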

  //output the result in myfile.csv
  echo "$url,$mydata" >> myfile.csv

Be aware that when passing a variable containing user-supplied input as the first argument to echo, you create the possibility of unexpected behavior. In this case, it's a low possibility, since a URL isn't likely to start with a - character, but it's good to get out of the habit; I would use printf. Also, instead of appending each line inside the loop, I would just redirect the output of the loop along with the input:

  printf '%s,%s\n' "$url" "$mydata"
  [...]
done <urls.txt >>myfile.csv

If myfile.csv isn't expected to exist when the loop starts, or doesn't contain anything you need to keep, you can change that to a single > and avoid the possibility of messy mixtures of output from different runs.
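Both points can be tried out with a small self-contained sketch. Everything here is illustrative: fetch_field is a hypothetical stand-in for the saxon-lint.pl call, the example.com URLs are made up, and the files live in a temporary directory rather than next to your script.

```shell
#!/bin/bash
# 1. echo vs printf with a value that looks like an option.
value='-n'
from_echo=$(echo "$value"; echo end)            # echo swallows -n as an option
from_printf=$(printf '%s\n' "$value"; echo end) # printf prints it literally

# 2. Redirect the whole loop once instead of appending on every iteration.
fetch_field() {
  # Hypothetical stand-in for: perl saxon-lint.pl --html --xpath '...' "$1"
  printf 'data for %s' "$1"
}

workdir=$(mktemp -d)
printf '%s\n' 'http://example.com/a' 'http://example.com/b' > "$workdir/urls.txt"

while IFS= read -r url
do
  mydata=$(fetch_field "$url")
  printf '%s,%s\n' "$url" "$mydata"
done < "$workdir/urls.txt" > "$workdir/myfile.csv"

result=$(cat "$workdir/myfile.csv")
rm -rf "$workdir"

printf 'echo gave:   [%s]\n' "$from_echo"
printf 'printf gave: [%s]\n' "$from_printf"
printf '%s\n' "$result"
```

With the single > on the loop, the output file is opened and truncated once, before the first iteration, so each run starts from an empty file.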

1 Comment

Yes, including them. Try typing "ls" with the quotes at the command line, and you'll get a directory listing. The shell doesn't distinguish between words and strings. All quotes do is keep you from having to escape characters (including spaces). To the shell, "ls" and ls (and 'ls' and \ls and l\s and \l\s...) are exactly the same thing (except the quoted versions won't trigger an alias lookup). If you try "ls -l", then you'll get an error because it's looking for a command named ls -l, instead of looking for one named ls and running it with an argument of -l.
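A quick way to convince yourself of the equivalence, sketched with printf instead of ls so the output is predictable (the variable names a through d are arbitrary):

```shell
#!/bin/bash
# Four spellings of the same command name; quoting and backslashes are
# removed before the shell looks the command up.
a=$("printf" '%s' ok)
b=$('printf' '%s' ok)
c=$(\printf '%s' ok)
d=$(p\r\intf '%s' ok)

printf '%s %s %s %s\n' "$a" "$b" "$c" "$d"
```

Each substitution runs the same printf, so all four variables end up holding the same string.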
