
I need to run a hadoop command in a bash script that goes through a bunch of folders on Amazon S3, writes those folder names into a txt file, and then does further processing. The problem is that when I ran the script, it seems no folder names were written to the txt file. I wonder whether the hadoop command took too long to run and the bash script didn't wait for it to finish before going ahead with the further processing. If so, how can I make bash wait until the hadoop command has finished before doing the other processing?

Here is my code; I tried both ways, and neither works:

1. 
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print $6}' | cut -f 4- -d / > $FILE_NAME"                            
echo -e "listing... $listCmd\n"                                                                                                                                                   
eval $listCmd
...other process ...

2. 
echo -e "list the folders we want to copy into a file"
hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print $6}' | cut -f 4- -d / > $FILE_NAME
... other process ....

Does anyone know what might be wrong? And is it better to use eval, or to run the hadoop command directly as in the second way?

Thanks.

  • Does "hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate" have any output? Commented Oct 3, 2013 at 0:05
  • Yes, when I run that hadoop command directly, it runs fine. Commented Oct 3, 2013 at 5:17

1 Answer


I would prefer eval in this case; it is prettier to append the next command to this one. I would also break listCmd down into parts, so that you know nothing is going wrong at the grep, awk, or cut level.
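One pitfall worth flagging before the breakdown: when a command string like listCmd is built inside double quotes, $6 is expanded by the shell at assignment time even though it sits inside embedded single quotes, so it must be escaped as \$6. A minimal demonstration, using echo as a stand-in for the hadoop listing:

```shell
#!/bin/bash
# $6 is an (unset) positional parameter here, so it expands to nothing
# when the string is assigned, and awk ends up with '{print }'.
bad="echo 'a b c d e f' | awk -F' ' '{print $6}'"
# Escaping the dollar sign keeps $6 literal until eval runs the command.
good="echo 'a b c d e f' | awk -F' ' '{print \$6}'"
eval "$bad"    # prints the whole line: a b c d e f
eval "$good"   # prints only the sixth field: f
```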

listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate > $raw_File"
gcmd="grep s3n $raw_File | awk -F' ' '{print \$6}' | cut -f 4- -d / > $FILE_NAME"  # note the escaped \$6, so the shell does not expand it at assignment time
echo "Running $listCmd and other commands after that"
otherCmd="cat $FILE_NAME"
eval "$listCmd"
echo $?  # This will print the exit status of $listCmd
eval "$gcmd" && echo "Finished Listing" && eval "$otherCmd"

otherCmd will only be executed if $gcmd succeeds. If you have too many commands to chain, this becomes a bit ugly. If you roughly know how long the job will take, you can insert a sleep command.

 eval "$listCmd"
 sleep 1800  # sleep for 1800 seconds (30 minutes)
 eval "$otherCmd"
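If the run time is unknown, a fixed sleep is fragile. Another option (not from the original answer, just a sketch) is to launch the long-running command in the background and use bash's built-in wait, which blocks until the job actually finishes; here sleep stands in for the hadoop listing:

```shell
#!/bin/bash
# Background a stand-in long-running job (replace with the real hadoop fs -ls pipeline).
(sleep 2; echo "folder1" > folders.txt) &
pid=$!

wait "$pid"             # blocks until the background job exits
echo "exit status: $?"  # 0 if the job succeeded
cat folders.txt         # the file is guaranteed to be written by now
```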

1 Comment

Using eval was the only way I was able to start Hadoop streaming from a bash script.
