
I need to run a hadoop command in a bash script that goes through a bunch of folders on Amazon S3, writes those folder names into a txt file, and then does further processing. The problem is that when I ran the script, it seems no folder names were written to the txt file. I wonder whether the hadoop command took too long to run and the bash script didn't wait for it to finish before going ahead with the further processing. If so, how can I make bash wait until the hadoop command has finished before doing the other processing?

Here is my code; I tried both ways, and neither works:

1. 
listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print $6}' | cut -f 4- -d / > $FILE_NAME"                            
echo -e "listing... $listCmd\n"                                                                                                                                                   
eval $listCmd
...other process ...

2. 
echo -e "list the folders we want to copy into a file"
hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate | grep s3n | awk -F' ' '{print $6}' | cut -f 4- -d / > $FILE_NAME
... other process ....

Does anyone know what might be wrong? And is it better to use eval, or to run the hadoop command directly as in the second way?

Thanks.

  • Does "hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate" have any output? Commented Oct 3, 2013 at 0:05
  • Yes, when I run that hadoop command directly, it runs fine. Commented Oct 3, 2013 at 5:17

1 Answer


I would prefer eval in this case; it is prettier to append the next command to this one. I would also break listCmd down into parts, so that you know nothing is going wrong at the grep, awk, or cut level.
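One pitfall worth flagging before the breakdown: when a command string like listCmd is built inside double quotes, $6 is expanded by the shell at assignment time even though it sits inside embedded single quotes, so it must be escaped as \$6. A minimal demonstration, using echo as a stand-in for the hadoop listing:

```shell
#!/bin/bash
# $6 is an (unset) positional parameter here, so it expands to nothing
# when the string is assigned, and awk ends up with '{print }'.
bad="echo 'a b c d e f' | awk -F' ' '{print $6}'"
# Escaping the dollar sign keeps $6 literal until eval runs the command.
good="echo 'a b c d e f' | awk -F' ' '{print \$6}'"
eval "$bad"    # prints the whole line: a b c d e f
eval "$good"   # prints only the sixth field: f
```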

listCmd="hadoop fs -ls s3n://$AWS_ACCESS_KEY:$AWS_SECRET_KEY@$S3_BUCKET/*/*/$mydate > $raw_File"
gcmd="grep s3n $raw_File | awk -F' ' '{print \$6}' | cut -f 4- -d / > $FILE_NAME"  # note the escaped \$6, so the shell does not expand it at assignment time
echo "Running $listCmd and other commands after that"
otherCmd="cat $FILE_NAME"
eval "$listCmd"
echo $?  # This will print the exit status of $listCmd
eval "$gcmd" && echo "Finished Listing" && eval "$otherCmd"

otherCmd will only be executed if $gcmd succeeds. If you have too many commands to chain, this becomes a bit ugly. If you roughly know how long the job will take, you can insert a sleep command.

 eval "$listCmd"
 sleep 1800  # sleep for 1800 seconds (30 minutes)
 eval "$otherCmd"
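If the run time is unknown, a fixed sleep is fragile. Another option (not from the original answer, just a sketch) is to launch the long-running command in the background and use bash's built-in wait, which blocks until the job actually finishes; here sleep stands in for the hadoop listing:

```shell
#!/bin/bash
# Background a stand-in long-running job (replace with the real hadoop fs -ls pipeline).
(sleep 2; echo "folder1" > folders.txt) &
pid=$!

wait "$pid"             # blocks until the background job exits
echo "exit status: $?"  # 0 if the job succeeded
cat folders.txt         # the file is guaranteed to be written by now
```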

1 Comment

Using eval was the only way I was able to start Hadoop streaming from a bash script.
