
I have this bash script that I made which makes an API request for each entry in a very big list of user accounts (almost 10,000):

#!/bin/bash

#Variables
Names_list="/home/debian/names.list"
auth_key="***"

#For loop
Users=$(cat $Names_list)
for n in ${Users[@]}
do
        curl --silent --request GET \
        --url https://api.example.com/$n \
        --header 'authorization: Bearer '$auth_key'' \
        --data '{}' >> /home/debian/results.list
done

echo "Done."

My pain is that with the current way of working my bearer token expires before the calls can complete. It only has a 30 minute lifetime, and the calls start returning an unauthorized error at around seven to eight thousand.

I understand that I can just split up the big list file with something like "split" and then run the tasks in the background with &, but I cannot wrap my head around that part.

Since the API I am using is private and has no rate limiting, I was thinking of bursting the ~10,000 calls in batches of 1 or 2 thousand.

Like this:

#!/bin/bash

cat_split(){
   cat $file;
}

Split_results="/home/debian/split.d/"

for file in ${Split_results[@]}
do
        cat_split &
done

Yes, that does work as a proof of concept, but I don't know what the best way to go about this is now. Should I place my API call in another function, or have one function that does the cat and then the API call? What would you consider a proper way of going about this?

Thanks for any advice in advance.

  • Why not just use GNU Parallel? Type [gnu-parallel] in the Search box above. Commented Jan 15, 2022 at 16:52
  • @MarkSetchell Yes, I did see that when I was searching over stack, but I ultimately decided that I wouldn't use it since I don't really want to install anything. My user account has no sudo rights anyway. Commented Jan 15, 2022 at 17:12
  • You don't need sudo rights, and it's just a Perl script such as you might write yourself. I don't have any axe to grind, but you may like to read this: oletange.blogspot.com/2013/04/why-not-install-gnu-parallel.html Commented Jan 15, 2022 at 17:14
  • What curl --version do you use? curl has had a --parallel option since version 7.66.0. The list of URLs can be given to curl using -K (though I'm not sure whether 10,000 would work). Commented Jan 15, 2022 at 17:49
  • I have curl version 7.74.0 on my computer, and thanks for that tip. I had no clue that curl had such an option. @rowboat Commented Jan 15, 2022 at 17:55
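
For reference, a minimal sketch of the curl --parallel idea mentioned in the comment above (assuming curl >= 7.66.0; the hostname, paths and token are the same placeholders as in the question, and whether curl copes with ~10,000 URLs in one config file is untested):

#!/bin/bash

auth_key="***"

# build a curl config file with one "url = ..." line per account name
awk '{ printf "url = \"https://api.example.com/%s\"\n", $0 }' /home/debian/names.list > /home/debian/curl.cfg

# run up to 100 transfers at a time; all responses are appended to one file
curl --silent --parallel --parallel-max 100 \
     --header "authorization: Bearer $auth_key" \
     --config /home/debian/curl.cfg >> /home/debian/results.list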

1 Answer

I understand that I can just split up the big list file with something like "split" and then run the tasks in the background with &, but I cannot wrap my head around that part.

Put the stuff to execute in a function. Then use GNU parallel or xargs.

doit() {
   # build the command in an array so individual parts can be commented
   cmd=(
        curl --silent --request GET
        --url "https://api.example.com/$1"
        # check your scripts with shellcheck
        --header "authorization: Bearer $auth_key"
        --data '{}'
   )
   # execute it and store in variable
   tmp=$("{$cmd[@]}")
   # output in a single call, so that hopefully buffering does not bite us that much
   # if it does, use a separate file for each call
   # or use GNU parallel
   printf "%s\n" "$tmp"
}
export auth_key   # export needed variables
export -f doit    # export needed functions

# I replaced /home/debian by ~
# run xargs that runs bash that runs the command with passed argument
# see man xargs , man bash
xargs -d '\n' -P 1000 -n 1 bash -c 'doit "$@"' _ < ~/names.list > ~/results.list
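
With GNU parallel instead of xargs, the same exported function can be reused. A rough equivalent (assuming your shell is bash so the exported function is visible to the jobs; -j 100 is an arbitrary concurrency choice, and parallel buffers each job's output so interleaving is not a concern):

parallel -j 100 doit {} < ~/names.list > ~/results.list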

I was thinking of bursting the ~10,000 calls in batches of 1 or 2 thousand.

You would have to write a manual loop for that:

trap 'kill $(jobs -p)' EXIT  # kill all jobs on ctrl+c
n=0
max=1000
# see https://mywiki.wooledge.org/BashFAQ/001
while IFS= read -r line; do
   if ((++n > max)); then
       wait  # wait for all the currently running processes
       n=0
   fi
   doit "$line" &
done < ~/names.list > ~/results.list
wait

Problems with your script (a corrected version of the original loop is sketched below):

  • Use shellcheck to check your scripts.
  • Do not use for i in $(cat ...), and do not use the tmp=$(cat ...); for i in $tmp form either. See https://mywiki.wooledge.org/BashFAQ/001
  • Users is not an array, so ${Users[@]} is just an unquoted, word-split expansion.
  • $n and $auth_key are not quoted. In particular, the unquoted *** in auth_key gets glob-expanded to all the files in your current directory.
  • Your second script starts the background jobs but never waits for them.
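
Putting those fixes together, a minimal corrected version of the original sequential loop might look like this (still slow and single-threaded, shown only to illustrate the read-loop and quoting points; paths and the token are the question's placeholders):

#!/bin/bash

Names_list="/home/debian/names.list"
auth_key="***"

# read the file line by line instead of word-splitting $(cat ...)
while IFS= read -r n; do
        # every expansion is quoted, so *** is never glob-expanded
        curl --silent --request GET \
             --url "https://api.example.com/$n" \
             --header "authorization: Bearer $auth_key" \
             --data '{}'
done < "$Names_list" >> /home/debian/results.list

echo "Done."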

1 Comment

Upvote for "output in a single call, so that hopefully buffering does not bite us that much".
