
This is the content of list.csv:

Apple,Red,10
Banana,Yellow,3
Coconut,White,18

Suppose I have this GNU parallel command:

parallel -a list.csv -j0 -C, \
color=`echo {2} | sed 's/e/eee/g' | ./capitalize.sh` ";" \
echo "{2}" ";" \
echo "$color" ";"

To get:

Red
REEED
Yellow
YEEELLOW
White
WHITEEE

Why isn't the color variable being defined/printed?

EDIT 20151218: Now that I have the quoting right, I'd like to introduce a function that reads a variable set by another function, and that also reads $0.

This is a working example without GNU parallel (I made grep case-insensitive before posting, to facilitate testing without ./capitalize.sh).

doit() {
   #note that I would use parallel's `-C,` here instead of `cut`.
   color=$(echo "$1" | cut -d, -f2 | sed 's/e/eee/g' | ./capitalize.sh)
}
export -f doit

get_key() {
   key=$(grep -i "$color" "$0" | cut -d, -f2)
}
export -f get_key

while IFS= read -r line; do
  doit "$line"     #get CSV's 2nd element and make it look like the one in script.
  get_key          #extract this element's value from the script's comments.
  echo "color: $color"
  echo "key: $key"
done < list.csv

#Key database in the shell script
# REEED,r-key
# YEEELLOW,y-key
# WHITEEE,w-key

Working output:

color: REEED
key: r-key
color: YEEELLOW
key: y-key
color: WHITEEE
key: w-key
  • This is so wrong I'm not sure that starting with the existing code is reasonable. You might consider backing up and describing what you're actually trying to accomplish. Commented Dec 17, 2015 at 21:12
  • (and why parallel is part of that goal: in general, if you want output in a well-defined order, parallel is not typically an appropriate tool for the job, since there's no guarantee that one task it spawns won't start printing in the middle of the output of another one; avoiding that requires -k / --keep-order to buffer and reassemble the output). Commented Dec 17, 2015 at 21:14
  • ...anyhow, if you're only using parallel because of its support for CSV input, there are far, far better ways to do that in shell without it. (And everything else: you can do the string manipulations and capitalization using only shell builtins much, much more efficiently than by starting external tools like sed). Commented Dec 17, 2015 at 21:17
  • @CharlesDuffy May I challenge you to write this: parallel -a list.csv -j0 -C, echo {3}";" echo '{=2 s/e/eee/g; s/\b(\w)/uc($1)/ge =}' using xargs -P ... ? Commented Dec 17, 2015 at 21:30
  • @CharlesDuffy Yes, but you will need to make sure there is no mixing of outputs from different jobs (i.e. if you run 100 jobs in parallel 'echo {3}' must be followed by the second echo and not by 'echo {3}' from another job). Commented Dec 17, 2015 at 21:41
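As a sketch of the builtin-only approach suggested in the comments (assuming bash 4 or later for `${var^^}`, and assuming ./capitalize.sh only upper-cases its input, which the question does not say), the CSV splitting and the string rewriting can be done without parallel, sed or cut. `show_colors` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: builtins only (requires bash >= 4 for ${var^^}).
# Assumes ./capitalize.sh just upper-cases its input; adjust if it does more.
show_colors() {
  local fruit name count color
  # IFS=, makes read split each line on commas into three fields.
  while IFS=, read -r fruit name count; do
    color=${name//e/eee}   # same effect as sed 's/e/eee/g'
    color=${color^^}       # upper-case every character
    printf '%s\n%s\n' "$name" "$color"
  done
}
```

Usage: `show_colors < list.csv`. Because it handles one line at a time, the output keeps the order of list.csv; the trade-off is that nothing runs in parallel.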

2 Answers


This should work:

parallel -a list.csv -j0 -C, 'color=`echo {2} | sed "s/e/eee/g" | ./capitalize.sh`' ";" echo "{2}" ";" echo '"$color"' ";"
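A quick way to see the difference is to hand the same words to a small function that prints each argument it receives in brackets. This is a sketch: `show_args` is a hypothetical helper, and `tr` stands in for ./capitalize.sh on the assumption that the script simply upper-cases its input:

```shell
#!/bin/bash
# Hypothetical helper: print every argument on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }
unset color

# Unquoted form (as in the question): the calling shell runs the backticks
# on the literal text {2} and expands "$color" (unset here) immediately.
show_args color=`echo {2} | sed 's/e/eee/g' | tr '[:lower:]' '[:upper:]'` ";" echo "$color"

# Quoted form: the words arrive unchanged, so each job's own shell does the
# command substitution and expansion after {2} has been replaced.
show_args 'color=`echo {2} | sed "s/e/eee/g" | tr "[:lower:]" "[:upper:]"`' ";" echo '"$color"'
```

The first call prints `[color={2}]`, `[;]`, `[echo]` and `[]`: parallel never sees the backticks or `$color` at all, only their already-expanded results.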

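You are being hit by inadequate quoting. In your command the backticks and "$color" are not protected, so your interactive shell expands them before parallel ever runs: the command substitution sees the literal text {2}, and "$color" expands to an empty string because color is not set in your shell. Quoting them as above passes them through unchanged, so each job's shell evaluates them after {2} has been replaced. It might be easier to use a function: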

doit() {
   color=$(echo "$2" | sed 's/e/eee/g' | ./capitalize.sh)
   echo "$2"
   echo "$color"
}
export -f doit
parallel -a list.csv -j0 -C, doit
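To check such a function without parallel, you can call it with the same positional arguments that `-C,` would pass for one line (`$1`=Apple, `$2`=Red, `$3`=10). In this sketch a hypothetical `capitalize` function stands in for ./capitalize.sh, assuming that script only upper-cases its input:

```shell
#!/bin/bash
# Hypothetical stand-in for ./capitalize.sh: upper-case stdin.
capitalize() { tr '[:lower:]' '[:upper:]'; }

doit() {
  local color
  # -C, splits "Apple,Red,10" into $1, $2, $3; the color is $2.
  color=$(printf '%s\n' "$2" | sed 's/e/eee/g' | capitalize)
  printf '%s\n' "$2" "$color"
}
# Under parallel, every function that doit calls must be exported as well.
export -f capitalize doit

# Simulate the job parallel would run for the first line of list.csv.
doit Apple Red 10
```

With both functions exported, `parallel -a list.csv -C, doit` should run the same code once per CSV line; add `-k` if the output must follow the order of list.csv.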

If this is the real goal, you might want to use {= =} instead, which is made for situations like this:

parallel -a list.csv -j0 -C, echo {2}";" echo '{=2 s/e/eee/g; $_=uc($_) =}'

If you are using $color several times, then --rpl can introduce a shorthand:

parallel --rpl '{clr} s/e/eee/g; $_=uc($_)' -a list.csv -j0 -C, echo {2}";" echo '{2clr} and again: {2clr}'

From the xargs aficionados I would really like to see a solution using xargs that:

  • guarantees not mixing output from different jobs - even if the lines are 60k long (e.g. the value of $color is 60k long)
  • sends stdout to stdout, and stderr to stderr
  • does not skip jobs even if the list of jobs (list.csv) is bigger than the number of available processes in the process table - even if capitalize.sh takes a full minute to run (xargs -P0)

3 Comments

Your first suggestion works perfectly. I had less luck with the function, because I don't want it to print the result right away: it will be printed and referenced later in the code. As we go deeper down the rabbit hole, I now have a problem reading this variable from within another function. See the updated question.
@octosquidopus It is easier to see the solution if you show us the working code before putting it into GNU Parallel.
I updated my question with an example, and I'm rewriting my GNU parallel command to use functions, because my nested quotes are becoming unmanageable.

The idea is to use a single function to do everything.
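Why a single function helps: in bash, a variable assigned inside a function without `local` is global, so a second function can read it, but only within the same shell process. Each job that parallel starts runs in its own shell, and a variable set in one process never reaches another. This sketch imitates that separation with a subshell `( ... )`; `set_color` and `print_color` are hypothetical names:

```shell
#!/bin/bash
set_color()   { color=REEED; }                   # assigns a global variable
print_color() { printf 'color: %s\n' "$color"; }
unset color

( set_color )   # runs in a child process, so the assignment is lost
print_color     # prints "color: "

set_color       # runs in the current shell
print_color     # prints "color: REEED"
```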

#!/bin/bash

#Key database in the shell script
# REEED,r-key
# YEEELLOW,y-key
# WHITEEE,w-key

doit() {
  # $1 is the script path (passed as "$0"); -C, puts the CSV columns in $2, $3 and $4.
  # get CSV's 2nd column and make it look like the one in the script.
  color=$(echo "$3" | sed 's/e/eee/g' | ./capitalize.sh)
  # extract this element's value from the script's comments.
  key=$(grep -i "$color" "$1" | cut -d, -f2)
  echo "color: $color"
  echo "key: $key"
}
export -f doit

# parallel's -C, splits each CSV line, so no cut is needed for the columns.
parallel -C, doit "$0" < list.csv
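One caveat with the `grep -i` lookup: it matches the color anywhere in the script, so any other line that happens to contain the same text (a comment, for instance) would also be returned. A stricter lookup, sketched here with a hypothetical `lookup_key` helper and assuming every database line has the form `# COLOR,key`, compares the whole first comma-separated field with awk:

```shell
#!/bin/bash
# Hypothetical helper: print the key for color $2 from database file $1.
# Assumes database lines look like "# REEED,r-key"; the comparison is
# case-insensitive, like grep -i, but must match the whole first field.
lookup_key() {
  awk -F, -v want="# $2" 'toupper($1) == toupper(want) { print $2 }' "$1"
}
```

Inside `doit`, the grep line would then become `key=$(lookup_key "$1" "$color")`.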
