I have a file named testingFruits.csv with the following contents:

name,value_id,size
apple,1,small
mango,2,small
banana,3,medium
watermelon,4,large

I also have an associative array that stores the following data:

fruitSizes[apple] = xsmall
fruitSizes[mango] = small
fruitSizes[banana] = medium
fruitSizes[watermelon] = xlarge

Is there any way I can update the 'size' column within the file, based on the data in the associative array, for each value in the 'name' column?

I've tried using awk but I had no luck. Here's a sample of what I tried to do:

awk -v t="${fruitSizes[*]}" 'BEGIN{n=split(t,arrayval,""); ($1 in arrayval) {$3=arrayval[$1]}' "testingFruits.csv"

I understand this command would get the bash defined array fruitSizes, do a split on all the values, then check if the first column (name) is within the fruitSizes array. If it is, then it would update the third column (size) with the value found in fruitSizes for that specific name.

Unfortunately this gives me the following error:

Argument list too long

This is the expected output I'd like in the same testingFruits.csv file:

name,value_id,size
apple,1,xsmall
mango,2,small
banana,3,medium
watermelon,4,xlarge

One edge case I'd like to handle is the presence of duplicate values in the name column with different values for the value_id and size columns.

  • FYI, starting from a bash associative array is probably a bad idea: they're slow and non-portable, and they make the rest of your script harder to implement. You should instead use awk to read whatever input you're populating that array from. Commented Sep 9, 2021 at 20:22

1 Answer

If you want to stick to an awk script, pass the array via stdin to avoid running into ARG_MAX issues.

Since your array is associative, listing only the values ${fruitSizes[@]} is not sufficient. You also need the keys ${!fruitSizes[@]}. pr -2 can pair the keys and values in one line.
This assumes that ${fruitSizes[@]} and ${!fruitSizes[@]} expand in the same order, and your keys and values are free of the field separator (, in this case).

printf %s\\n "${!fruitSizes[@]}" "${fruitSizes[@]}" | pr -t -2 -s, |
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' - testingFruits.csv
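
If relying on the expansion order of the two lists feels fragile, one possible variation is to loop over the keys and print each key,value pair directly. pr also paginates its input (66 lines per page by default in the implementations I know of), so with very many entries its column pairing is worth double-checking. The sketch below recreates the sample data from the question in a scratch directory (the workdir and csv variables are illustrative) and writes the result back via a temporary file, since awk on its own only prints to stdout:

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
csv="$workdir/testingFruits.csv"
printf '%s\n' 'name,value_id,size' 'apple,1,small' 'mango,2,small' \
              'banana,3,medium' 'watermelon,4,large' > "$csv"

declare -A fruitSizes=([apple]=xsmall [mango]=small [banana]=medium [watermelon]=xlarge)

# Emit one "key,value" line per array entry, so no ordering assumption is needed.
for name in "${!fruitSizes[@]}"; do
  printf '%s,%s\n' "$name" "${fruitSizes[$name]}"
done |
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' - "$csv" > "$csv.tmp" &&
mv "$csv.tmp" "$csv"   # replace the original only if awk succeeded

cat "$csv"
```

As before, this assumes that keys and values contain no commas or newlines.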

However, I'm wondering where the array fruitSizes comes from. If you read it from a file or something like that, it would be easier to leave out the array altogether and do everything in awk.
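
As a rough sketch of that idea, assume the sizes live in a hypothetical two-column file fruitSizes.csv (name,size). That file name and layout are my assumption, not something stated in the question. awk can then read both files itself, with no bash array involved. The sample data includes a second apple row to show that duplicate names are each updated:

```shell
# Scratch copies of the data; fruitSizes.csv is a made-up lookup file.
workdir=$(mktemp -d)
sizes="$workdir/fruitSizes.csv"
csv="$workdir/testingFruits.csv"
printf '%s\n' 'apple,xsmall' 'mango,small' 'banana,medium' 'watermelon,xlarge' > "$sizes"
printf '%s\n' 'name,value_id,size' 'apple,1,small' 'mango,2,small' \
              'banana,3,medium' 'watermelon,4,large' 'apple,5,large' > "$csv"

# First file fills the lookup table, second file gets its size column rewritten.
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' \
    "$sizes" "$csv" > "$csv.tmp" &&
mv "$csv.tmp" "$csv"

cat "$csv"
```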

10 Comments

What is the significance of {a[$1]=$2; next} ?
@confusedcoder21 a is the awk-version of fruitSizes. We need the mentioned rule to populate that array. NR==FNR activates only at the first "file" (- stands for stdin). a[$1]=$2 stores the keys ($1) and values ($2) from fruitSizes in the awk-array a. next skips all other awk rules and goes to the next line. Therefore, the part $1 in a {$3=a[$1]} 1 is only executed for the 2nd file. 1 is a shorthand for {print}.
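
For readers who prefer to see that explanation next to the code, here is the same awk program spread over several lines with one comment per rule. It is only an annotated sketch: the two key,value pairs fed on stdin and the small CSV are sample data, and csv and result are illustrative variable names.

```shell
csv=$(mktemp)
printf '%s\n' 'name,value_id,size' 'apple,1,small' 'banana,3,medium' \
              'watermelon,4,large' > "$csv"

result=$(printf '%s\n' 'apple,xsmall' 'watermelon,xlarge' |
  awk -F, -v OFS=, '
    # First input (stdin, "-"): NR equals FNR, so fill the lookup table.
    NR == FNR { a[$1] = $2; next }   # next: skip the rules below for this line
    # Second input (the CSV): FNR restarted at 1, so NR == FNR is now false.
    $1 in a   { $3 = a[$1] }         # replace the size when the name is a known key
    1                                # always-true pattern: print the line
  ' - "$csv")

printf '%s\n' "$result"
```

The banana row has no entry in the lookup table here, so it passes through unchanged.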
So $1 refers to the keys in the fruitSizes and $2 refers to the values. And later $3 refers to the size column in the file?
How would this change if I had more columns? Say the name column was column #7 and the size column is #13?
Exactly. The meaning of the columns changes from "file" (stdin -) to file (testingFruits.csv). If you want to adapt this script, just ignore the part NR==FNR {a[$1]=$2; next}. That's just the "magic" initialization of the array. What comes after that can be altered however you like. If the name is $7 and the size is $13, use $7 in a {$13=a[$7]} 1. The part before that stays the same.
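
One way to avoid editing the program for every layout is to pass the column numbers in as awk variables. The names nameCol and sizeCol below are made up for this sketch, and the 13-column row is invented test data with the name in column 7 and the size in column 13:

```shell
csv=$(mktemp)
# Made-up 13-column row: name in column 7, size in column 13.
printf '%s\n' 'c1,c2,c3,c4,c5,c6,apple,c8,c9,c10,c11,c12,small' > "$csv"

result=$(printf '%s\n' 'apple,xsmall' |
  awk -F, -v OFS=, -v nameCol=7 -v sizeCol=13 '
    NR == FNR     { a[$1] = $2; next }          # lookup table, same as before
    $nameCol in a { $sizeCol = a[$nameCol] }    # columns chosen via -v
    1
  ' - "$csv")

printf '%s\n' "$result"
```

In awk, $nameCol means "the field whose number is stored in nameCol", so the rule behaves like $7 in a {$13=a[$7]} for these values.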