Update column in file based on associative array value in bash

Question

I have a file named testingFruits.csv with the following contents:

name,value_id,size
apple,1,small
mango,2,small
banana,3,medium
watermelon,4,large

I also have an associative array that stores the following data:

fruitSizes[apple]=xsmall
fruitSizes[mango]=small
fruitSizes[banana]=medium
fruitSizes[watermelon]=xlarge

Is there any way I can update the 'size' column within the file based on the data within the associative array for each value in the 'name' column?

I've tried using awk but I had no luck. Here's a sample of what I tried to do:

awk -v t="${fruitSizes[*]}" 'BEGIN{n=split(t,arrayval,""); ($1 in arrayval) {$3=arrayval[$1]}' "testingFruits.csv"

I understand this command would get the bash defined array fruitSizes, do a split on all the values, then check if the first column (name) is within the fruitSizes array. If it is, then it would update the third column (size) with the value found in fruitSizes for that specific name.

Unfortunately this gives me the following error:

Argument list too long

This is the expected output I'd like in the same testingFruits.csv file:

name,value_id,size
apple,1,xsmall
mango,2,small
banana,3,medium
watermelon,4,xlarge

One edge case I'd like to handle is the presence of duplicate values in the name column with different values for the value_id and size columns.

FYI, having a bash associative array as your starting point is probably a bad idea: they're slow and non-portable, and they make the rest of your script harder to implement. You should instead use awk to read whatever input you're populating that array from.
– Ed Morton, Commented Sep 9, 2021 at 20:22

Socowi · Accepted Answer · 2021-09-09 19:57:40Z

If you want to stick to an awk script, pass the array via stdin to avoid running into ARG_MAX issues.

Since your array is associative, listing only the values ${fruitSizes[@]} is not sufficient. You also need the keys ${!fruitSizes[@]}. pr -2 can pair the keys and values in one line.
This assumes that ${fruitSizes[@]} and ${!fruitSizes[@]} expand in the same order, and your keys and values are free of the field separator (, in this case).

printf %s\\n "${!fruitSizes[@]}" "${fruitSizes[@]}" | pr -t -2 -s, |
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' - testingFruits.csv
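
If relying on the two expansions having the same order feels fragile, one possible variation is to print each key next to its value in an explicit loop. The following is a minimal sketch, not the answer's original command: the sample data is recreated from the question, and the temporary file created with mktemp is an assumption added so the result can be written back to testingFruits.csv.

```bash
#!/usr/bin/env bash
# Sample data from the question, recreated so the sketch is self-contained.
declare -A fruitSizes=([apple]=xsmall [mango]=small [banana]=medium [watermelon]=xlarge)
printf '%s\n' 'name,value_id,size' 'apple,1,small' 'mango,2,small' \
    'banana,3,medium' 'watermelon,4,large' > testingFruits.csv

# Assumption: a temporary file is used to rewrite the CSV in place.
tmp=$(mktemp)

# Emit one "key,value" line per entry; printf is a shell builtin,
# so the array never has to fit into an argument list.
for key in "${!fruitSizes[@]}"; do
    printf '%s,%s\n' "$key" "${fruitSizes[$key]}"
done |
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' - testingFruits.csv > "$tmp" &&
mv "$tmp" testingFruits.csv
```

Because the awk rule runs for every row of testingFruits.csv, rows that repeat a name with a different value_id are all updated. Like the command above, this still assumes keys and values contain no commas and that the array is not empty.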

However, I'm wondering where the array fruitSizes comes from. If you read it from a file or something like that, it would be easier to leave out the array altogether and do everything in awk.
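
As a rough illustration of that idea, suppose the name/size pairs live in a file instead of a bash array. The file name fruitSizes.csv and its "name,size" layout are assumptions made up for this sketch, as is the .tmp file used to replace the original.

```bash
# Hypothetical lookup file with one "name,size" pair per line.
printf '%s\n' 'apple,xsmall' 'mango,small' 'banana,medium' 'watermelon,xlarge' > fruitSizes.csv

# Sample data from the question.
printf '%s\n' 'name,value_id,size' 'apple,1,small' 'mango,2,small' \
    'banana,3,medium' 'watermelon,4,large' > testingFruits.csv

# First file fills the lookup array a; second file gets its 3rd column updated.
awk -F, -v OFS=, 'NR==FNR {a[$1]=$2; next} $1 in a {$3=a[$1]} 1' \
    fruitSizes.csv testingFruits.csv > testingFruits.csv.tmp &&
mv testingFruits.csv.tmp testingFruits.csv
```

With both inputs read as files, no bash array, pr, or stdin plumbing is needed.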

edited Sep 9, 2021 at 19:57

answered Sep 9, 2021 at 19:09

Socowi

10 Comments

confusedcoder21 Over a year ago

What is the significance of {a[$1]=$2; next} ?

Socowi Over a year ago

@confusedcoder21 a is the awk-version of fruitSizes. We need the mentioned rule to populate that array. NR==FNR activates only at the first "file" (- stands for stdin). a[$1]=$2 stores the keys ($1) and values ($2) from fruitSizes in the awk-array a. next skips all other awk rules and goes to the next line. Therefore, the part $1 in a {$3=a[$1]} 1 is only executed for the 2nd file. 1 is a shorthand for {print}.

confusedcoder21 Over a year ago

So $1 refers to the keys in the fruitSizes and $2 refers to the values. And later $3 refers to the size column in the file?

confusedcoder21 Over a year ago

How would this change if I had more columns? Say the name column was column #7 and the size column is #13?

Socowi Over a year ago

Exactly. The meaning of the columns changes from "file" (stdin -) to file (testingFruits.csv). If you want to adapt this script, just ignore the part NR==FNR {a[$1]=$2; next}. That's just the "magic" initialization of the array. What comes after that can be altered however you like. If the name is $7 and the size is $13, use $7 in a {$13=a[$7]} 1. The part before that stays the same.
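
To make the column change concrete, here is a small sketch that passes the column numbers in as awk variables. The variable names nameCol and sizeCol, the file names sizes.csv, wide.csv and wide_updated.csv, and the placeholder data are all hypothetical.

```bash
# Hypothetical lookup pairs ("name,size").
printf '%s\n' 'apple,xsmall' 'watermelon,xlarge' > sizes.csv

# Hypothetical 13-column file: name in column 7, size in column 13.
printf '%s\n' 'a,b,c,d,e,f,apple,h,i,j,k,l,small' \
    'a,b,c,d,e,f,kiwi,h,i,j,k,l,small' > wide.csv

# nameCol and sizeCol select the fields, so the rule no longer hard-codes $1 and $3.
awk -F, -v OFS=, -v nameCol=7 -v sizeCol=13 '
    NR==FNR {a[$1]=$2; next}                     # first file: fill the lookup array
    ($nameCol in a) {$sizeCol = a[$nameCol]}     # second file: update matching rows
    1                                            # print every line
' sizes.csv wide.csv > wide_updated.csv
```

Rows whose name has no entry in the lookup (kiwi here) pass through unchanged.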
