2

I'm trying to create an array in BASH with the following command:

# Normally a loop, but for this example, n=1
arr=( `awk 'NR == n' n=1 "$file" | tr ',' " "` );
echo ${arr[@]};
echo ${arr[2]};

$file contains:

"1","test_id = \"5\"","test_id <> \"5\""
"2","test_id = \"6\"","test_id <> \"6\""

Output:

"1" "test_id = \"5\"" "test_id <> \"5\""
"test_id

I'm expecting output of ${arr[2]} to be test_id <> "5" but I actually get "test_id.

However, when I manually input the data like so, I get the correct result:

arr=( "1" "test_id = \"5\"" "test_id <> \"5\"" );
echo ${arr[@]};
echo ${arr[2]};

Output:

1 test_id = "5" test_id <> "5"
test_id <> "5"

Any help would be appreciated,

Thanks.

5
  • The shell splits on blanks — those that you add when you replace the commas and those that are already in the fields. Commented Nov 5, 2015 at 0:41
  • But how does that explain why the second command works? It contains blanks within the strings, and it doesn't get split. Commented Nov 5, 2015 at 0:45
  • 2
    In the second case, the shell is interpreting the quotes. In the first, it does not. It's tricky, but that's the trouble. You'll also note I didn't quote a fix; that's because it isn't particularly easy to come up with one. In fact, given that you have nested double quotes in your data, and it's only a matter of time before there's a comma inside one of the fields, there really isn't a suitable way to deal with it in shell. At this point, you need a tool that handles CSV more sanely. That might be Python, or Perl, or a custom program such as csvfix. Commented Nov 5, 2015 at 0:49
  • 2
    I also observe that you've got a non-standard variant of CSV format. Normally, a double quote inside a field is represented by "" and not \". That's going to give some headaches too. What happens with commas and backslashes inside the field? I don't think your variant is too hard to handle, but because it doesn't match the default, you will need to tweak tools such as Python's CSV module to handle it properly. Just a heads-up. Commented Nov 5, 2015 at 1:01
  • I see. Thanks for your help! Commented Nov 5, 2015 at 2:24
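
The difference described in these comments can be reproduced without awk or a file. This is only a minimal sketch: the variable names `line`, `from_expansion` and `from_literal` are made up for illustration, and `line` holds what the first row looks like after the commas are replaced by spaces.

```bash
#!/usr/bin/env bash
# Hypothetical sample: the first row of $file after tr ',' ' '.
line='"1" "test_id = \"5\"" "test_id <> \"5\""'

# Quotes that arrive through an expansion are plain data.
# The shell only splits the result on blanks; it does not re-parse the quotes.
from_expansion=( $line )

# Quotes typed directly in the script are syntax.
# The shell uses them to group words and then removes them.
from_literal=( "1" "test_id = \"5\"" "test_id <> \"5\"" )

echo "${#from_expansion[@]}"   # 7 words: every blank is a split point
echo "${#from_literal[@]}"     # 3 elements: the quotes grouped the words

# Show each element in angle brackets to make the boundaries visible.
printf '<%s>\n' "${from_expansion[@]}"
printf '<%s>\n' "${from_literal[@]}"
```

The same splitting applies to the output of a command substitution, which is why the array built from the awk output ends up with fragments such as `"test_id` instead of whole fields.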

1 Answer

1

The correct answer is of course in the comments: use a real CSV parser. That being said, this would be a contrived way of solving it:

. <(
      echo "arr=("
      awk -F, 'NR==n && ($1=$1)' n=1 file
      echo ")"
)

Sourcing the output of this process substitution changes when quote interpretation happens: the shell parses the generated text as code, so the quotes are treated as syntax rather than as data. This often helps with these intricate issues. To get rid of the unnecessary tr call, awk is instructed to split on commas. The assignment $1=$1 evaluates to true here (the first field is a non-empty string) and causes awk to rebuild its line, so the input field separator is replaced by the output field separator, which is a space by default.
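
For anyone who wants to try this end to end, here is a self-contained sketch of the same idea. It is not a general CSV solution. The temporary file created with `mktemp` is an assumption that stands in for `$file`, and the awk program is written as an explicit action instead of a pattern, which should behave the same way for this data.

```bash
#!/usr/bin/env bash
# Hypothetical temporary file holding the sample data from the question.
file=$(mktemp)
cat > "$file" <<'EOF'
"1","test_id = \"5\"","test_id <> \"5\""
"2","test_id = \"6\"","test_id <> \"6\""
EOF

# Generate the text of an array assignment and let the shell parse it as code.
# WARNING: this executes whatever the file contains, so it is only
# reasonable for input you fully trust.
source <(
    echo "arr=("
    # Split on commas; assigning $1 to itself rebuilds the line with spaces.
    awk -F, 'NR == n { $1 = $1; print }' n=1 "$file"
    echo ")"
)
rm -f "$file"

echo "${#arr[@]}"   # 3
echo "${arr[2]}"    # test_id <> "5"
```

Because the generated text is parsed like any other script, a field containing something like `$(...)` would be executed, and a comma inside a quoted field would still split that field in two. Both are further reasons to prefer a real CSV parser.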
