The following function uses awk to split a CSV line into multiple lines, one field per line. I can then assign the output to an array to access the individual fields.

function csv_to_lines() {
echo $@ | awk '
BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")";}
{for(i=1; i<=NF; i++) {printf("%s\n", $i)}}'
}

line='A,B,"C,D",E'
arr=($(csv_to_lines $line))

printf '%s,' "${arr[@]}"

However, this doesn't work when the line contains empty fields. For example:

line='A,,,,,"C,D",E'
arr=($(csv_to_lines $line))

printf '%s,' "${arr[@]}"

Outputs

A,"C,D",E,

But I expected

A,,,,,"C,D",E,

Evidently, all empty lines are dropped when the output is assigned to the array. How do I create an array that keeps the empty lines?

  • In awk, printf("%s\n", $i) is equivalent to print $i. Your script also has a few shell errors that pasting it into shellcheck.net would point out. Commented Dec 14, 2021 at 13:23

1 Answer

Current code:

$ line='A,,,,,"C,D",E'
$ csv_to_lines $line
A




"C,D"
E

Looking at the actual characters generated we see:

$ csv_to_lines $line | od -c
0000000   A  \n  \n  \n  \n  \n   "   C   ,   D   "  \n   E  \n
0000016

As written, arr=($(...)) word-splits this output using the default IFS (space, tab, newline). A run of consecutive newlines counts as a single separator, so the empty lines never become array elements. The effect is the same as:

$ arr=(A
'"C,D"'
E)
$ typeset -p arr
declare -a arr=([0]="A" [1]="\"C,D\"" [2]="E")

$ printf '%s,' "${arr[@]}"
A,"C,D",E,

A couple of ideas for keeping the blank lines in the array:

Use mapfile to read each line into its own array element, e.g.:

$ mapfile -t arr < <(csv_to_lines $line)
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")
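
To see the difference between the two assignments in isolation, here is a minimal sketch. It assumes bash, and emit_fields is a hypothetical stand-in for csv_to_lines that simply prints the same seven lines shown above, so no awk is needed to try it:

```shell
#!/usr/bin/env bash
# emit_fields is a hypothetical stand-in for csv_to_lines: it prints the
# same seven lines as above, four of them empty.
emit_fields() {
    printf '%s\n' 'A' '' '' '' '' '"C,D"' 'E'
}

# Unquoted command substitution: the output is word-split on the default
# IFS, so runs of newlines collapse and the empty lines disappear.
split_arr=($(emit_fields))

# mapfile: one array element per input line, empty lines included.
# -t strips the trailing newline from each element.
mapfile -t mapped_arr < <(emit_fields)

echo "word splitting: ${#split_arr[@]} elements"
echo "mapfile:        ${#mapped_arr[@]} elements"
```

This should report 3 elements for the word-split array and 7 for the mapfile array.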

Or have awk use something other than \n as the delimiter, then set a custom IFS to split the function's output into the array, e.g.:

$ function csv_to_lines() { echo $@ | awk '
BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")";}
{for(i=1; i<=NF; i++) {printf("%s|", $i)}}'; }

$ csv_to_lines $line
A|||||"C,D"|E|

$ IFS='|' arr=($(csv_to_lines $line))
$ typeset -p arr
declare -a arr=([0]="A" [1]="" [2]="" [3]="" [4]="" [5]="\"C,D\"" [6]="E")

Both of these lead to:

$ printf '%s,' "${arr[@]}"
A,,,,,"C,D",E,
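
One caveat with the custom IFS approach: a line made up only of assignments, such as IFS='|' arr=(...), changes IFS for the rest of the shell session. A possible way to keep the change scoped is to let read do the splitting. The sketch below assumes bash, and uses a hard-coded string (the hypothetical variable delimited) in place of the function's pipe-delimited output:

```shell
#!/usr/bin/env bash
# Stand-in for the pipe-delimited output of csv_to_lines shown above.
delimited='A|||||"C,D"|E|'

old_ifs=$IFS

# IFS='|' applies only to this one read command, so the shell's own IFS
# is left alone. -r keeps backslashes literal, -a fills an indexed array.
IFS='|' read -r -a arr <<< "$delimited"

declare -p arr
[[ $IFS == "$old_ifs" ]] && echo "IFS unchanged"
```

Because read does not perform pathname expansion, this also avoids accidental globbing if a field contains * or ?. The trailing | in the output matters here: read drops a single trailing empty field, so the final delimiter is what lets the last real field be followed by nothing extra.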

1 Comment

The mapfile solution is perfect. Using a delimiter character like | would cause problems if the CSV has a | inside one of the fields.
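
If a delimiter that cannot collide with field contents is wanted, one option is the NUL byte, which mapfile can split on in bash 4.4 or later. The following is only a sketch under that assumption: emit_fields_nul is a hypothetical stand-in for a csv_to_lines variant that ends each field with NUL, and the sample fields (including one containing |) are made up for illustration:

```shell
#!/usr/bin/env bash
# emit_fields_nul is a hypothetical stand-in for a csv_to_lines variant
# that terminates every field with a NUL byte instead of "|" or a newline.
emit_fields_nul() {
    printf '%s\0' 'A' '' 'x|y' '"C,D"' 'E'
}

# -d '' tells mapfile to split on NUL (requires bash 4.4 or later);
# -t strips the delimiter from each element.
mapfile -d '' -t arr < <(emit_fields_nul)

declare -p arr
```

The empty field and the field containing | should both come through as separate, intact elements.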
