csv2md format output using awk

Question

I need to create a markdown formatted output using awk which is similar to table results that we get using mysql. In other words, I'm trying to mimic https://www.convertcsv.com/csv-to-markdown.htm using the below input.

id|type|cost|date|ship
0|A|223|201603|PORT
0|A|22|201602|PORT
0|A|422|201601|DOCK
1|B|3213|201602|DOCK
1|B|3213|201601|PORT
2|C|2321|201601|DOCK

The output I'm looking for is below

------------------------------
|id |type |cost |  date |ship |
------------------------------
|0  |A    |223  |201603 |PORT |
|0  |A    |22   |201602 |PORT |
|0  |A    |422  |201601 |DOCK |
|1  |B    |3213 |201602 |DOCK |
|1  |B    |3213 |201601 |PORT |
|2  |C    |2321 |201601 |DOCK |
------------------------------

My first attempt is to get the max size for each column and use that in the formatting when printing. But the below one is not working as expected.

awk -F"|" ' 
NR==1 { hdr=$0; for(i=1;i<=NF;i++) { a[i]=length($i) } next } 
{ for(i=1;i<=NF;i++) { ;a[i]=length($i)>a[i]?length($i):a[i] } content[NR]=$0 } 
END { 
for(i in a) len+=a[i]+2;
$0=""; OFS="-"; len++; NF=len; print ;
n=split(hdr,arr,FS); 
for(i=1;i<=n;i++) 
  {  printf("%6s |",arr[i]); }  # instead of 6 i want to pass a[i] ==> "%6" a[i] "s |" is not working
print "";

} 
' data.txt

How to fix it and get the required output.

Ed Morton · Accepted Answer · 2021-08-11 17:30:05Z

3

This requires a 2-pass approach, either of the following would work:

A) reading the input file twice so it uses very little memory:

$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==FNR {
    for (i=1; i<=NF; i++) {
        wid = length($i)
        wids[i] = (wid > wids[i] ? wid : wids[i])
    }
    next
}
FNR==1 {
    totWid = NF+1
    for (i=1; i<=NF; i++) {
        totWid += wids[i]
    }
    dashes = sprintf("%*s",totWid,"")
    gsub(/ /,"-",dashes)
    print dashes
    printf "%s", OFS
    for (i=1; i<=NF; i++) {
        printf "%*s%s", wids[i], $i, OFS
    }
    print ""
    print dashes
    next
}
{
    printf "%s", OFS
    for (i=1; i<=NF; i++) {
        printf "%-*s%s", wids[i], $i, OFS
    }
    print ""
}
END { print dashes }

$ awk -f tst.awk file file
--------------------------
|id|type|cost|  date|ship|
--------------------------
|0 |A   |223 |201603|PORT|
|0 |A   |22  |201602|PORT|
|0 |A   |422 |201601|DOCK|
|1 |B   |3213|201602|DOCK|
|1 |B   |3213|201601|PORT|
|2 |C   |2321|201601|DOCK|
--------------------------

B) storing the whole file in memory and then the 2nd pass is an array traversal rather than reading the file again which should run faster if you have enough memory to do it:

$ cat tst.awk
BEGIN { FS=OFS="|" }
{
    for (i=1; i<=NF; i++) {
        wid = length($i)
        wids[i] = (wid > wids[i] ? wid : wids[i])
        vals[NR,i] = $i
    }
}
END {
    totWid = NF+1
    for (i=1; i<=NF; i++) {
        totWid += wids[i]
    }
    dashes = sprintf("%*s",totWid,"")
    gsub(/ /,"-",dashes)
    print dashes
    printf "%s", OFS
    for (i=1; i<=NF; i++) {
        printf "%*s%s", wids[i], vals[1,i], OFS
    }
    print ""
    print dashes

    for (lineNr=2; lineNr<=NR; lineNr++) {
        printf "%s", OFS
        for (i=1; i<=NF; i++) {
            printf "%-*s%s", wids[i], vals[lineNr,i], OFS
        }
        print ""
    }
    print dashes
}

$ awk -f tst.awk file
--------------------------
|id|type|cost|  date|ship|
--------------------------
|0 |A   |223 |201603|PORT|
|0 |A   |22  |201602|PORT|
|0 |A   |422 |201601|DOCK|
|1 |B   |3213|201602|DOCK|
|1 |B   |3213|201601|PORT|
|2 |C   |2321|201601|DOCK|
--------------------------

edited Aug 11, 2021 at 17:30

answered Aug 11, 2021 at 17:21

Ed Morton

209k18 gold badges90 silver badges212 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

stack0114106 Over a year ago

thanks Ed, it works.. in the code printf "%-*s%s", wids[i], vals[lineNr,i], OFS why one % is missing

Ed Morton Over a year ago

Note the * in %-*s - that's where you specify the field width as one of the args, in this case wids[i]. printf "%*s\n", 10, "foo" is functionally equivalent to printf "%"10"s\n", "foo". See "dynamic width and prec capability" at gnu.org/software/gawk/manual/gawk.html#Format-Modifiers.

Pierre François · Accepted Answer · 2021-08-12 14:44:05Z

1

There is a program for making tabular output, which is columns -s '|' -t, but it looses the separator | in its output.

You can recover that separator with sed, replacing each white space followed by a non white char by a | followed by that non white char.

Finally, awk helps you to add a line before and after the headers of the columns.

column -s '|' -t input.csv | sed -E 's/ ([^ ])/|\1/g' | awk '(NR == 1) {l = $0; gsub(/./, "-", l); print l "\n" $0 "\n" l} (NR > 1) {print} END{print l}'

edited Aug 12, 2021 at 14:44

answered Aug 12, 2021 at 14:36

Pierre François

6,1681 gold badge21 silver badges42 bronze badges

2 Comments

stack0114106 Over a year ago

good approximation with a one-liner, acceptable to me.. btw column command has some limitations wrt blanks.. right?....i.e not parsing well..

Pierre François Over a year ago

@stack0114106: columns has no problem when the separator is |. The problem can rise if the fields contain blanks at the level of the sed command.

stack0114106 · Accepted Answer · 2021-08-12 14:05:10Z

Here is my answer using the concepts from Ed.

awk -F"|" ' 
NR==1 { hdr=$0; for(i=1;i<=NF;i++) { a[i]=length($i) } next } 
{ 
  for(i=1;i<=NF;i++) a[i]=length($i)>a[i]?length($i):a[i] ;
  content[NR]=$0 
} 
END { 
dashes="+";
  for(i in a) 
  {      len+=a[i]+1; 
         dash=sprintf("%*s",a[i],""); 
         gsub(/ /,"-",dash); 
         dashes=dashes dash "+" 
  }
 #Header
    print dashes;
    n=split(hdr,arr,FS); printf("|");
    for(i=1;i<=n;i++) { fmt="%" a[i] "s|" ; printf(fmt,arr[i]); } print "";
    print dashes;

 #Contents  
    for(i=2;i<=NR;i++) 
    {
     n=split(content[i],arr,FS); printf("|");
     for(j=1;j<=n;j++) { fmt="%-" a[j] "s|" ; printf(fmt,arr[j]); } print "";
    }
    print dashes
} 
' data.txt

Collectives™ on Stack Overflow

csv2md format output using awk

3 Answers 3

2 Comments

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related