3

I need to create a markdown formatted output using awk which is similar to table results that we get using mysql. In other words, I'm trying to mimic https://www.convertcsv.com/csv-to-markdown.htm using the below input.

id|type|cost|date|ship
0|A|223|201603|PORT
0|A|22|201602|PORT
0|A|422|201601|DOCK
1|B|3213|201602|DOCK
1|B|3213|201601|PORT
2|C|2321|201601|DOCK

The output I'm looking for is below

------------------------------
|id |type |cost |  date |ship |
------------------------------
|0  |A    |223  |201603 |PORT |
|0  |A    |22   |201602 |PORT |
|0  |A    |422  |201601 |DOCK |
|1  |B    |3213 |201602 |DOCK |
|1  |B    |3213 |201601 |PORT |
|2  |C    |2321 |201601 |DOCK |
------------------------------

My first attempt is to get the max size for each column and use that in the formatting when printing. But the below one is not working as expected.

awk -F"|" ' 
NR==1 { hdr=$0; for(i=1;i<=NF;i++) { a[i]=length($i) } next } 
{ for(i=1;i<=NF;i++) { ;a[i]=length($i)>a[i]?length($i):a[i] } content[NR]=$0 } 
END { 
for(i in a) len+=a[i]+2;
$0=""; OFS="-"; len++; NF=len; print ;
n=split(hdr,arr,FS); 
for(i=1;i<=n;i++) 
  {  printf("%6s |",arr[i]); }  # instead of 6 i want to pass a[i] ==> "%6" a[i] "s |" is not working
print "";

} 
' data.txt

How to fix it and get the required output.

3 Answers 3

3

This requires a 2-pass approach, either of the following would work:

A) reading the input file twice so it uses very little memory:

$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==FNR {
    for (i=1; i<=NF; i++) {
        wid = length($i)
        wids[i] = (wid > wids[i] ? wid : wids[i])
    }
    next
}
FNR==1 {
    totWid = NF+1
    for (i=1; i<=NF; i++) {
        totWid += wids[i]
    }
    dashes = sprintf("%*s",totWid,"")
    gsub(/ /,"-",dashes)
    print dashes
    printf "%s", OFS
    for (i=1; i<=NF; i++) {
        printf "%*s%s", wids[i], $i, OFS
    }
    print ""
    print dashes
    next
}
{
    printf "%s", OFS
    for (i=1; i<=NF; i++) {
        printf "%-*s%s", wids[i], $i, OFS
    }
    print ""
}
END { print dashes }

$ awk -f tst.awk file file
--------------------------
|id|type|cost|  date|ship|
--------------------------
|0 |A   |223 |201603|PORT|
|0 |A   |22  |201602|PORT|
|0 |A   |422 |201601|DOCK|
|1 |B   |3213|201602|DOCK|
|1 |B   |3213|201601|PORT|
|2 |C   |2321|201601|DOCK|
--------------------------

B) storing the whole file in memory and then the 2nd pass is an array traversal rather than reading the file again which should run faster if you have enough memory to do it:

$ cat tst.awk
BEGIN { FS=OFS="|" }
{
    for (i=1; i<=NF; i++) {
        wid = length($i)
        wids[i] = (wid > wids[i] ? wid : wids[i])
        vals[NR,i] = $i
    }
}
END {
    totWid = NF+1
    for (i=1; i<=NF; i++) {
        totWid += wids[i]
    }
    dashes = sprintf("%*s",totWid,"")
    gsub(/ /,"-",dashes)
    print dashes
    printf "%s", OFS
    for (i=1; i<=NF; i++) {
        printf "%*s%s", wids[i], vals[1,i], OFS
    }
    print ""
    print dashes

    for (lineNr=2; lineNr<=NR; lineNr++) {
        printf "%s", OFS
        for (i=1; i<=NF; i++) {
            printf "%-*s%s", wids[i], vals[lineNr,i], OFS
        }
        print ""
    }
    print dashes
}

$ awk -f tst.awk file
--------------------------
|id|type|cost|  date|ship|
--------------------------
|0 |A   |223 |201603|PORT|
|0 |A   |22  |201602|PORT|
|0 |A   |422 |201601|DOCK|
|1 |B   |3213|201602|DOCK|
|1 |B   |3213|201601|PORT|
|2 |C   |2321|201601|DOCK|
--------------------------
Sign up to request clarification or add additional context in comments.

2 Comments

thanks Ed, it works.. in the code printf "%-*s%s", wids[i], vals[lineNr,i], OFS why one % is missing
Note the * in %-*s - that's where you specify the field width as one of the args, in this case wids[i]. printf "%*s\n", 10, "foo" is functionally equivalent to printf "%"10"s\n", "foo". See "dynamic width and prec capability" at gnu.org/software/gawk/manual/gawk.html#Format-Modifiers.
1

There is a program for making tabular output, which is columns -s '|' -t, but it looses the separator | in its output.

You can recover that separator with sed, replacing each white space followed by a non white char by a | followed by that non white char.

Finally, awk helps you to add a line before and after the headers of the columns.

column -s '|' -t input.csv | sed -E 's/ ([^ ])/|\1/g' | awk '(NR == 1) {l = $0; gsub(/./, "-", l); print l "\n" $0 "\n" l} (NR > 1) {print} END{print l}'

2 Comments

good approximation with a one-liner, acceptable to me.. btw column command has some limitations wrt blanks.. right?....i.e not parsing well..
@stack0114106: columns has no problem when the separator is |. The problem can rise if the fields contain blanks at the level of the sed command.
0

Here is my answer using the concepts from Ed.

awk -F"|" ' 
NR==1 { hdr=$0; for(i=1;i<=NF;i++) { a[i]=length($i) } next } 
{ 
  for(i=1;i<=NF;i++) a[i]=length($i)>a[i]?length($i):a[i] ;
  content[NR]=$0 
} 
END { 
dashes="+";
  for(i in a) 
  {      len+=a[i]+1; 
         dash=sprintf("%*s",a[i],""); 
         gsub(/ /,"-",dash); 
         dashes=dashes dash "+" 
  }
 #Header
    print dashes;
    n=split(hdr,arr,FS); printf("|");
    for(i=1;i<=n;i++) { fmt="%" a[i] "s|" ; printf(fmt,arr[i]); } print "";
    print dashes;

 #Contents  
    for(i=2;i<=NR;i++) 
    {
     n=split(content[i],arr,FS); printf("|");
     for(j=1;j<=n;j++) { fmt="%-" a[j] "s|" ; printf(fmt,arr[j]); } print "";
    }
    print dashes
} 
' data.txt

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.