Parsing data using awk

Question

I copied three columns of data into a comma-separated file and am doing some string manipulation on it, but I don't know why an empty '' is being added at the start of every output line.

My input file:

"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
 ['1.839324'],4.FIDCWRRTL,2015-04-16 12:04:21+0000
              ,4.AIQM,2015-04-16 12:04:21+0000

Note that only the third line has an empty first field.

My awk script:

    BEGIN { FS=",?\"?[][]\"?,?"; OFS="," }
    {
        if (split($2,a,/\047/)) {
            for (j=2; j in a; j+=2) {
                $2 = a[j]
                prt()
            }
        }
        else {
            prt()
        }
    }

    function prt(   out) {
        out = "\047" $0 "\047"
        gsub(OFS,"\047,\047",out)
        print out
    }

Output:

  '','3.FIT-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
  '','3.MYFIT-LTR-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
  '','1.839324','4.FIDCWRRTL','2015-04-16 12:04:21+0000'
  '','4.AIQM','2015-04-16 12:04:21+0000'

Expected output:

    '3.FIT-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '3.MYFIT-LTR-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '1.839324','4.FIDCWRRTL','2015-04-16 12:04:21+0000'
    '','4.AIQM','2015-04-16 12:04:21+0000'

Michael Vehrs · Accepted Answer · 2016-05-19 09:38:41Z

You are using square brackets (with an optional adjacent quote and comma) as field separators. In your data, each bracketed list opens the line, so the line effectively starts with a field separator. The first field is everything that precedes the first field separator, which in this case is an empty (or blank) string.
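
To see the effect in isolation, here is a minimal sketch that splits the first sample line with the field separator from the question and prints the field count and the first two fields. The function name `show_leading_field` is made up for this illustration.

```shell
# Hypothetical helper: split the first sample line with the FS from the
# question and show NF plus the first two fields.
show_leading_field() {
  awk 'BEGIN { FS = ",?\"?[][]\"?,?" }
       { printf "NF=%d first=<%s> second=<%s>\n", NF, $1, $2 }' <<'EOF'
"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
EOF
}

show_leading_field
# prints: NF=3 first=<> second=<'3.FIT-DYN', '3.MYFIT-LTR-DYN'>
```

The separator matches the leading `"[`, so `$1` is empty and the list the script is looking for ends up in `$2`.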

I suggest something else:

    BEGIN { FS="]\"?,"; OFS="," }
    NF == 1 { sub("^[[:blank:]]*", "''"); print; next }
    {
        sub(" *\"?\\[", "", $1);
        count = split($1, fields, ", *");
        for (i = 1; i <= count; i++) {
            $1 = fields[i]
            print
        }
    }
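
Note that the script above prints the second and third fields unquoted, while the expected output in the question quotes every field. The following is only a sketch of one possible variant, not the answer's method: it locates the bracketed list with `match()` instead of relying on `FS`, and it assumes that neither the list items nor the remaining fields contain commas. The names `quote_fields`, `list`, `rest`, `items` and `q` are made up for this illustration.

```shell
# Hypothetical helper: feed the sample input from the question to an awk
# script that prints one fully quoted line per list item.
quote_fields() {
  awk 'BEGIN { q = "\047" }                 # \047 is the single-quote character
  {
      rest = $0
      list = ""
      if (match(rest, /\[[^]]*\]/)) {       # bracketed list, if the line has one
          list = substr(rest, RSTART + 1, RLENGTH - 2)
          rest = substr(rest, RSTART + RLENGTH)
      }
      sub(/^[^,]*,/, "", rest)              # drop everything through the first remaining comma
      gsub(/,/, q "," q, rest)              # put quotes around the remaining fields
      n = split(list, items, /, */)         # list items already carry their own quotes
      if (n == 0) { items[1] = q q; n = 1 } # no list: emit an empty quoted field
      for (i = 1; i <= n; i++)
          print items[i] "," q rest q
  }' <<'EOF'
"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
 ['1.839324'],4.FIDCWRRTL,2015-04-16 12:04:21+0000
              ,4.AIQM,2015-04-16 12:04:21+0000
EOF
}

quote_fields
```

On the three sample lines this should reproduce the four lines listed under the expected output in the question.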

edited May 19, 2016 at 9:38

answered May 19, 2016 at 9:09

Ed Morton Over a year ago

The first arg for sub() and the 3rd arg for split() are regexps, not strings, so you should be using regexp delimiters /.../ instead of string delimiters "...". The former define regexp constants, which are taken as-is, while the latter define dynamic regexps, which have to be parsed twice: once to convert from a string to a regexp and then again when used as a regexp. Dynamic regexps therefore carry some additional complication, including the need to double up escapes. Use sub(/regexp/,string) and split(string,array,/regexp/) unless a dynamic regexp is necessary, e.g. to create a regexp via concatenation: sub("foo"$1,"bar").
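
As a small illustration of the point in the comment above, the sketch below applies the same pattern once as a regexp constant and once as a dynamic regexp. The function name `compare_regexp_forms` and the sample string are made up for this example.

```shell
# Hypothetical helper: strip a leading ` "[` from a sample string twice,
# once with a regexp constant and once with a dynamic (string) regexp.
compare_regexp_forms() {
  printf '%s\n' ' "[x, y' | awk '{
      a = $0; b = $0
      sub(/ *"?\[/, "", a)       # regexp constant: the bracket is escaped once
      sub(" *\"?\\[", "", b)     # dynamic regexp: string parsing comes first, so the escape is doubled
      print a
      print b
  }'
}

compare_regexp_forms
# prints "x, y" twice
```

Both calls remove the same text; the regexp constant simply needs one level of escaping fewer.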
