I copied three columns of data into a comma-separated file and am doing some string manipulation on it, but I don't know why an empty '' is being added at the start of every output line.

My input file:

"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
 ['1.839324'],4.FIDCWRRTL,2015-04-16 12:04:21+0000
              ,4.AIQM,2015-04-16 12:04:21+0000

As you can see, only the third line has empty data in the first column.

My awk script:

    BEGIN { FS=",?\"?[][]\"?,?"; OFS="," }
    {
        if (split($2,a,/\047/)) {
            for (j=2; j in a; j+=2) {
                $2 = a[j]
                prt()
            }
        }
        else {
            prt()
        }
    }

    function prt(   out) {
        out = "\047" $0 "\047"
        gsub(OFS,"\047,\047",out)
        print out
    }

Actual output:

    '','3.FIT-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '','3.MYFIT-LTR-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '','1.839324','4.FIDCWRRTL','2015-04-16 12:04:21+0000'
    '','4.AIQM','2015-04-16 12:04:21+0000'

Expected output:

    '3.FIT-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '3.MYFIT-LTR-DYN','FIT-L-PHY-PRM-GI','2014-07-11 14:07:28+0000'
    '1.839324','4.FIDCWRRTL','2015-04-16 12:04:21+0000'
    '','4.AIQM','2015-04-16 12:04:21+0000'

1 Answer

You are using square brackets (with an optional double quote and comma around them) as field separators. In your data, every line that contains a list starts with a match for that separator, so awk sees a field separator before any content. The first field is everything that precedes the first field separator, which in this case is the empty string.
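To see the effect in isolation, here is a minimal sketch. It uses a cut-down separator (just the two brackets, chosen for brevity and not the FS from the question) on a shortened sample row. Because the row starts with `[`, awk reports an empty first field:

```shell
# The row begins with "[", which matches the separator, so $1 is empty.
# The simplified FS and the shortened row are illustrative assumptions.
out=$(printf '%s\n' "['1.839324'],4.FIDCWRRTL" |
    awk -F'[][]' '{ printf "NF=%d first=<%s> second=<%s>\n", NF, $1, $2 }')
echo "$out"
# prints: NF=3 first=<> second=<'1.839324'>
```

The same thing happens with the original FS: the empty `$1` is kept when `$0` is rebuilt, and `prt()` then wraps it in quotes.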

I suggest something else:

    BEGIN { FS="]\"?,"; OFS="," }
    NF == 1 { sub("^[[:blank:]]*", "''"); print; next }
    {
        sub(" *\"?\\[", "", $1);
        count = split($1, fields, ", *");
        for (i = 1; i <= count; i++) {
            $1 = fields[i]
            print
        }
    }
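As far as I can tell, the script above quotes only the first column and leaves the remaining columns bare. Below is a sketch of a variant that also quotes the other columns so the result matches the expected output in the question. The helper `quote_all`, the variable `q`, and the temporary file are names introduced here for illustration, and it assumes every row has the same shape as the three sample rows.

```shell
# Sample rows copied from the question into a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"['3.FIT-DYN', '3.MYFIT-LTR-DYN']",FIT-L-PHY-PRM-GI,2014-07-11 14:07:28+0000
 ['1.839324'],4.FIDCWRRTL,2015-04-16 12:04:21+0000
              ,4.AIQM,2015-04-16 12:04:21+0000
EOF

out=$(awk '
BEGIN { FS = "]\"?,"; q = "\047" }      # split once, right after the list

# Hypothetical helper: wrap every comma-separated value in single quotes.
function quote_all(s) {
    gsub(/,/, q "," q, s)
    return q s q
}

# No bracketed list on this row: emit an empty quoted first column.
NF == 1 {
    sub(/^[[:blank:]]*,/, "")
    print q q "," quote_all($0)
    next
}

{
    list = $1
    sub(/^[[:blank:]]*"?\[/, "", list)        # drop blanks, optional quote, bracket
    n = split(list, items, /,[[:blank:]]*/)   # items still carry their own quotes
    for (i = 1; i <= n; i++)
        print items[i] "," quote_all($2)
}
' "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Working on a copy of `$1` instead of assigning to the field avoids rebuilding `$0`, so `OFS` is not needed here.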
1 Comment

The first arg for sub() and the 3rd arg for split() are regexps, not strings, so you should be using regexp delimiters /.../ instead of string delimiters "...". The former define regexp constants, which are taken as-is, while the latter define dynamic regexps, which have to be parsed twice: once to convert from a string to a regexp and then again when used as a regexp. That adds complication, including the need to double up escapes. Use sub(/regexp/,string) and split(string,array,/regexp/) unless a dynamic regexp is necessary, e.g. to create a regexp via concatenation: sub("foo"$1,"bar").
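A small sketch of the escaping difference described above, using a made-up string `x[1]` (the variable names `lit` and `dyn` are illustrative only). Both calls replace the literal `[`, but the dynamic regexp needs the backslash doubled because the string is processed before the regexp is:

```shell
out=$(awk 'BEGIN {
    s = "x[1]"
    lit = s; sub(/\[/, "(", lit)     # regexp constant: a single backslash
    dyn = s; sub("\\[", "(", dyn)    # dynamic regexp: the string layer eats one backslash
    print lit, dyn
}')
echo "$out"
# prints: x(1] x(1]
```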
