replace particular column value using awk if found

Question

How can I find and replace the value of a particular column using awk?

For example, I have a file named test with the following content:

"abc":"100"::"new"
"xyz":"200":"mob":"old"
"lmn":"300"::"new"
"pqr":"400":"mob2":"new"

If the 3rd column is blank, I want to replace the blank value with "N/A"; otherwise the line should be printed as is. The output should look like this:

"abc":"100":"N/A":"new"
"xyz":"200":"mob":"old"
"lmn":"300":"N/A":"new"
"pqr":"400":"mob2":"new"

I can get this output with the following awk command:

awk -F":" '{
    if ( $3 == "")
        print $1":"$2":\"N/A\":"$4
    else
        print $0
}' test

However, this hard-codes each column ($1, $2, ...), so if the blank column in another file is not the 3rd but some other one, I have to change the command again. Is there a way to get the same output with awk without hard-coding the column numbers? Thanks for your help.

You mean replacing any empty column? Or only empty values of an arbitrary column, and only on that column?
– Poshi, Commented Jul 10, 2018 at 6:14
The replacement should apply only to rows in which the value of the 3rd column is blank.
– Sandy, Commented Jul 10, 2018 at 6:16
What does "if the blank column changes in other example from 3rd to xyz then have to change the same in command again" mean, then? That sentence sounds like you want to change any blank column, but you also say "the replace should work only for those rows in which value of 3rd column is blank", which means only the 3rd column should be tested/changed. It's very unclear.
– Ed Morton, Commented Jul 10, 2018 at 14:12

user1934428 · Accepted Answer · 2018-07-11 12:19:59Z

3

First, let's simplify your present program a bit:

awk -F: 'BEGIN {OFS=FS} {
  if ( $3 == "") $3="\"N/A\""   # keep the quotes, like the other fields
  print $0
}' test

Now we can turn two things into variables: the column to test and the replacement string. The body of the program then looks something like:

if ( $fieldnumber == "" ) $fieldnumber=replacement

What remains is to fill in the variables. As the awk man page explains, the -v option sets the initial value of an awk variable:

awk -F: -v fieldnumber=... -v replacement=...

This lets you fill these variables from wherever you like: a parameter of your shell script, an environment variable, etc.
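Putting the pieces together, here is a minimal sketch as a shell function. The name fill_blank and its argument order are hypothetical, chosen only for illustration; like the answer, it assumes no colons appear inside the quoted fields. The extra fieldnumber <= NF guard is an addition so that short or empty lines are not padded with new fields.

```shell
#!/bin/sh
# Hypothetical helper: fill_blank COLUMN REPLACEMENT [FILE...]
# Replaces the value of COLUMN with REPLACEMENT when it is empty.
# Assumes ":" never appears inside a field.
fill_blank() {
    col=$1
    repl=$2
    shift 2
    awk -F: -v fieldnumber="$col" -v replacement="$repl" '
        BEGIN { OFS = FS }    # keep ":" when awk rebuilds a modified record
        # only test columns that actually exist on this line
        fieldnumber <= NF && $fieldnumber == "" { $fieldnumber = replacement }
        { print }
    ' "$@"
}
```

For example, fill_blank 3 '"N/A"' test should reproduce the expected output from the question; the single quotes make the double quotes part of the replacement value.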

UPDATE: Fixed output field separator (OFS).
UPDATE: Fixed syntax error.

edited Jul 11, 2018 at 12:19

answered Jul 10, 2018 at 6:24

user1934428

22.8k reputation, 9 gold badges, 57 silver badges, 108 bronze badges


5 Comments

Ashishkumar Singh Over a year ago

We need to add OFS=":" along with FS=":" for your code to work. Otherwise, on lines where the replacement happens, the fields are joined with a space (the default OFS) instead of ":".
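A small sketch of the effect described in this comment: assigning to a field makes awk rebuild $0 by joining all fields with OFS, which defaults to a single space, while untouched lines keep their original text. The function names without_ofs and with_ofs are hypothetical.

```shell
#!/bin/sh
# Without OFS: a modified record is rebuilt with the default OFS (" ")
without_ofs() {
    awk -F: '$3 == "" { $3 = "\"N/A\"" } { print }'
}

# With OFS set to the input separator, the colons are preserved
with_ofs() {
    awk -F: 'BEGIN { OFS = FS } $3 == "" { $3 = "\"N/A\"" } { print }'
}
```

Feeding "abc":"100"::"new" to without_ofs gives "abc" "100" "N/A" "new", whereas with_ofs gives "abc":"100":"N/A":"new"; a line whose 3rd field is not empty comes out unchanged from both.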

Ed Morton Over a year ago

That will fail when there are colons within the quoted fields.

user1934428 Over a year ago

@EdMorton : This is correct, but this problem is already present in the specialized solution which the OP claimed would work for her. That's why I didn't discuss this point. Technically, we would need to know whether this is really an issue (maybe the quotes don't have the usual meaning as delimiter, and the only delimiter present is the colon), or, much more likely, the file to be processed is in CSV format, in which case a solution (in, say, Ruby or Perl) using a CSV parser would be appropriate.

Sandy Over a year ago

Failed with the error:
awk: cmd. line:1: BEGIN {OFS=FS}
awk: cmd. line:1: ^ syntax error

user1934428 Over a year ago

@EdMorton : Ah, you are right, as usual. My mistake. Fixed my posting.

Ed Morton · Accepted Answer · 2018-07-10 15:29:46Z

2

The right way to do this, using GNU awk for FPAT and a modified input file to demonstrate that it works even when colons are present within quoted fields:

$ cat tst.awk
BEGIN {
    FPAT = "([^:]*)|(\"[^\"]+\")"
    OFS = ":"
}
$3 == "" { $3 = "\"N/A\"" }
{ print }

$ cat file
"abc:def":"100"::"new"
"xyz":"200":"mob":"old"
"lmn":"123:456:300"::"new"
"pqr":"400":"mob2":"new"
"stu":"600":"foo::bar":"more"

$ awk -f tst.awk file
"abc:def":"100":"N/A":"new"
"xyz":"200":"mob":"old"
"lmn":"123:456:300":"N/A":"new"
"pqr":"400":"mob2":"new"
"stu":"600":"foo::bar":"more"

edited Jul 10, 2018 at 15:29

answered Jul 10, 2018 at 14:10

Ed Morton

209k reputation, 18 gold badges, 90 silver badges, 212 bronze badges
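FPAT, used in the answer above, is specific to GNU awk. For other awks (e.g. mawk or BSD awk), a rough portable sketch is to split each line by hand with match(), treating a double-quoted value as a single field. It assumes quoted fields never contain embedded double quotes, and the function name fill_blank_quoted is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fill_blank_quoted COLUMN REPLACEMENT [FILE...]
# Portable (non-GNU) sketch: splits on ":" but keeps "quoted:values" intact.
# Assumes quoted fields contain no embedded double quotes.
fill_blank_quoted() {
    col=$1
    repl=$2
    shift 2
    awk -v col="$col" -v repl="$repl" '
    {
        rec = $0
        n = 0
        while (1) {
            # A field is either a whole "quoted string" or a run of non-colons
            if (substr(rec, 1, 1) == "\"" && match(rec, /^"[^"]*"/)) {
                fld[++n] = substr(rec, 1, RLENGTH)
            } else {
                match(rec, /^[^:]*/)
                fld[++n] = substr(rec, 1, RLENGTH)
            }
            rec = substr(rec, RLENGTH + 1)
            if (substr(rec, 1, 1) != ":") break   # no separator left: done
            rec = substr(rec, 2)                  # drop the ":" separator
        }
        if (col <= n && fld[col] == "") fld[col] = repl
        line = fld[1]
        for (i = 2; i <= n; i++) line = line ":" fld[i]
        print line
    }' "$@"
}
```

Called as fill_blank_quoted 3 '"N/A"' file on the sample file above, it should print the same output as the FPAT script, including leaving "foo::bar" intact.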


oliv · Accepted Answer · 2018-07-10 06:30:55Z

0

Using GNU awk:

awk -v RS='[:\n]'  '!NF{$0="\"N/A\""}{printf "%s%s",$0,RT}' test

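RS is set to a regex matching either a colon or a newline, so each colon-separated value is read as its own record.

If the record has no fields (!NF), i.e. the value is empty, it is replaced with the wanted string.

The printf statement writes the data followed by RT, the text that actually matched RS for the current record (a colon or a newline), so the original separators are preserved.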
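RT is a GNU awk extension. As a rough portable sketch with the same effect on the sample input (every empty colon-separated value becomes "N/A", not only the 3rd), one can loop over the fields instead. Like the command above, it does not handle colons inside quoted fields, and the function name fill_all_blanks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fill_all_blanks [FILE...]
# Replaces every empty ":"-separated value with "N/A" (quotes included).
fill_all_blanks() {
    awk -F: '
        BEGIN { OFS = FS }    # keep ":" when the record is rebuilt
        {
            for (i = 1; i <= NF; i++)
                if ($i == "") $i = "\"N/A\""
            print
        }' "$@"
}
```

One difference: an entirely empty input line has no fields here, so it is printed unchanged rather than turned into "N/A".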

answered Jul 10, 2018 at 6:30

oliv

13.3k reputation, 30 silver badges, 52 bronze badges

3 Comments

Ed Morton Over a year ago

That will fail when there are colons within the quoted fields.

oliv Over a year ago

@EdMorton No, it doesn't fail, neither with the OP's example nor with yours. Even if there are colons in a field, the NF-based test does not trigger, so the record remains untouched and is printed as is. The only valid remark could be that the records are not split correctly, but that's irrelevant in the OP's case.

Ed Morton Over a year ago

The sample input I posted wasn't intended to be exhaustive. Try your script when one of the fields is "foo::bar" and you'll see that field become "foo:"N/A":bar". I'll add that case to the sample input in my answer.

Yingyu YOU · Accepted Answer · 2018-07-10 06:58:13Z

0

How about the following gawk code:

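BEGIN {
    FS=":"
    OFS=":"
}
{
    # assumes exactly 4 colon-separated fields per line
    for(i=1; i<=4; i++) {
        if ($(i) == "") field[i] = "\"N/A\""   # fill an empty field, keeping the quotes
        else field[i] = $(i)
    }
    if ($0 != "") print field[1],field[2],field[3],field[4]   # skip blank lines
}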

-- Or --
Maybe the following bash script is simpler:

#!/bin/bash
na='"N/A"'    # replacement value, quoted like the other fields
while IFS=":" read -r a b c d; do    # -r keeps backslashes literal
    printf '%s\n' "${a:-$na}:${b:-$na}:${c:-$na}:${d:-$na}"
done

Run it with input redirection, e.g. ./this_bash_script.sh < your_test_input.txt

edited Jul 10, 2018 at 6:58

answered Jul 10, 2018 at 6:47

Yingyu YOU

369 reputation, 1 silver badge, 5 bronze badges

1 Comment

Ed Morton Over a year ago

Those scripts will fail when there are colons within the quoted fields. Also, the bash loop has bugs that will corrupt the output given some input and will be orders of magnitude slower than an equivalent awk script; see "Why is using a shell loop to process text considered bad practice?" for some of the issues.
