2

How can I find and replace the value of a particular column using awk?

For example, I have a file named test with the following content:

"abc":"100"::"new"

"xyz":"200":"mob":"old"

"lmn":"300"::"new"

"pqr":"400":"mob2":"new"

Now, if the 3rd column is blank, I want to replace the blank value with "N/A"; otherwise the line should be printed as it is, so that the output looks like:

"abc":"100":"N/A":"new"

"xyz":"200":"mob":"old"

"lmn":"300":"N/A":"new"

"pqr":"400":"mob2":"new"

I got this output with the following awk command:

awk -F":" '{
    if ( $3 == "")
        print $1 ":" $2 ":\"N/A\":" $4
    else
        print $0
}' test

But here I am hard-coding each column ($1, $2, ...), so if the blank column in another example is not the 3rd but some other column, I have to change the command again. Is there a way to get the same output with awk without hard-coding the columns? Thanks for your help.

3
  • You mean replacing any empty column? Or only empty values of an arbitrary column, and only on that column? Commented Jul 10, 2018 at 6:14
  • the replace should work only for those rows in which value of 3rd column is blank... Commented Jul 10, 2018 at 6:16
  • What does "if the blank column changes in other example from 3rd to xyz then have to change the same in command again" mean then? It sounds like you're saying with that sentence that you want to change any blank column, but then you also say "the replace should work only for those rows in which value of 3rd column is blank", which means only the 3rd column should be tested/changed. It's very unclear.... Commented Jul 10, 2018 at 14:12

4 Answers

3

First, let's simplify your present program a bit:

awk -F: 'BEGIN {OFS=FS} {
  if ( $3 == "") $3="\"N/A\""
  print $0
}' test

Now we can make two things variable: the column to test and the replacement string. Hence, the body of the program will look something like this:

if ( $fieldnumber == "" ) $fieldnumber=replacement

What remains is to fill in the variables. If you look at the awk man page, you will see that the -v option lets us specify the initial value of an awk variable.

awk -F: -v fieldnumber=... -v replacement=...

This allows you to fill these variables from wherever you like: a parameter of your shell script, an environment variable, etc.
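To make the two -v variables concrete, here is a minimal sketch of the complete command. The values fieldnumber=3 and replacement='"N/A"' are only illustrative choices for the question's sample data, and the input is fed through printf instead of the file test so the snippet is self-contained:

```shell
# Illustrative values: test column 3 and fill it with "N/A" (quotes included).
out=$(printf '%s\n' '"abc":"100"::"new"' '"xyz":"200":"mob":"old"' |
  awk -F: -v fieldnumber=3 -v replacement='"N/A"' '
    BEGIN { OFS = FS }                                  # keep ":" as the output separator
    $fieldnumber == "" { $fieldnumber = replacement }   # fill only the chosen column
    { print }                                           # print every line, changed or not
  ')
printf '%s\n' "$out"
```

Checking a different column is then just a matter of passing another fieldnumber on the command line. Like the program above, this assumes no colons appear inside the quoted fields.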

UPDATE: Fix output field separator (OFS)
UPDATE: Fix syntax error


5 Comments

We need to set OFS=":" along with FS=":" for your code to work. Otherwise, on lines where the replacement happens, the output separator becomes a space instead of ":".
That will fail when there are colons within the quoted fields.
@EdMorton : This is correct, but this problem is already present in the specialized solution which the OP claimed would work for her. That's why I didn't discuss this point. Technically, we would need to know whether this is really an issue (maybe the quotes don't have the usual meaning as delimiter, and the only delimiter present is the colon), or, much more likely, the file to be processed is in CSV format, in which case a solution (in, say, Ruby or Perl) using a CSV parser would be appropriate.
failed with the error: awk: cmd. line:1: BEGIN {OFS=FS} awk: cmd. line:1: ^ syntax error
@EdMorton : Ah, you are right, as usual. My mistake. Fixed my posting.
2

The right way to do this, using GNU awk for FPAT and a modified input file to demonstrate that it works even when colons are present within quoted fields:

$ cat tst.awk
BEGIN {
    FPAT = "([^:]*)|(\"[^\"]+\")"
    OFS = ":"
}
$3 == "" { $3 = "\"N/A\"" }
{ print }

$ cat file
"abc:def":"100"::"new"
"xyz":"200":"mob":"old"
"lmn":"123:456:300"::"new"
"pqr":"400":"mob2":"new"
"stu":"600":"foo::bar":"more"

$ awk -f tst.awk file
"abc:def":"100":"N/A":"new"
"xyz":"200":"mob":"old"
"lmn":"123:456:300":"N/A":"new"
"pqr":"400":"mob2":"new"
"stu":"600":"foo::bar":"more"


0

Using GNU awk:

awk -v RS='[:\n]'  '!NF{$0="\"N/A\""}{printf "%s%s",$0,RT}' test

The record separator RS is set to a colon or a newline, so each record is the data between two colons.

If the record contains no field (!NF), it is set to the wanted string.

The printf statement writes the data followed by RT, the text that matched the record separator for the current record.
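As a small sketch of how this behaves, the same command can be run on two sample lines, the second taken from the other answer's test data with an empty pair of colons inside a quoted field. It assumes GNU awk is installed under the name gawk, since RT and a regex RS are not part of POSIX awk; the guard simply skips the demo when gawk is missing:

```shell
# Assumes GNU awk is available as "gawk"; RT does not exist in POSIX awk.
if command -v gawk >/dev/null 2>&1; then
  out=$(printf '%s\n' '"abc":"100"::"new"' '"stu":"600":"foo::bar":"more"' |
    gawk -v RS='[:\n]' '
      !NF { $0 = "\"N/A\"" }         # an empty record stands for an empty column
      { printf "%s%s", $0, RT }      # re-emit the record and the separator that ended it
    ')
  printf '%s\n' "$out"
fi
```

Because every colon ends a record, the script cannot tell a column separator from a colon inside quotes, so the empty piece in the middle of "foo::bar" is filled as well.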

3 Comments

That will fail when there are colons within the quoted fields.
@EdMorton No, it doesn't fail, neither with OP's example nor with yours. Even if there are colons in the field, the NF test will not match, so the record will remain untouched and be printed as is. The only valid remark could be that the records are not split correctly, but that's irrelevant in OP's case.
The sample input I posted wasn't intended to be exhaustive. Try your script when one of the fields is "foo::bar" and you'll see that field become "foo:"N/A":bar". I'll add that case to the sample input in my answer.
0

How about the following piece of gawk code:

BEGIN {
    FS=":"
    OFS=":"
}
{
    for(i=1; i<=4; i++) {
        if ($(i) == "") field[i] = "N/A"
        else field[i] = $(i)
    }
    if ($0 != "") print field[1],field[2],field[3],field[4]
}

-- Or --
Maybe the following bash script is much simpler:

#!/bin/bash
export IFS=":"
while read a b c d; do
    echo "${a:-N/A}:${b:-N/A}:${c:-N/A}:${d:-N/A}"
done

with input redirection, i.e. this_bash_script.sh < your_test_input.txt
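The awk loop above is written for exactly four columns. As a hedged variation, the loop can run up to NF so the column count is not hard-coded. This sketch keeps the same behavior as the scripts above: it fills every empty column (not only the 3rd), writes N/A without quotes, and skips empty lines. The sample input is made up for illustration:

```shell
# Sample input is illustrative; the second line has empty first and last columns.
out=$(printf '%s\n' '"abc":"100"::"new"' ':"200":"mob":' |
  awk 'BEGIN { FS = OFS = ":" }
       NF {                                   # skip empty lines
         for (i = 1; i <= NF; i++)            # visit every column actually present
           if ($i == "") $i = "N/A"           # fill each empty one
         print
       }')
printf '%s\n' "$out"
```

Like the four-column version, this splits on every colon, so it still breaks when a quoted field contains a colon.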

1 Comment

Those scripts will fail when there are colons within the quoted fields. Also, the bash loop has bugs that will corrupt the output given some input and will be orders of magnitude slower than an equivalent awk script - see why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the issues.
