How to split and replace strings in columns using awk

Question

I have a tab-delimited text file with only 4 columns, as shown below:

GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS    .:2:c:b:a:PASS    .:2:d:c:a:FAIL

If the string "FAIL" appears in any column from column 2 to column N (the elements within each column are separated by ":"), the second element of that column should be replaced with "-1". Sample output is shown below:

GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS    .:2:c:b:a:PASS    .:-1:d:c:a:FAIL

Any help using awk?

Is the string FAIL always in the last ":"-delimited part of the columns?
– Lars Fischer, commented May 13, 2016 at 12:55

Ed Morton · Accepted Answer · 2016-05-13 13:12:41Z

2

With any awk:

$ awk 'BEGIN{FS=OFS="\t"} {for (i=2;i<=NF;i++) if ($i~/:FAIL$/) sub(/:[^:]+/,":-1",$i)} 1' file
GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS  .:2:c:b:a:PASS  .:-1:d:c:a:FAIL
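
The one-liner can be easier to follow when expanded. The sketch below wraps the same logic in a shell function (the name mask_failed is hypothetical, not part of the answer) and feeds it the sample line from the question. It assumes every column has a non-empty second element; with an empty one, sub() would replace a later element instead.

```shell
#!/bin/sh
# mask_failed is a hypothetical helper name used only for illustration.
mask_failed() {
    awk '
    BEGIN { FS = OFS = "\t" }              # read and write tab-separated columns
    {
        for (i = 2; i <= NF; i++)          # column 1 is the format header, skip it
            if ($i ~ /:FAIL$/)             # last ":"-separated element is FAIL
                sub(/:[^:]+/, ":-1", $i)   # the first ":x" is the second element
        print
    }' "$@"
}

# Sample line from the question, with real tabs between the columns.
printf 'GT:CN:CNL:CNP:CNQ:FT\t.:2:a:b:c:PASS\t.:2:c:b:a:PASS\t.:2:d:c:a:FAIL\n' | mask_failed
```

Because sub() only replaces the first match, anchoring on the leading ":" is enough to target the second element without splitting the column.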

answered May 13, 2016 at 13:12

Ed Morton


Sir. Hedgehog · Accepted Answer · 2016-05-13 12:52:08Z

2

To split a string in awk, you can use the built-in split function.

Its general form is:

split(string, array, separator)

string is the string you want to split
array is the array that receives the pieces
separator is the character (or regular expression) to split on

For example:

string="hello:world"
result=$(echo "$string" | awk '{ split($1, ARR, ":"); printf("%s", ARR[1]) }')

Here result is "hello": the string is split at the ":" character and the first element of ARR is printed. Printing the second element instead (printf("%s", ARR[2])) would set result to "world".
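
Applied to the question, split can break each column into its ":"-separated elements so that the second one can be replaced and the column rebuilt. This is a minimal sketch rather than part of the original answer: the function name fix_with_split and the variable names are hypothetical, and it assumes tab-separated columns as described in the question.

```shell
#!/bin/sh
# fix_with_split is a hypothetical helper name used only for illustration.
fix_with_split() {
    awk '
    BEGIN { FS = OFS = "\t" }                  # columns are tab-separated
    {
        for (i = 2; i <= NF; i++) {            # column 1 is the header, skip it
            n = split($i, part, ":")           # part[1..n] hold the elements
            if (n >= 2 && part[n] == "FAIL") { # last element marks a failure
                part[2] = "-1"
                col = part[1]                  # rejoin the elements with ":"
                for (j = 2; j <= n; j++)
                    col = col ":" part[j]
                $i = col
            }
        }
        print
    }' "$@"
}

printf 'GT:CN:CNL:CNP:CNQ:FT\t.:2:a:b:c:PASS\t.:2:c:b:a:PASS\t.:2:d:c:a:FAIL\n' | fix_with_split
```

Compared with a regex substitution, this version addresses the second element by index, so it also behaves sensibly when that element is empty.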

answered May 13, 2016 at 12:52

Sir. Hedgehog


jijinp · Accepted Answer · 2016-05-13 13:12:04Z

2

With gawk:

awk '{$0=gensub(/[^:]*(:[^:]*:[^:]*:[^:]*:FAIL)/,"-1\\1", "g" , $0)};1' File

with sed:

sed 's/[^:]*\(:[^:]*:[^:]*:[^:]*:FAIL\)/-1\1/g' File

edited May 13, 2016 at 13:12

answered May 13, 2016 at 12:46

jijinp


Thor · Accepted Answer · 2016-05-13 13:23:18Z

1

If you are using GNU awk, you can take advantage of the RT feature¹ and split the records at tabs and newlines:

awk '$NF == "FAIL" { $2 = "-1"; } { printf "%s", $0 RT }' RS='[\t\n]' FS=':' OFS=':' infile

Output:

GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS  .:2:c:b:a:PASS  .:-1:d:c:a:FAIL

¹ The record separator that follows the current record.

edited May 13, 2016 at 13:23

answered May 13, 2016 at 12:51

Thor


William Pursell · Accepted Answer · 2016-05-13 12:58:18Z

0

Your requirements are somewhat vague, but I'm pretty sure this does what you want with bog standard awk (no gnu-awk extensions):

awk '/FAIL/{$2=-1}1' ORS=\\t RS=\\t FS=: OFS=: input

answered May 13, 2016 at 12:58

William Pursell


1 Comment

William Pursell Over a year ago

It prints an extra tab at the end of the file, which you may want to trim in post processing (just pipe the output to sed '$d'). Also, I'm taking some liberties; if column 1 matches 'FAIL' then this will modify the last column of the previous line, but I'm assuming that column 1 is always a fixed header.
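
The trimming step mentioned in this comment can be sketched as a small pipeline. The function name mask_by_record is hypothetical, and the sketch relies on the same assumption the comment states: column 1 is a fixed header that never contains "FAIL".

```shell
#!/bin/sh
# mask_by_record is a hypothetical helper name used only for illustration.
mask_by_record() {
    # Each tab-separated column becomes one awk record and ":" splits it into
    # fields. The final record of the input ends in a newline, so printing it
    # followed by ORS leaves a lone tab on an extra last line; sed '$d' drops it.
    awk '/FAIL/{$2=-1}1' ORS='\t' RS='\t' FS=: OFS=: "$@" | sed '$d'
}

printf 'GT:CN:CNL:CNP:CNQ:FT\t.:2:a:b:c:PASS\t.:2:c:b:a:PASS\t.:2:d:c:a:FAIL\n' | mask_by_record
```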
