0

I have a tab-delimited text file with only 4 columns, as shown below:

GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS    .:2:c:b:a:PASS    .:2:d:c:a:FAIL

If the string "FAIL" appears in any column from column 2 through column N (the elements within each column are separated by ":"), the second element of that column needs to be replaced with "-1". Sample output is shown below:

GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS    .:2:c:b:a:PASS    .:-1:d:c:a:FAIL

How can I do this with awk?

1 Comment

Is the string FAIL always in the last ":"-delimited part of each column? Commented May 13, 2016 at 12:55

5 Answers

2

With any awk:

$ awk 'BEGIN{FS=OFS="\t"} {for (i=2;i<=NF;i++) if ($i~/:FAIL$/) sub(/:[^:]+/,":-1",$i)} 1' file
GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS  .:2:c:b:a:PASS  .:-1:d:c:a:FAIL

2

To split a string in awk, you can use the built-in split() function.

Its general form is:

split(string, array, separator)
  1. string is the string you want to split
  2. array is the array that receives the pieces
  3. separator is the character (or regular expression) to split on

split() returns the number of pieces it stored in the array.

For example:

string="hello:world"
result=$(echo "$string" | awk '{ split($1, ARR, ":"); printf("%s", ARR[1]) }')

Here result is set to hello: the string is split on the ":" character and the first element of ARR is printed. Printing the second element instead (printf("%s", ARR[2])) would set result to world.
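Applied to the data in the question, the same split() call can break each tab-separated column into its ":"-separated elements. The following is only a sketch: the function name fix_fail is made up for illustration, and it assumes that FAIL, when present, is always the last element of a column, which the question does not state explicitly.

```shell
# fix_fail is a hypothetical helper name; it reads tab-delimited text on stdin.
fix_fail() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    for (i = 2; i <= NF; i++) {
      n = split($i, parts, ":")            # n = number of ":"-separated elements
      if (n >= 2 && parts[n] == "FAIL") {  # assumption: FAIL is the last element
        parts[2] = "-1"
        col = parts[1]
        for (j = 2; j <= n; j++) col = col ":" parts[j]
        $i = col                           # assigning a field rebuilds $0 with OFS
      }
    }
    print
  }'
}

printf 'GT:CN:CNL:CNP:CNQ:FT\t.:2:a:b:c:PASS\t.:2:c:b:a:PASS\t.:2:d:c:a:FAIL\n' | fix_fail
```

Because split() discards the separators, the column has to be rejoined by hand with ":" before it is assigned back to the field.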

2

With gawk:

awk '{$0=gensub(/[^:]*(:[^:]*:[^:]*:[^:]*:FAIL)/,"-1\\1", "g" , $0)};1' File

With sed:

sed 's/[^:]*\(:[^:]*:[^:]*:[^:]*:FAIL\)/-1\1/g' File

1

If you are using GNU awk, you can take advantage of the RT feature [1] and split the records at tabs and newlines:

awk '$NF == "FAIL" { $2 = "-1"; } { printf "%s", $0 RT }' RS='[\t\n]' FS=':' OFS=':' infile

Output:

GT:CN:CNL:CNP:CNQ:FT    .:2:a:b:c:PASS  .:2:c:b:a:PASS  .:-1:d:c:a:FAIL

[1] RT holds the record separator text that follows the current record.

0

Your requirements are somewhat vague, but I'm pretty sure this does what you want with bog standard awk (no gnu-awk extensions):

awk '/FAIL/{$2=-1}1' ORS=\\t RS=\\t FS=: OFS=: input

1 Comment

It prints an extra tab at the end of the file, which you may want to trim in post processing (just pipe the output to sed '$d'). Also, I'm taking some liberties; if column 1 matches 'FAIL' then this will modify the last column of the previous line, but I'm assuming that column 1 is always a fixed header.
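A quick way to see the extra tab and the suggested trim is to run the command on the sample row from the question. This is only a sketch: the temporary file and the variable names input and trimmed are made up for the demonstration.

```shell
# Write the sample row from the question to a temporary file (path is arbitrary).
input=$(mktemp)
printf 'GT:CN:CNL:CNP:CNQ:FT\t.:2:a:b:c:PASS\t.:2:c:b:a:PASS\t.:2:d:c:a:FAIL\n' > "$input"

# Each tab-separated column is one record; ":" splits it into fields.
# The final record still carries the line's newline, so the output ends
# with a newline followed by one ORS tab; sed '$d' drops that last line.
trimmed=$(awk '/FAIL/{$2=-1}1' ORS='\t' RS='\t' FS=: OFS=: "$input" | sed '$d')

printf '%s\n' "$trimmed"
rm -f "$input"
```

For this single-row input, the printed line is the original row with the last column changed to .:-1:d:c:a:FAIL and no trailing tab.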
