
I have a file with lines like this:

1       17      A       G       R:560:500:60:10.71%:1.6329E-19  Pass:1.0:276:0:57:0:1E0 15      17      0       0       R:24:20:4:16.67%:5.461E-2 R:22:20:2:9.09%:2.4419E-1 R:27:24:3:11.11%:1.1792E-1 R:26:23:3:11.54%:1.1765E-1 A:16:16:0:0%:1E0 A:23:23:0:0%:1E0 A:11:10:1:9.09%:5E-1
1       36      C       T       Y:560:499:61:10.89%:7.7026E-20  Pass:1.0:275:0:58:0:1E0 15      17      0       0       Y:24:20:4:16.67%:5.461E-2 Y:22:20:2:9.09%:2.4419E-1 Y:27:24:3:11.11%:1.1792E-1 Y:26:23:3:11.54%:1.1765E-1 C:16:16:0:0%:1E0 C:23:23:0:0%:1E0 C:11:10:1:9.09%:5E-1 

I have previously been using the following awk one-liner to extract the first character of each of fields $11 through $17.

awk '{n=11; while (n<18) {{$n = substr($n, 1, 1)} n++} print $0}'

I am looking for an easy way to modify it so that I can extract only the percentages from these fields (the value after the 4th colon in each field). The output should look like this:

1       17      A       G       R:560:500:60:10.71%:1.6329E-19  Pass:1.0:276:0:57:0:1E0 15      17      0       0       16.67% 9.09% 11.11% 11.54% 0% 0% 9.09%
1       36      C       T       Y:560:499:61:10.89%:7.7026E-20  Pass:1.0:275:0:58:0:1E0 15      17      0       0       16.67% 9.09% 11.11% 11.54% 0% 0% 9.09%   

Cheers.

1 Answer


This will print the percentage including the "%":

awk '{split($5, arr, ":"); print arr[5]}'

Adjust the field number in the split() statement to suit your data.
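To make the behaviour of split() concrete, here is a small sketch run on one of the per-sample fields from the question. The wrapper function name `show_split` is hypothetical and exists only for illustration. It prints both the value split() returns (the number of pieces) and the fifth piece, which is the percentage:

```shell
#!/bin/sh
# show_split (hypothetical helper name): splits the first whitespace-separated
# field of each input line on ":" and prints the piece count and the 5th piece.
show_split() {
    awk '{
        count = split($1, arr, ":")   # returns the number of pieces; fills arr[1]..arr[count]
        print count, arr[5]           # arr[5] is the text after the 4th colon
    }'
}

# One of the per-sample fields from the question:
echo 'R:24:20:4:16.67%:5.461E-2' | show_split
# prints: 6 16.67%
```

The return value (6 here) is the piece count, not any of the pieces, so assigning the result of split() directly to a field stores that count rather than the percentage.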

You don't need a while loop where you manage the increment variable yourself; a for loop handles that for you. Here is a complete, working script using the technique shown above and a for loop:

awk 'BEGIN {OFS = "\t"} {for (n = 11; n < 18; n++) {split($n, arr, ":"); $n = arr[5]}; print $0}'
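If the number of sample columns varies from file to file, one possible variation is to loop up to `NF` instead of the hard-coded 18. This is a sketch under two assumptions: every field from $11 to the last one has the same colon-separated layout, and fields with fewer than five pieces should be left untouched. The function name `pct_fields` and the file names in the usage line are hypothetical:

```shell
#!/bin/sh
# pct_fields (hypothetical helper name): replaces every field from $11 to the
# last field ($NF) with its 5th colon-separated piece, joining fields with tabs.
pct_fields() {
    awk 'BEGIN { OFS = "\t" }
    {
        for (n = 11; n <= NF; n++) {
            # Only rewrite fields that actually have a 5th piece.
            if (split($n, arr, ":") >= 5)
                $n = arr[5]
        }
        print
    }' "$@"
}

# Usage (hypothetical file names): pct_fields input.txt > output.txt
```

As in the script above, assigning to a field makes awk rebuild the whole line with `OFS`, which is why the output is tab-separated.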

Sample output:

1   17  A   G   R:560:500:60:10.71%:1.6329E-19  Pass:1.0:276:0:57:0:1E0 15  17  0   0   16.67%  9.09%   11.11%  11.54%  0%  0%  9.09%
1   36  C   T   Y:560:499:61:10.89%:7.7026E-20  Pass:1.0:275:0:58:0:1E0 15  17  0   0   16.67%  9.09%   11.11%  11.54%  0%  0%  9.09%

Comments

I am having trouble incorporating it into the awk one-liner. awk '{n=11; while (n<18) {{$n = split($n, arr,":" )} n++} print $0}' is giving me just the number of elements in each array.
@user1308144: Please see my edited answer. split() puts the results in the named array and returns the number of parts.
