
My data looks like this:

1   20010101  945   A   6
1   20010101  946   B   4
1   20010101  947   P   3.5
1   20010101  950   A   5
1   20010101  951   P   4
1   20010101  952   P   4
1   20010101  1010  A   4
1   20010101  1011  P   4
2   20010101  940   A   3.5
2   20010101  1015  A   3
2   20010101  1113  B   3.5
2   20010101  1114  P   3.2
2   20010101  1115  B       3.4
2   20010101  1116  P   3.1
2   20010101  1119  P   3.6

I am trying to print every line that has P in column 4, followed by the most recent A and B values that appear before it among the lines sharing the same first two columns (e.g., 1 and 20010101).

The expected result looks like this:

1   20010101  947   P  3.5  6   4
1   20010101  951   P  4    5   4
1   20010101  952   P  4    5   4
1   20010101  1011  P  4    4   4
2   20010101  1114  P  3.2  3   3.5
2   20010101  1116  P  3.1  3   3.4
2   20010101  1119  P  3.6  3   3.4

Do I need to sort the data using a hash in Perl? I am out of ideas; could anybody give me a hint? It would be much appreciated!

  • @TLP Thanks for your comment. I am quite new to Perl, and all I currently know is how to sum values by matching lines with a hash. In this case, finding the values around a specific entry is quite difficult for me. Ideas rather than code are welcome. Commented May 30, 2013 at 20:59
  • So what data structure is the above data in? Do you have a bunch of loose variables, or is it in a hash? If it is a hash, then look into the foreach loop. Commented May 30, 2013 at 21:01
  • You need to state your requirements much more clearly. What are the "latest A and B values"? It looks like your only requirement is that col 4 == "P". Commented May 30, 2013 at 21:02
  • @scrappedcola Thanks. Actually, I have three variables in my dataset: P, A and B. Using a hash is my idea so far. Commented May 30, 2013 at 21:16
  • @TLP Sorry about my unclear description. You can consider column 3 as the time of day. For each "P", I need to find the A and B values at the latest time before that "P", then display those A and B values after the P line. I hope that is clearer now. Commented May 30, 2013 at 21:18

2 Answers

perl -ane 'if($F[3] eq "P"){ s/$/  $la  $lb/; print; }else{ ($la,$lb) = ($F[3] eq "A")?($F[4],$lb):($la,$F[4]) }' data.txt

This is most simply solved with an if-elsif structure:

use strict;
use warnings;

my ($A, $B);
while (<DATA>) {
    my @data = split;
    if ($data[3] eq "A") {
        $A = $data[4];
    } elsif ($data[3] eq "B") {
        $B = $data[4];
    } elsif ($data[3] eq "P") {
        print join("\t", @data, $A, $B), "\n";
    }
}


__DATA__
1   20010101  945   A   6
1   20010101  946   B   4
1   20010101  947   P   3.5
1   20010101  950   A   5
1   20010101  951   P   4
1   20010101  952   P   4
1   20010101  1010  A   4
1   20010101  1011  P   4
2   20010101  940   A   3.5
2   20010101  1015  A   3
2   20010101  1113  B   3.5
2   20010101  1114  P   3.2
2   20010101  1115  B       3.4
2   20010101  1116  P   3.1
2   20010101  1119  P   3.6

Output:

1       20010101        947     P       3.5     6       4
1       20010101        951     P       4       5       4
1       20010101        952     P       4       5       4
1       20010101        1011    P       4       4       4
2       20010101        1114    P       3.2     3       3.5
2       20010101        1116    P       3.1     3       3.4
2       20010101        1119    P       3.6     3       3.4

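You might want to compensate for possible empty, undefined, or stale values in $A and $B, for example when a P line appears before any A or B line, or when a value carries over from the previous group.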
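One way to handle that caveat is sketched below. It is a variation on the code above, not part of the original answer: the latest A and B values are stored in a hash keyed on the first two columns, so values never leak from one group into the next, and a missing value is printed as a placeholder. The helper name annotate_p_lines, the "NA" placeholder, and the small sample list are all assumptions made for illustration. The sketch also assumes the lines within each group are already in time order.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): takes input lines and
# returns one tab-joined row per P line, with the latest A and B appended.
sub annotate_p_lines {
    my @lines = @_;
    my %last;    # "id date" => { A => latest A value, B => latest B value }
    my @out;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f >= 5;                  # skip blank or short lines
        my ($id, $date, $time, $type, $value) = @f;
        my $key = "$id $date";                # group on the first two columns
        if ($type eq 'A' or $type eq 'B') {
            $last{$key}{$type} = $value;      # remember the latest value per group
        } elsif ($type eq 'P') {
            my $seen = $last{$key} || {};     # nothing seen yet in this group
            # "NA" is an assumed placeholder for a value not seen so far
            my @ab = map { defined $seen->{$_} ? $seen->{$_} : 'NA' } qw(A B);
            push @out, join("\t", @f, @ab);
        }
    }
    return @out;
}

# Made-up sample lines that exercise the edge cases
my @sample = (
    "1 20010101 945 P 2",      # P before any A or B in its group
    "1 20010101 946 A 6",
    "1 20010101 947 P 3.5",    # A is known, B is still missing
    "2 20010101 940 B 3",
    "2 20010101 941 P 3.2",    # must not pick up group 1's A
);
print "$_\n" for annotate_p_lines(@sample);
```

To run it against the real file, the sample list could be replaced with lines read from a filehandle, for example annotate_p_lines(<DATA>) with the __DATA__ section shown above.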

1 Comment

Thank you! The if-elsif structure is simple and clear, and I didn't even think of it that way!
