Perl file parser for dynamic file

Question

I'm new to Perl and could really use some help writing a file parser. The file is laid out like this (X is a number that changes from file to file and gives the number of following lines that contain a column heading):

X,1,0,0,2,0,0,2,0,1,2,0,2,2,0,3,2,0,4,2,1,0,2,2,0,2,3,0,2,4,0,2,4,1,2,4,2,2,4,3,2,5,0,2,5,1,2,5,2,2,5,3,3,1,0,3
# Col_heading1
# Col_heading2
# Col_heading3 //Continues X rows
# Col_headingX 
# 2013 138 22:42:21 - Random text
# 2013 138 22:42:22 : Random text
# 2013 138 22:42:23 : Random text
2013 138 22:42:26, 10, 10, 10, 20, //continues X values
2013 138 22:42:27, 10, 10, 10, 20, 
2013 138 22:42:28, 10, 10, 10, 20, 
# 2013 138 22:42:31 - Random text
# 2013 138 22:42:32 : Random text
# 2013 138 22:42:33 - Event $eventname starting ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:35, 10, 10, 10, 20, 
2013 138 22:42:36, 10, 10, 10, 20, 
2013 138 22:42:37, 10, 10, 10, 20, 
2013 138 22:42:38, 10, 10, 10, 20, 
2013 138 22:42:39, 10, 10, 10, 20, 
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20, 
2013 138 22:42:42, 10, 10, 10, 20, 
# 2013 138 22:42:45 - Event $eventname ended ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:46, 10, 10, 10, 20, 
2013 138 22:42:47, 10, 10, 10, 20, 
# 2013 138 22:42:48 : Random text

The parser needs to transpose the Col_headings into tab-separated values on one line, and list all lines between # 2013 138 22:42:33 - Event $eventname starting ($eventid) and # 2013 138 22:42:45 - Event $eventname ended ($eventid) that do not start with a #. The values must also be changed from comma-separated to tab-separated.

The output file should then look like:

Filename:/home/..../filename    What:$eventname Where:SYSTEM    ID:$eventid
Time                Col_heading1    Col_heading2    Col_heading3    Col_headingX
2013 138 22:42:35   10              10              10              20
2013 138 22:42:36   10              10              10              20
2013 138 22:42:37   10              10              10              20
2013 138 22:42:38   10              10              10              20
2013 138 22:42:39   10              10              10              20 
2013 138 22:42:41   10              10              10              20 
2013 138 22:42:42   10              10              10              20

Any help with this would be very much appreciated!

RobEarl · Accepted Answer · 2013-10-02 09:17:15Z

Once you've opened the file you can get the number from the first line with:

my ($heading_count) = split /,/, <$fh>;

Then loop to get the headings:

my @headings = qw(Time);
for (1..$heading_count) {
    chomp(my $heading = <$fh>); # Chomp to remove the newline
    # Process it somehow, e.g. remove leading # + whitespace
    $heading =~ s/^#\s+//;
    push @headings, $heading;
}
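
The two steps above can be combined into a small self-contained sketch. It reads from an in-memory string rather than a real file so that it runs as-is; the sample content and the `read_headings` helper name are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample input. A real script would open the file on disk instead:
#   open my $fh, '<', $filename or die "Cannot open $filename: $!";
my $sample = <<'END';
3,1,0,0,2,0,0
# Col_heading1
# Col_heading2
# Col_heading3
# 2013 138 22:42:21 - Random text
END

# Read the heading count from the first line, then that many heading lines.
sub read_headings {
    my ($fh) = @_;
    my ($heading_count) = split /,/, <$fh>;   # first field of the first line
    my @headings = ('Time');
    for (1 .. $heading_count) {
        my $heading = <$fh>;
        last unless defined $heading;          # file is shorter than announced
        chomp $heading;
        $heading =~ s/^#\s+//;                 # drop the leading "#" and whitespace
        $heading =~ s/\s+$//;                  # drop trailing whitespace
        push @headings, $heading;
    }
    return @headings;
}

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @headings = read_headings($fh);
print join("\t", @headings), "\n";
```

After the call, `$fh` is positioned at the first line after the headings, so the same handle can be passed on to whatever processes the rest of the file.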

Once you've done that, loop through the rest of the file, parsing and printing any rows between the start/end patterns. Here is a fairly simplistic example to get you started:

print join("\t", @headings), "\n"; # Print the headings; the parentheses keep "\n" from being joined as an extra field
my $in_event = 0; # State variable to track if we're in an event
while (<$fh>) { # Keep reading from the same handle the headings came from
    if (/Event (.*) starting \((.*)\)/) { # Watch for the event starting, event name is now in $1, event id in $2
        $in_event = 1;
        next;
    }
    next unless $in_event; # Skip if not in an event yet
    last if /Event .* ended/; # Stop reading if the event ends
    next if /^#/; # Skip comments

    s/,\s?/\t/g; # Replace commas with tabs
    print; # Print the row
}
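
The loop above captures the event name and ID in `$1` and `$2` but does not use them. A minimal sketch of how they might be carried into the header line requested in the question follows. The `extract_event` helper, the sample data, the event name `Test`, the ID `42`, and the filename `sample.log` are all assumptions for illustration. This variant also uses non-greedy captures and drops the trailing comma on each row.

```perl
use strict;
use warnings;

# Hypothetical sample; the event name "Test" and ID "42" are made up.
my $sample = <<'END';
2013 138 22:42:28, 10, 10, 10, 20,
# 2013 138 22:42:33 - Event Test starting (42)
2013 138 22:42:35, 10, 10, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20,
# 2013 138 22:42:45 - Event Test ended (42)
2013 138 22:42:46, 10, 10, 10, 20,
END

# Return the event name, the event ID, and the tab-separated data rows
# found between the "starting" and "ended" marker lines.
sub extract_event {
    my ($fh) = @_;
    my ($name, $id, @rows);
    while (my $line = <$fh>) {
        if (!defined $name) {
            # Still looking for the start marker; remember name and ID.
            ($name, $id) = ($1, $2)
                if $line =~ /Event (.*?) starting \((.*?)\)/;
            next;
        }
        last if $line =~ /Event .* ended/;   # stop at the end marker
        next if $line =~ /^#/;               # skip comment lines
        $line =~ s/\r?\n\z//;                # strip the line ending
        $line =~ s/,\s*$//;                  # drop the trailing comma
        $line =~ s/,\s*/\t/g;                # commas to tabs
        push @rows, $line;
    }
    return ($name, $id, @rows);
}

my $filename = 'sample.log';                 # hypothetical name
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($name, $id, @rows) = extract_event($fh);
die "No event found\n" unless defined $name;

print "Filename:$filename\tWhat:$name\tWhere:SYSTEM\tID:$id\n";
print "$_\n" for @rows;
```

Returning the rows instead of printing them inside the loop makes it possible to print the header line first, since the event name is only known once the start marker has been seen.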

With this approach the column headings won't line up with the data, because the fields vary in length. You'll either need to tweak it to get exactly what is required, or parse the rows with Text::CSV (or split) and use something like Text::Table to produce a proper table.
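
As a rough illustration of the split route using only core Perl, the sketch below pads every column to its widest value with `sprintf`. It is an alternative to Text::Table, not a demonstration of it, and it produces space-aligned rather than tab-separated output. The headings and rows are hypothetical sample data.

```perl
use strict;
use warnings;

# Hypothetical headings and raw data lines.
my @headings = ('Time', 'Col_heading1', 'Col_heading2');
my @lines    = ('2013 138 22:42:35, 10, 20, ', '2013 138 22:42:36, 10, 20, ');

# Split each line on commas, trimming the whitespace around them.
# split drops trailing empty fields, so the final comma adds no column.
my @table = (\@headings);
for my $line (@lines) {
    my @fields = split /\s*,\s*/, $line;
    push @table, \@fields;
}

# Find the widest value in each column.
my @width;
for my $row (@table) {
    for my $i (0 .. $#$row) {
        my $len = length $row->[$i];
        $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
    }
}

# Left-justify every field to its column width, two spaces between columns.
my @out = map {
    my $row = $_;
    join '  ', map { sprintf '%-*s', $width[$_], $row->[$_] } 0 .. $#$row;
} @table;
s/\s+$// for @out;   # remove the padding after the last column

print "$_\n" for @out;
```

This needs two passes over the data (one to measure, one to print), so the rows have to be held in memory first. For quoted fields or embedded commas, Text::CSV is the safer parser.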

edited Oct 2, 2013 at 9:17

answered Oct 2, 2013 at 8:21

RobEarl

3 Comments

RobEarl Over a year ago

use open, examples on that page.

user2837756 Over a year ago

My input file has ^M on the tail of each line. Is it possible to remove this at the same time as the # and whitespace?

RobEarl Over a year ago

See here: stackoverflow.com/questions/7175977/…
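
One possible way to do this, assuming the ^M is the carriage return of a Windows (CRLF) line ending, is a single substitution that removes the leading # and whitespace together with any trailing whitespace. This is only a sketch and not necessarily what the linked question recommends; the sample heading is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical heading line with a CRLF line ending.
my $heading = "# Col_heading1\r\n";

# Remove the leading "#" plus whitespace, and all trailing whitespace.
# \s matches "\r" and "\n", so this also does the work of chomp.
$heading =~ s/^#\s+|\s+\z//g;

print "[$heading]\n";   # prints "[Col_heading1]"
```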
