How to use Perl to parse specified formatted text with regex?

Question

Question abstract:

How can I parse a text file into two hashes in Perl, one storing the key-value pairs taken from the (X=Y) lines and the other storing those from the (X:Y) lines?

Both kinds of line are kept in one file, and only the symbol between the two digits tells them apart.

===============================================================================

I spent around 30 hours learning Perl last semester and managed to finish my Perl assignment in a "head first, ad-hoc, ugly" way.

I just received my result for this section: 7/10. To be frank, I am not happy with it, particularly because it reminds me of how badly I struggled to use regular expressions on formatted data. The rules for the data are:

1= (the last digit in your student ID, or one if this digit is zero)
2= (the second last digit in your student ID, or one if this digit is zero)
3= (the third last digit in your student ID, or one if this digit is zero)
4= (the fourth last digit in your student ID, or one if this digit is zero)

2:1 
3:1  
4:1  
1:2  
1:3  
1:4  
2:3 (if the last digit in your student ID is between 0 and 4) OR
    3:4 (if the last digit in your student ID is between 5 and 9)
3:2 (if the second last digit in your student ID is between 0 and 4) OR
    4:3 (if the second last digit in your student ID is between 5 and 9)

An example of the above configuration file: if your student ID is 10926029, it has to be:

1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2

The assignment was about PageRank calculation, using a simplified algorithm, so I came up with the answer to that part in 5 minutes. However, it was the text parsing part that took me heaps of time.

The first part of the text (Page=Pagerank) denotes the pages and their corresponding pageranks.

The second part (FromNode:ToNode) denotes the direction of a link between two pages.

For a better understanding, please go to my website and check the requirement file and my Perl script here

There are massive comments in the script so I reckon it is not hard at all to see how stupid I was in my solution :(

If you are still on this page, let me justify why I ask this question here in SO:

I got nothing else but "Result 7/10" with no comment from uni.

I am not studying for uni, I am learning for myself.

So, I hope the Perl gurus can at least point me in the right direction toward solving this problem. My stupid solution was sort of "generic" and would probably work in Java, C#, etc. I am sure it is nowhere close to the spirit of Perl.

And, if possible, please let me know what level the solution requires, e.g. whether I need to go through "Learning Perl ==> Programming Perl ==> Mastering Perl" to get there :)

Thanks for any hint and suggestion in advance.

Edit 1:

I have another question, posted but closed here, which pretty much describes how things go at my uni :(

I have one serious suggestion: go to your professor and tell him or her that you would like to go over the problem a bit. Make it clear that you are not looking to change your grade, but only to understand the material better. Most teachers will have a hard time refusing that much.
– Telemachus, Commented Jul 20, 2010 at 23:38
@Telemachus: after years of studying at this uni, I am sure this sort of thing wouldn't happen unless I chased the prof. for feedback in person. I know it was marked by someone else. They might regard this as an "issue in the past". I am so sick of this. And to be frank, I don't think they would show me a better solution :(
– Michael Mao, Commented Jul 20, 2010 at 23:47
Your question is impenetrable. What are you asking? I don't want to spend 45 minutes just to figure out what you are asking for. I don't want to download unknown zip files from strange sites. If you want a response, write a short and clear question and provide a small bit of code.
– daotoad, Commented Jul 21, 2010 at 0:14
@daotoad: question updated and abstracted to narrow it down to a specific question that I cannot properly solve myself. I hope this addresses your concern. The uploaded files are just "proof" of what I've done.
– Michael Mao, Commented Jul 21, 2010 at 0:28
I'm not sure about the grade, but did you try running your code through Perl::Critic? For the small bit of data you've posted, the zip file is way too much to wade through. Why not post a specific section of your code that you're concerned with?
– Scott Hoffman, Commented Jul 21, 2010 at 1:04

Pedro Silva · Accepted Answer · 2010-07-21 00:50:47Z

3

Is this what you mean? The regex basically has three capture groups (denoted by the ()s). It should capture one digit, followed by either = or : (that's the capture group wrapping the character class [], which matches any character within it), followed by another single digit.

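use strict;
use warnings;

my ( %assign, %colon );

while (<DATA>) {
    chomp;
    # Capture left digit, separator (= or :), right digit; skip lines that don't match
    my ($l, $c, $r) = $_ =~ m/(\d)([=:])(\d)/ or next;

    if    ( q{=} eq $c ) { $assign{$l} = $r; }    # X=Y pairs
    elsif ( q{:} eq $c ) { $colon{$l}  = $r; }    # X:Y pairs (a repeated X overwrites the earlier Y)
}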

__DATA__
1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2

As for the recommendation, grab a copy of Mastering Regular Expressions if you can. It's very...thorough.

edited Jul 21, 2010 at 0:50

answered Jul 21, 2010 at 0:44

Pedro Silva


4 Comments

Michael Mao Over a year ago

That's absolutely better than what I did. I was trying to do parsing and data validation in the same process and ended up with a messy script. Also, thanks for the recommended book.

onaclov2000 Over a year ago

I will recommend the book too. I have used it at work and it has been a great help. I am always recommending learning regex, since it's so helpful in data parsing. I would add a response, but it looks covered.

daotoad Over a year ago

Your code drops the links from 1 to 2 and 3, retaining only the link from 1 to 4. A simple hash can't associate multiple values with a single key.

Pedro Silva Over a year ago

Just implementing the poster's specifications... And of course a hash can associate multiple values with a single key: push @{$hash{single_key}}, 'one_of_many_values';. Actually, I just noticed you did exactly this in your answer.

daotoad · Accepted Answer · 2010-07-21 18:08:24Z

1

Well, if you don't want to validate any restrictions on the data file, you can parse this data pretty easily. The main issue lies in selecting the appropriate structure to store your data.

use strict;
use warnings;

use IO::File;

my $file_path = shift;  # Take file from command line

my %page_rank;
my %links;

my $fh = IO::File->new( $file_path, '<' )
    or die "Error opening $file_path - $!\n";

while ( my $line = $fh->readline ) {
    chomp $line;

    next unless $line =~ /^(\d+)([=:])(\d+)$/; # skip invalid lines

    my $page      = $1;
    my $delimiter = $2; 
    my $value     = $3;


    if( $delimiter eq '=' ) {

        $page_rank{$page} = $value;
    }
    elsif( $delimiter eq ':' ) {

        $links{$page} = [] unless exists $links{$page};

        push @{ $links{$page} }, $value;
    }

}

use Data::Dumper;
print Dumper \%page_rank;
print Dumper \%links;

The main way that this code differs from Pedro Silva's is that mine is more verbose and it also handles multiple links from one page properly. For example, my code preserves all values for links from page 1. Pedro's code discards all but the last.
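A minimal sketch of that difference, using only the three page-1 link lines from the sample data. The hash names %last_only and %all_links are made up for illustration and do not appear in either answer:

```perl
use strict;
use warnings;

my ( %last_only, %all_links );

for my $line ( '1:2', '1:3', '1:4' ) {
    # Skip anything that is not of the form FromNode:ToNode
    my ( $from, $to ) = $line =~ /^(\d+):(\d+)$/ or next;

    $last_only{$from} = $to;             # scalar value: each assignment overwrites the previous one
    push @{ $all_links{$from} }, $to;    # array reference (autovivified): every target is kept
}

print "last_only: $last_only{1}\n";        # prints "last_only: 4"
print "all_links: @{ $all_links{1} }\n";   # prints "all_links: 2 3 4"
```

The explicit `$links{$page} = [] unless exists $links{$page};` line in the code above is there for clarity; this sketch relies on Perl autovivifying the array reference on the first push.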

answered Jul 21, 2010 at 18:08

daotoad


2 Comments

Michael Mao Over a year ago

@daotoad : I like the use of modules. It was painful not to be allowed to use any of them in my original code...

daotoad Over a year ago

IO::File is a core module and I used it to improve readability over the <> operator. There's no extra functionality that couldn't have been achieved with open and <>, just slightly cleaner (IMO) syntax.
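For comparison, here is a rough sketch of the same loop written with plain open and <>. To stay self-contained it reads from an in-memory filehandle over a made-up sample string ($sample is an assumption for the demo). With a real file you would pass the path to open instead. The \s* before the end anchor is also an addition, to tolerate trailing spaces.

```perl
use strict;
use warnings;

# Stand-in for the real configuration file (hypothetical sample data).
my $sample = "1=9\n2=2\n2:1\n1:2\n1:3\n";

# Three-argument open with a lexical filehandle; \$sample opens the string itself.
open my $fh, '<', \$sample
    or die "Error opening input - $!\n";

my ( %page_rank, %links );

while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /^(\d+)([=:])(\d+)\s*$/;    # skip invalid lines

    if ( $2 eq '=' ) { $page_rank{$1} = $3 }             # Page=Pagerank
    else             { push @{ $links{$1} }, $3 }        # FromNode:ToNode
}

close $fh;

print "page $_ has rank $page_rank{$_}\n"  for sort keys %page_rank;
print "page $_ links to @{ $links{$_} }\n" for sort keys %links;
```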
