Question abstract:

How can I parse a text file into two hashes in Perl, one storing the key-value pairs taken from the (X=Y) lines and the other those taken from the (X:Y) lines?

1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2

Both kinds of line are kept in one file, and only the symbol between the two digits distinguishes them.

===============================================================================

I spent only around 30 hours learning Perl last semester and managed to finish my Perl assignment in a "head first, ad hoc, ugly" way.

I just received my result for this section: 7/10. To be frank, I am not happy with this, particularly because it reminds me of how poorly I used regular expressions to deal with formatted data. The rules for that data are as follows:

1= (the last digit in your student ID, or one if this digit is zero)
2= (the second last digit in your student ID, or one if this digit is zero)
3= (the third last digit in your student ID, or one if this digit is zero)
4= (the fourth last digit in your student ID, or one if this digit is zero)

2:1 
3:1  
4:1  
1:2  
1:3  
1:4  
2:3 (if the last digit in your student ID is between 0 and 4) OR
    3:4 (if the last digit in your student ID is between 5 and 9)
3:2 (if the second last digit in your student ID is between 0 and 4) OR
    4:3 (if the second last digit in your student ID is between 5 and 9)

An example of the above configuration file: if your student ID is 10926029, it has to be:

1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2
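
For reference, the rules above can be sketched as a small Perl routine. This is only an illustration of how the example file follows from the student ID; the sub name `config_lines` is hypothetical and not part of the assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: build the configuration lines for a student ID,
# following the rules quoted above.
sub config_lines {
    my ($student_id) = @_;

    # Digits in reverse order: $d[0] is the last digit, $d[1] the second last, ...
    my @d = reverse split //, $student_id;

    my @lines;

    # Pages 1..4 take the last four digits; a zero digit is replaced by one.
    for my $page ( 1 .. 4 ) {
        push @lines, "$page=" . ( $d[ $page - 1 ] || 1 );
    }

    # Links that are the same for every student.
    push @lines, qw( 2:1 3:1 4:1 1:2 1:3 1:4 );

    # Two links that depend on the last and the second last digit.
    push @lines, $d[0] <= 4 ? '2:3' : '3:4';
    push @lines, $d[1] <= 4 ? '3:2' : '4:3';

    return @lines;
}

print "$_\n" for config_lines('10926029');
```

For the ID 10926029 this prints the same twelve lines as the example file above.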

The assignment was about PageRank calculation with a simplified algorithm, so I came up with the answer to that part in 5 minutes. However, it was the text parsing part that took me heaps of time.

The first part of the text (Page=Pagerank) denotes the pages and their corresponding pageranks.

The second part (FromNode:ToNode) denotes the direction of a link between two pages.

For a better understanding, please go to my website and check the requirement file and my Perl script here

There are massive comments in the script so I reckon it is not hard at all to see how stupid I was in my solution :(

If you are still on this page, let me justify why I ask this question here in SO:

I got nothing else but "Result 7/10" with no comment from uni.

I am not studying for uni, I am learning for myself.

So, I hope the Perl gurus can at least guide me in the right direction toward solving this problem. My stupid solution was sort of "generic" and would probably work in Java, C#, etc. I am sure that is not even close to the nature of Perl.

And, if possible, please let me know what level the solution requires, for example whether I need to go through "Learning Perl ==> Programming Perl ==> Mastering Perl" to get there :)

Thanks for any hint and suggestion in advance.

Edit 1:

I have another question, posted but closed, here, which describes pretty much how things go at my uni :(

  • I have one serious suggestion: go to your professor and tell him or her that you would like to go over the problem a bit. Make it clear that you are not looking to change your grade, but only to understand the material better. Most teachers will have a hard time refusing that much. Commented Jul 20, 2010 at 23:38
  • @Telemachus: after years of studying at this uni, I am sure this sort of thing wouldn't happen unless I chased the prof. for feedback in person. I know it was marked by someone else. They might regard this as an "issue in the past". I am so sick of this. And to be frank, I don't think they would show me a better solution :( Commented Jul 20, 2010 at 23:47
  • Your question is impenetrable. What are you asking? I don't want to spend 45 minutes just to figure out what you are asking for. I don't want to download unknown zip files from strange sites. If you want a response, write a short and clear question and provide a small bit of code. Commented Jul 21, 2010 at 0:14
  • @daotoad: I have updated the question and added an abstract to narrow it down to one specific question that I cannot solve properly myself. I hope this addresses your concern. The uploaded files are just "proof" of what I've done. Commented Jul 21, 2010 at 0:28
  • I'm not sure about grade, but did you try running your code through perl critic? For the small bit of data you've posted, the zip file is way too much to wade through. Why not post a specific section of your code that you're concerned with? Commented Jul 21, 2010 at 1:04

2 Answers

Is this what you mean? The regex basically has three capture groups (denoted by the ()s). It should capture one digit, followed by either = or : (that's the capture group wrapping the character class [], which matches any character within it), followed by another single digit.

my ( %assign, %colon );

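while (<DATA>) {
    chomp;
    # Capture the left digit, the separator (= or :), and the right digit.
    my ($l, $c, $r) = $_ =~ m/(\d)([=:])(\d)/
        or next;    # skip lines that do not match

    if    ( q{=} eq $c ) { $assign{$l} = $r; }
    elsif ( q{:} eq $c ) { $colon{$l}  = $r; }
}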

__DATA__
1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2

As for the recommendation, grab a copy of Mastering Regular Expressions if you can. It's very...thorough.

4 Comments

That's absolutely better than what I did. I was trying to do parsing and data validation in the same process and ended up with a messy script. Also, thanks for the recommended book.
I will recommend the book too. I have used it at work and it has been a great help. I always recommend learning regex because it is so helpful in data parsing. I would add a response, but it looks covered.
Your code drops the links from 1 to 2 and 3, retaining only the link from 1 to 4. A simple hash can't associate multiple values with a single key.
I was just implementing the poster's specifications... And of course a hash can associate multiple values with a single key: push @{$hash{single_key}}, 'one_of_many_values';. Actually, I just noticed you did exactly this in your answer.

Well, if you don't want to validate any restrictions on the data file, you can parse this data pretty easily. The main issue lies in selecting the appropriate structure to store your data.

use strict;
use warnings;

use IO::File;

my $file_path = shift;  # Take file from command line

my %page_rank;
my %links;

my $fh = IO::File->new( $file_path, '<' )
    or die "Error opening $file_path - $!\n";

while ( defined( my $line = $fh->getline ) ) {
    chomp $line;

    # skip invalid lines; trailing whitespace after the value is tolerated
    next unless $line =~ /^(\d+)([=:])(\d+)\s*$/;

    my $page      = $1;
    my $delimiter = $2; 
    my $value     = $3;


    if( $delimiter eq '=' ) {

        $page_rank{$page} = $value;
    }
    elsif( $delimiter eq ':' ) {

        $links{$page} = [] unless exists $links{$page};

        push @{ $links{$page} }, $value;
    }

}

use Data::Dumper;
print Dumper \%page_rank;
print Dumper \%links;

The main way that this code differs from Pedro Silva's is that mine is more verbose and it also handles multiple links from one page properly. For example, my code preserves all values for links from page 1. Pedro's code discards all but the last.
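
To make the difference concrete, here is a minimal sketch that feeds only the three links leaving page 1 into both kinds of structure. The names `%flat` and `%multi` are made up for this illustration; they stand in for a hash of plain values and a hash of array references respectively.

```perl
use strict;
use warnings;

my @link_lines = qw( 1:2 1:3 1:4 );   # the three links leaving page 1

my ( %flat, %multi );
for my $line (@link_lines) {
    my ( $from, $to ) = split /:/, $line;

    $flat{$from} = $to;               # plain scalar value: each assignment overwrites
    push @{ $multi{$from} }, $to;     # array reference: values accumulate
}

print "flat:  $flat{1}\n";            # prints "flat:  4"
print "multi: @{ $multi{1} }\n";      # prints "multi: 2 3 4"
```

Note that push autovivifies the array reference on first use, so the explicit `exists` check in my code is optional; I kept it there only to make the intent obvious.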

2 Comments

@daotoad : I like the use of modules. It was painful not to be allowed to use any of them in my original code...
IO::File is a core module and I used it to improve readability over the <> operator. There's no extra functionality that couldn't have been achieved with open and <>, just slightly cleaner (IMO) syntax.
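
For comparison, the same loop written with the built-in open and the <> operator might look roughly like this. It is a sketch only: the sub name `parse_config` is hypothetical, and the `\s*` before the end anchor is an assumption added so that lines with trailing spaces still match.

```perl
use strict;
use warnings;

# Hypothetical helper: same parsing rules as the answer, using open and <$fh>.
sub parse_config {
    my ($file_path) = @_;

    open my $fh, '<', $file_path
        or die "Error opening $file_path - $!\n";

    my ( %page_rank, %links );
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /^(\d+)([=:])(\d+)\s*$/;   # skip invalid lines

        if ( $2 eq '=' ) { $page_rank{$1} = $3 }          # Page=Pagerank
        else             { push @{ $links{$1} }, $3 }     # FromNode:ToNode
    }
    close $fh;

    return ( \%page_rank, \%links );
}

# Usage: perl script.pl config.txt
if (@ARGV) {
    my ( $page_rank, $links ) = parse_config( $ARGV[0] );
    print "page $_ has rank $page_rank->{$_}\n" for sort keys %$page_rank;
    print "page $_ links to @{ $links->{$_} }\n" for sort keys %$links;
}
```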
