Search for String in File Using Regular Expression Perl

Question

I'm new to Perl. I'm reading text from a file and want to replace some words with their French translations. I managed to match word by word, but not multi-word expressions such as "Due Date", and I'm having trouble getting the code right.

Code for word by word:

my $filename = 'assign3.txt';
my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");   
my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");
my $i=1;
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file $filename !";
while (<$fh>) {
    for my $word (split)
    {
        print " $i. $word \n"; 
        $i++;
        for (my $j = 0; $j < @lexicon_en; $j++) {
            if ($word eq $lexicon_en[$j]) {
                print "Found one! - j value is $j\n";
            }
        }
     }
}
print "\ndone here!!\n";

Here is the regular expression I'm trying to use:

    /\w+\s\w+/

This is my code for strings:

while (<>) {
        print ("this is text: $_ \n");

        if ((split (/Due\sDate/),$_) eq "Due Date"){
            print "yes!!\n";
        }
}

Can you give some example output showing exactly what you are looking for, so that I can send you a sample script?
– Fla-Hyd, Commented Nov 16, 2014 at 3:28

Steven Klassen · Accepted Answer · 2014-11-16 04:30:55Z


I think I understand the challenge you're having. Because "Due Date" is two words, it needs to be matched before "Date" is; otherwise you get incorrect translations. One way to deal with that is to order your keys from the most words to the fewest, so that "Due Date" is handled before "Date".

If you convert your arrays to a hash (dictionary), you can order the keys by the number of spaces they contain and then iterate over them to do the actual substitutions:

#!/usr/bin/perl
use strict;
use warnings;

#my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");
#my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");

# convert your arrays to a hash
my %lexicon = (
    'Winter' => 'Hiver',
    'Date' => 'Date',
    'Due Date' => 'Date de Remise',
    'Problem' => 'Problème',
    'Summer' => 'Été',
    'Mark' => 'Point',
    'Fall' => 'Automne',
    'Assignment' => 'Devoir',
    'November' => 'Novembre',
);

# sort the keys so that those with the most spaces (most words) come first;
# tr/ // counts the spaces without modifying the key
my @ordered_keys = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } keys %lexicon;

my $sample = 'The due date of the assignment is a date in the fall.';

print "sample before: $sample\n";

foreach my $key (@ordered_keys) {
    # \b keeps a key from matching inside a longer word; \Q...\E escapes the key
    $sample =~ s/\b\Q$key\E\b/$lexicon{$key}/ig;
}

print "sample after : $sample\n";

The output:

sample before: The due date of the assignment is a date in the fall.
sample after : The Date de Remise of the Devoir is a Date in the Automne.
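
If the two parallel arrays from the question already exist, the hash does not have to be typed by hand. The following is a minimal sketch, assuming both arrays have the same length and the same order; it uses an ASCII-only subset of the lexicon to keep encoding out of the picture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Subset of the parallel arrays from the question (ASCII-only for brevity).
my @lexicon_en = ("Winter", "Date", "Due Date", "Fall", "Assignment");
my @lexicon_fr = ("Hiver", "Date", "Date de Remise", "Automne", "Devoir");

# Guard against the arrays drifting out of sync.
die "Lexicon arrays differ in length\n" unless @lexicon_en == @lexicon_fr;

# A hash slice assigns every English key its French value in one statement.
my %lexicon;
@lexicon{@lexicon_en} = @lexicon_fr;

print "Due Date => $lexicon{'Due Date'}\n";   # prints "Due Date => Date de Remise"
```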

The next challenge is going to be ensuring that the case of the replacement matches what's being replaced.
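
One possible way to approach that, sketched here under some assumptions: store the lexicon in lowercase, match case-insensitively, and copy only the capitalisation of the first letter from the matched text. The helper name `translate_keeping_case` is hypothetical, and the lexicon is an ASCII-only subset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lexicon keyed by the lowercase English term (ASCII-only subset).
my %lexicon = (
    'date'       => 'date',
    'due date'   => 'date de remise',
    'fall'       => 'automne',
    'assignment' => 'devoir',
);

# Hypothetical helper: look up the lowercase form, then capitalise the
# translation if the matched text started with a capital letter.
sub translate_keeping_case {
    my ($matched) = @_;
    my $french = $lexicon{ lc($matched) };
    return $matched =~ /^[A-Z]/ ? ucfirst($french) : $french;
}

# Longest keys first, each one escaped, joined into a single alternation.
my $pattern = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %lexicon;

my $sample = 'Due date: the assignment is due in the Fall.';
$sample =~ s/\b($pattern)\b/translate_keeping_case($1)/gie;

print "$sample\n";   # prints "Date de remise: the devoir is due in the Automne."
```

This only handles an initial capital; all-caps or mixed-case input would need extra rules.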


2 Comments

user3241846 Over a year ago

Your code is great, can you explain more what the case replacement challenge is? Thanks

Steven Klassen Over a year ago

@julian-ladisch already took care of it with his code sample. He added the lowercase equivalents of the pairs so that if "due date" should appear instead of "Due Date" it'll be exchanged with the same case replacement.

Julian Ladisch · Accepted Answer · 2014-11-16 19:41:11Z

Use \b to match word boundaries rather than matching the whitespace between words with \s.
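
As a small illustration of what \b does, here is a sketch with a made-up sentence in which a lexicon word also appears inside a longer word:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'Mark the Marketing report.';

# Without boundaries, "Mark" also matches inside "Marketing".
(my $loose = $text) =~ s/Mark/Point/g;

# With \b on both sides, only the standalone word is replaced.
(my $strict = $text) =~ s/\bMark\b/Point/g;

print "loose : $loose\n";    # prints "loose : Point the Pointeting report."
print "strict: $strict\n";   # prints "strict: Point the Marketing report."
```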

Combine Steven Klassen's solution with the approach from "How to replace a set of search/replace pairs?":

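#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # decode literals such as 'Été' so lc() sees characters, not bytes
binmode(STDOUT, ':encoding(UTF-8)');   # encode those characters again on output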

my %lexicon = (
    'Winter' => 'Hiver',
    'Date' => 'Date',
    'Due Date' => 'Date de Remise',
    'Problem' => 'Problème',
    'Summer' => 'Été',
    'Mark' => 'Point',
    'Fall' => 'Automne',
    'Assignment' => 'Devoir',
    'November' => 'Novembre',
);

# add lowercase
for (keys %lexicon) {
    $lexicon{lc($_)} = lc($lexicon{$_});
    print $_ . " " . $lexicon{lc($_)} . "\n";
}

# Combine to one big regexp, longest keys first, with each key escaped.
# https://stackoverflow.com/questions/17596917/how-to-replace-a-set-of-search-replace-pairs?answertab=votes#tab-top
my $regexp = join '|', map { '\b' . quotemeta($_) . '\b' }
             sort { length($b) <=> length($a) } keys %lexicon;

my $sample = 'The due date of the assignment is a date in the fall.';
print "sample before: $sample\n";
$sample =~ s/($regexp)/$lexicon{$1}/g;
print "sample after : $sample\n";
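
Since the question is about translating a file rather than a single string, the same substitution can be applied line by line. This is only a sketch: the in-memory filehandle and the variable `$contents` are stand-ins for the asker's assign3.txt, and the lexicon is an ASCII-only subset with the lowercase variants written out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASCII-only subset of the lexicon, lowercase variants included.
my %lexicon = (
    'Due Date'   => 'Date de Remise',
    'due date'   => 'date de remise',
    'Assignment' => 'Devoir',
    'assignment' => 'devoir',
    'Fall'       => 'Automne',
    'fall'       => 'automne',
);

# One alternation, longest keys first, each key escaped and wrapped in \b.
my $regexp = join '|', map { '\b' . quotemeta($_) . '\b' }
             sort { length($b) <=> length($a) } keys %lexicon;

# Stand-in for assign3.txt: reading from a string keeps the sketch runnable.
my $contents = "Assignment 3\nDue Date: in the fall\n";
open(my $fh, '<', \$contents) or die "Could not open in-memory file: $!";

my @translated;
while (my $line = <$fh>) {
    $line =~ s/($regexp)/$lexicon{$1}/g;   # translate every match on this line
    push @translated, $line;
}
close($fh);

print @translated;
```

For a real file, the open call would take the filename and an encoding layer, as in the question's own code.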
