I'm new to Perl. I'm reading text from a file and want to replace some words with their French translations. I managed to do it word by word, but not for multi-word expressions, and I'm having trouble getting the code right.

Code for word by word:

my $filename = 'assign3.txt';
my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");   
my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");
my $i=1;
open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file $filename !";
while (<$fh>) {
    for my $word (split)
    {
        print " $i. $word \n"; 
        $i++;
        for (my $j=0; $j < 9;$j++){
            if ($word eq $lexicon_en[$j]){
            print "Found one! - j value is $j\n";
            }
        }
     }
}
print "\ndone here!!\n";

Here is the regular expression I'm trying to use:

    /\w+\s\w+/

This is my code for strings:

while (<>) {
        print ("this is text: $_ \n");

        if ((split (/Due\sDate/),$_) eq "Due Date"){
            print "yes!!\n";
        }
}
  • Can you give some example output showing exactly what you are looking for, so that I can send you a sample script? Commented Nov 16, 2014 at 3:28

2 Answers

2

I think I understand the challenge you're having. Because "Due Date" is two words, it needs to be matched before "Date"; otherwise the shorter entry matches first and you get incorrect translations. One way to deal with that is to order your matches from the most words to the fewest, so that "Due Date" is dealt with before "Date".

If you convert your arrays to a hash (dictionary), you can order the keys by the number of spaces they contain and then iterate over them to do the actual substitutions:

#!/usr/bin/perl
use strict;
use warnings;

#my @lexicon_en = ("Winter","Date", "Due Date", "Problem", "Summer","Mark","Fall","Assignment","November");
#my @lexicon_fr = ("Hiver", "Date", "Date de Remise","Problème","Été", "Point", "Automne", "Devoir", "Novembre");

# convert your arrays to a hash
my %lexicon = (
    'Winter' => 'Hiver',
    'Date' => 'Date',
    'Due Date' => 'Date de Remise',
    'Problem' => 'Problème',
    'Summer' => 'Été',
    'Mark' => 'Point',
    'Fall' => 'Automne',
    'Assignment' => 'Devoir',
    'November' => 'Novembre',
);

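# sort the keys by the number of spaces they contain, most spaces first
# (tr/ // counts the spaces; <=> returns the -1/0/1 that sort expects)
my @ordered_keys = sort { ($b =~ tr/ //) <=> ($a =~ tr/ //) } keys %lexicon;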

my $sample = 'The due date of the assignment is a date in the fall.';

print "sample before: $sample\n";

foreach my $key (@ordered_keys) {
    # \Q...\E escapes any regex metacharacters in the key
    $sample =~ s/\Q$key\E/$lexicon{$key}/ig;
}

print "sample after : $sample\n";

The output:

sample before: The due date of the assignment is a date in the fall.
sample after : The Date de Remise of the Devoir is a Date in the Automne.

The next challenge is going to be ensuring that the case of the replacement matches what's being replaced.
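
One way the case matching could be handled is sketched below. This is only an illustration under stated assumptions, not a definitive implementation: `match_case`, `translate` and `%by_lc` are hypothetical names that do not appear in the code above, and the lexicon is cut down to ASCII-only entries (accented entries such as 'Été' would additionally need `use utf8` and an output encoding layer for `lc`/`uc` to behave). The idea is to look the phrase up case-insensitively and then copy the case pattern of the matched text onto the translation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the code above): copy the case pattern of
# the matched text onto the replacement.
sub match_case {
    my ($found, $replacement) = @_;
    return uc($replacement) if $found eq uc($found);   # DUE DATE -> DATE DE REMISE
    return lc($replacement) if $found eq lc($found);   # due date -> date de remise
    return $replacement;                               # mixed case: keep the lexicon spelling
}

my %lexicon = (
    'Due Date'   => 'Date de Remise',
    'Assignment' => 'Devoir',
    'Fall'       => 'Automne',
);

# Lookup table keyed by the lowercase phrase, so any capitalisation is found.
my %by_lc = map { lc($_) => $lexicon{$_} } keys %lexicon;

# Longest phrases first, so a phrase wins over any shorter key it contains.
my $pattern = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %by_lc;

sub translate {
    my ($text) = @_;
    # /e evaluates the replacement as code; /i matches any capitalisation.
    $text =~ s/\b($pattern)\b/match_case($1, $by_lc{ lc($1) })/gie;
    return $text;
}

print translate('The due date of the Assignment is in the fall.'), "\n";
# prints: The date de remise of the Devoir is in the automne.
```

Whether an all-lowercase match should really produce an all-lowercase translation is a design choice; French capitalisation rules differ from English ones, so the mapping in `match_case` may need adjusting.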

2 Comments

Your code is great, can you explain more what the case replacement challenge is? Thanks
@julian-ladisch already took care of it with his code sample. He added the lowercase equivalents of the pairs so that if "due date" should appear instead of "Due Date" it'll be exchanged with the same case replacement.
1

Use \b to detect word boundaries instead of \s to detect whitespace.

Combine Steven Klassen's solution with the approach from "How to replace a set of search/replace pairs?":

#!/usr/bin/perl
use strict;
use warnings;

my %lexicon = (
    'Winter' => 'Hiver',
    'Date' => 'Date',
    'Due Date' => 'Date de Remise',
    'Problem' => 'Problème',
    'Summer' => 'Été',
    'Mark' => 'Point',
    'Fall' => 'Automne',
    'Assignment' => 'Devoir',
    'November' => 'Novembre',
);

# add lowercase
for (keys %lexicon) {
    $lexicon{lc($_)} = lc($lexicon{$_});
    print $_ . " " . $lexicon{lc($_)} . "\n";
}

# Combine to one big regexp.
# https://stackoverflow.com/questions/17596917/how-to-replace-a-set-of-search-replace-pairs?answertab=votes#tab-top
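# Longest keys first so a phrase always wins over a shorter key it starts with;
# quotemeta escapes any regex metacharacters in a key.
my $regexp = join '|', map { '\b' . quotemeta($_) . '\b' }
             sort { length($b) <=> length($a) } keys %lexicon;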

my $sample = 'The due date of the assignment is a date in the fall.';
print "sample before: $sample\n";
$sample =~ s/($regexp)/$lexicon{$1}/g;
print "sample after : $sample\n";
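
To apply the same idea to text read from a file, as in the question, the substitution can be run on each line as it is read. The following is a minimal sketch under some assumptions: `translate_lines` is a hypothetical helper name, the lexicon is shortened, and an in-memory filehandle stands in for the real file so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %lexicon = (
    'Date'       => 'Date',
    'Due Date'   => 'Date de Remise',
    'Assignment' => 'Devoir',
    'Fall'       => 'Automne',
);

# One alternation: longest keys first, each key escaped and wrapped in \b.
my $regexp = join '|',
    map  { '\b' . quotemeta($_) . '\b' }
    sort { length($b) <=> length($a) } keys %lexicon;

# Hypothetical helper: translate every line read from a filehandle and
# return the translated lines.
sub translate_lines {
    my ($in) = @_;
    my @out;
    while (my $line = <$in>) {
        $line =~ s/($regexp)/$lexicon{$1}/g;
        push @out, $line;
    }
    return @out;
}

# An in-memory filehandle stands in for the real file here; for the file in
# the question you would open 'assign3.txt' with '<:encoding(UTF-8)' instead.
my $text = "Assignment 3\nDue Date: see the Date column\n";
open(my $fh, '<', \$text) or die "Could not open in-memory file: $!";
print translate_lines($fh);
close($fh);
# prints:
# Devoir 3
# Date de Remise: see the Date column
```

Processing line by line means a phrase that is split across a line break in the input will not be matched; if that matters, the whole file would have to be read into one string first.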
