
After learning how to pass regexes as arguments, I've tried to build my first regex using a sub, and I'm stuck once more. Sorry for the complex rules below; I've done my best to simplify them. I need at least some clues on how to approach this problem.

The regex should consist of alternations, each of them consisting of left, middle and right, where left and right should come in pairs and the variant of middle depends on which right is chosen.

An array of Pairs contains pairs of left and right:

my Pair @leftright =
  A => 'a',
  ...
  Z => 'z',
  ;

Middle variants are read from a hash:

my Regex %middle = 
  z => / foo /,
  a => / bar /,
  m => / twi /,
  r => / bin /,
  ...
  ;

%middle<z> should be chosen if right is z, %middle<a> if right is a, etc.

So, the resulting regex should be

my token word {
    | A <%middle<a>> a
    | Z <%middle<z>> z
    | ...
}

or, more generally

my token word {
    | <left=@leftright[0].key> 
      <middle=%middle{@leftright[0].value}> 
      <right=@leftright[0].value> 
    | (the same for index == 1)
    | (the same for index == 2)
    | (the same for index == 3)
 ...
}

and it should match Abara and Zfooz.

How can I build token word (which can be used e.g. in a grammar) with a sub that takes every pair from @leftright, picks the suitable %middle{} entry depending on the value of right, and then combines it all into one regex?

my Regex sub sub_word(Pair @l_r, Regex %m) {
...
}
my token word {
    <{sub_word(@leftright, %middle)}> 
}

After the match I need to know the values of left, middle, and right:

"Abara" ~~ &word;
say join '|', $<left>, $<middle>, $<right> # A|bar|a
  • It sounds like you're saying something to the effect that, given a right that's X, the middle is / waldo /. But you haven't said what programmatic relationship is supposed to be detected between X and waldo. (Granted, you wrote z not X and / zfoo / not / waldo / but that makes no difference, unless you are meaning that the z in / zfoo / isn't just to aid human understanding but is also to be detected by the program. In which case, no, I don't think you can do that -- I don't think your program can introspectively know that the / zfoo / pattern contains a z.) Commented Nov 11, 2017 at 3:25
  • What would be the input parameters to the sub that should build the token word? Should the sub be used before the regex parsing starts to build a predefined token, or should it be used within the regex parser? Can you give a simple example? Commented Nov 11, 2017 at 7:25
  • @raiph Thanks. That's my fault (and I thought about it); I didn't mean introspection into a regex. I'll reformulate this part, making a hash. Commented Nov 11, 2017 at 7:35
  • @HåkonHægland The input parameters should be @leftright and %middle, and the pattern should be built before the parsing begins. If there were only one variant, the pattern would be something like <$left> <$middle> <$right>. Here it should be <@leftright[0].key> <%middle{@leftright[0].value}> <@leftright[0].value> | (the same for index == 1 | then 2, 3 etc)... So the problem is that I don't know, for example, how to concatenate regexes with the alternation | in a loop. Commented Nov 11, 2017 at 7:46
  • I've reworked the text of the question. I hope it's better now. Commented Nov 11, 2017 at 8:12

1 Answer

I was not able to do this using token yet, but here is a solution with EVAL and Regex (and also I am using %middle as a hash of Str and not a hash of Regex):

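my Regex sub build_pattern (%middle, @leftright) {
    # One alternative per pair; \$ keeps the $ literal inside the double quotes.
    my $str = join '|', @leftright.map(
        {join ' ',"\$<left>='{$_.key}'", "\$<middle>='{%middle{$_.value}}'", "\$<right>='{$_.value}'"}
    );
    # Compile the assembled source string into a Regex object.
    my Regex $regex = "rx/$str/".EVAL;

    return $regex;
}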

my Regex $pat = build_pattern(%middle, @leftright);

say $pat;
my $res = "Abara" ~~ $pat;
say $res;

Output:

rx/$<left>='A' $<middle>='bar' $<right>='a'|$<left>='Z' $<middle>='foo' $<right>='z'/
「Abara」
 left => 「A」
 middle => 「bar」
 right => 「a」

For more information on why I chose to use EVAL, see How can I interpolate a variable into a Perl 6 regex?
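
For anyone who wants to try this end to end, here is a minimal sketch that wraps the same build_pattern approach in sample data from the question (only the A/a and Z/z pairs, with the middles as plain strings). The `use MONKEY-SEE-NO-EVAL` line is an assumption on my part: depending on how EVAL is called, Rakudo may require it before it accepts a string built at run time.

```raku
use MONKEY-SEE-NO-EVAL;   # assumption: may not be needed for the method form of EVAL

# Sample data from the question; middles are plain strings, not Regex objects.
my @leftright = A => 'a', Z => 'z';
my %middle    = z => 'foo', a => 'bar';

my Regex sub build_pattern (%middle, @leftright) {
    # Each pair becomes one alternative with three named captures.
    my $str = join '|', @leftright.map(
        {join ' ', "\$<left>='{$_.key}'", "\$<middle>='{%middle{$_.value}}'", "\$<right>='{$_.value}'"}
    );
    # Turn the assembled source text into a Regex object.
    return "rx/$str/".EVAL;
}

my $pat = build_pattern(%middle, @leftright);

# Try both inputs mentioned in the question and print the three captures.
for <Abara Zfooz> -> $input {
    if $input ~~ $pat {
        say join '|', $<left>, $<middle>, $<right>;
    }
}
```

Because every value is spliced into a single-quoted literal, this sketch only works as long as none of the keys, values, or middles contains a single quote.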


11 Comments

Thanks! Probably, that will be the most effective way, but in my case I need to have the possibility to analyze $/. E.g. I need to know which middle matched. I'll add this to the Q, sorry I haven't written it beforehand.
Then maybe put a capture group around the middle part? Like this: my $pat = rx/'A' ('bar') 'a'|'Z' ('foo') 'z'/. Then $0 will contain the middle part after a successful match.
Is it possible to do the same with a named (not positional) capture? That would be a very nice solution for me.
@EugeneBarsky Yes that is a good idea! See my updated answer for a suggestion
@EugeneBarsky We need to escape the $ because it is inside double quotes, and since we do not want it to be interpolated. I.e., the literal $ must survive the first join call. For the other question: the single quotes are there since we are not using sigspace in the regex, hence all literal strings should be in quotes. If not, we would get a warning when compiling the regex (here that is: when using EVAL).
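
To make the escaping point in the last comment concrete, here is a small sketch of what one fragment looks like before it reaches EVAL. The variable names `$key` and `$fragment` are made up for the illustration.

```raku
my $key = 'A';

# \$ keeps a literal dollar sign, <left> is plain text, and {$key} is the
# only part that gets interpolated. The single quotes end up in the regex
# source, where they mark 'A' as a literal string.
my $fragment = "\$<left>='{$key}'";

say $fragment;   # $<left>='A'
```

Without the backslash, the string would try to interpolate the current match variable instead of emitting the text of a named capture.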