Building a regex with sub in Perl 6

Question

After learning how to pass regexes as arguments, I've tried to build my first regex using a sub, and I'm stuck once more. Sorry for the complex rules below; I've done my best to simplify them. I need at least some clues on how to approach this problem.

The regex should consist of alternatives, each made up of a left, a middle and a right part, where left and right come in pairs and the variant of middle depends on which right is chosen.

An array of Pairs contains pairs of left and right:

my Pair @leftright =
  A => 'a',
  ...
  Z => 'z',
  ;

Middle variants are read from a hash:

my Regex %middle = 
  z => / foo /,
  a => / bar /,
  m => / twi /,
  r => / bin /,
  ...
  ;

%middle<z> should be chosen if right is z, %middle<a> if right is a, and so on.

So, the resulting regex should be

my token word {
    | A <{ %middle<a> }> a
    | Z <{ %middle<z> }> z
    | ...
}

or, more generally

my token word {
    | <left=@leftright[0].key> 
      <middle=%middle{@leftright[0].value}> 
      <right=@leftright[0].value> 
    | (the same for index == 1)
    | (the same for index == 2)
    | (the same for index == 3)
 ...
}

and it should match Abara and Zfooz.
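For a fixed, small set of pairs, the target token can be written by hand, which gives a reference for what the generating sub should produce. This is a minimal sketch, assuming only the two pairs A => 'a' and Z => 'z'; it uses `<{ ... }>` so that the Regex stored in %middle is called as a subregex, and `$<name>=[ ... ]` to get the named captures the question asks for.

```raku
my Regex %middle = z => / foo /, a => / bar /;

# Hand-written target for two pairs; each middle Regex is looked up by its right
my token word {
    | $<left>='A' $<middle>=[ <{ %middle<a> }> ] $<right>='a'
    | $<left>='Z' $<middle>=[ <{ %middle<z> }> ] $<right>='z'
}

"Abara" ~~ &word;
say join '|', $<left>, $<middle>, $<right>;   # A|bar|a
```

The open problem is generating these alternatives from @leftright in a loop instead of typing them out.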

How can I build token word (which can be used, e.g., in a grammar) with a sub that takes every pair from @leftright, picks the suitable %middle{} depending on the value of right, and then combines it all into one regex?

my Regex sub sub_word(Pair @l_r, Regex %m) {
...
}
my token word {
    <{sub_word(@leftright, %middle)}> 
}

After the match I need to know the values of left, middle, and right:

"Abara" ~~ &word;
say join '|', $<left>, $<middle>, $<right> # A|bar|a

It sounds like you're saying something to the effect that, given a right that's X, the middle is / waldo /. But you haven't said what programmatic relationship is supposed to be detected between X and waldo. (Granted, you wrote z not X and / zfoo / not / waldo / but that makes no difference, unless you mean that the z in / zfoo / isn't just to aid human understanding but is also to be detected by the program. In which case, no, I don't think you can do that -- I don't think your program can introspectively know that the / zfoo / pattern contains a z.)
– raiph, Commented Nov 11, 2017 at 3:25
What would be the input parameters to the sub that should build the token word? Should the sub be used before the regex parsing starts to build a predefined token, or should it be used within the regex parser? Can you give a simple example?
– Håkon Hægland, Commented Nov 11, 2017 at 7:25
@raiph Thanks, that's my fault (and I thought about it); I didn't mean introspection into a regex. I'll reformulate this part using a hash.
– Eugene Barsky, Commented Nov 11, 2017 at 7:35
@HåkonHægland The input parameters should be @leftright and %middle, and the pattern should be built before the parsing begins. If there were only one variant, the pattern would be something like <$left> <$middle> <$right>. Here it should be <@leftright[0].key> <%middle{@leftright[0].value}> <@leftright[0].value> | (the same for index == 1 | then 2, 3, etc.)... So the problem is that I, e.g., don't know how to concatenate regexes with the alternation | in a loop.
– Eugene Barsky, Commented Nov 11, 2017 at 7:46
I've reworked the text of the question. I hope it's better now.
– Eugene Barsky, Commented Nov 11, 2017 at 8:12

Håkon Hægland · Accepted Answer · 2017-11-11 09:30:18Z


I was not able to do this using a token yet, but here is a solution using EVAL and a Regex (note that I am using %middle as a hash of Str rather than a hash of Regex):

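my Regex sub build_pattern (%middle, @leftright) {
    # One alternative per pair, e.g.: $<left>='A' $<middle>='bar' $<right>='a'
    my $str = join '|', @leftright.map({
        join ' ', "\$<left>='{$_.key}'",
                  "\$<middle>='{%middle{$_.value}}'",
                  "\$<right>='{$_.value}'"
    });
    # Compile the generated source text into a Regex object
    my Regex $regex = "rx/$str/".EVAL;

    return $regex;
}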

my Regex $pat = build_pattern(%middle, @leftright);

say $pat;
my $res = "Abara" ~~ $pat;
say $res;

Output:

rx/$<left>='A' $<middle>='bar' $<right>='a'|$<left>='Z' $<middle>='foo' $<right>='z'/
｢Abara｣
 left => ｢A｣
 middle => ｢bar｣
 right => ｢a｣

For more information on why I chose to use EVAL, see "How can I interpolate a variable into a Perl 6 regex?"

edited Nov 11, 2017 at 9:30

answered Nov 11, 2017 at 8:47

Håkon Hægland


Eugene Barsky Over a year ago

Thanks! That is probably the most effective way, but in my case I need to be able to analyze $/; e.g., I need to know which middle matched. I'll add this to the question. Sorry I didn't mention it beforehand.

Håkon Hægland Over a year ago

Then maybe put a capture group around the middle part? Like this: my $pat = rx/'A' ('bar') 'a'|'Z' ('foo') 'z'/. Then $0 will contain the middle part after a successful match.
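A small sketch of this suggestion, using the two alternatives from the comment. In Raku each branch of an alternation starts numbering its positional captures from the same index, so the middle part lands in $0 whichever branch matched.

```raku
my $pat = rx/'A' ('bar') 'a' | 'Z' ('foo') 'z'/;

"Zfooz" ~~ $pat;
say ~$0;    # foo  ($0 holds the middle part in either branch)
```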

Eugene Barsky Over a year ago

Is it possible to do the same with a named (not positional) capture? That would be a very nice solution for me.

Håkon Hægland Over a year ago

@EugeneBarsky Yes, that is a good idea! See my updated answer for a suggestion.

Håkon Hægland Over a year ago

@EugeneBarsky We need to escape the $ because it is inside double quotes and we do not want it to be interpolated, i.e., the literal $ must survive the first join call. As for your other question: the single quotes are there because we are not using sigspace in the regex, so all literal strings should be quoted. Otherwise we would get a warning when compiling the regex (here, when calling EVAL).
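To see what the escaping produces, here is a small sketch that builds the source text of one alternative for a single pair; the names $p and %mid are only for illustration. `\$` yields a literal `$` in the string, while the `{...}` blocks are still interpolated.

```raku
my $p   = A => 'a';
my %mid = a => 'bar';

# \$ keeps a literal $ in the generated regex source; {...} is interpolated
my $alt = join ' ', "\$<left>='{$p.key}'",
                    "\$<middle>='{%mid{$p.value}}'",
                    "\$<right>='{$p.value}'";
say $alt;   # $<left>='A' $<middle>='bar' $<right>='a'
```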
