
In an earlier thread about inserting brackets around "comments" in a chess PGN-like string, I got excellent help finishing a regex that matches move lists and comments separately.

Here is the current regex:

((?:\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}).)+)

The three capture groups are:

  1. "e4 e5 2. f4 exf4 3.Nf3" etc -- i.e. lists of moves
  2. "Blah blah blah" -- i.e. "comments"
  3. comment ") (" comment -- i.e. close and begin parens, when a chess variation with a comment at the end "completes", and another chess variation with a comment at the beginning "starts"

In action here: http://regex101.com/r/dQ9lY5

With the regex101 flavor set to PCRE (PHP), everything works correctly: all three groups match as expected. When I switch the flavor to JavaScript, however, everything matches as capture group 1. Is there something in my regex that the JavaScript regex engine doesn't support? I tried to research this but haven't been able to solve it; there is so much information on this topic, and I've already spent hours and hours on it.

I know one solution is to use the regex as-is and pass it to PHP through AJAX, etc., but I don't know how to do that yet (it's on my list to learn).

Question 1: But I am also very curious about what it is in this regex that doesn't work on the Javascript regex engine.

Also, here is my JavaScript CleanPgnText function. I am most interested in the while loop, but if anything else seems wrong, I would appreciate any help.

function CleanPgnText(pgn) {
  var pgnTextEdited = '';
  var str;
  var pgnInputTextArea = document.getElementById("pgnTextArea");
  var pgnOutputArea = document.getElementById("pgnOutputText");
  str = pgnInputTextArea.value;
  str = str.replace(/\[/g,"(");     //sometimes he uses [ incorrectly for variations
  str = str.replace(/\]/g,")"); 
  str = str.replace(/[\n¬]*/g,"");  // remove newlines and that weird character that MS Word sticks in
  str = str.replace(/\s{2,}/g," "); // turn more than one space into one space

  while ( str =~ /((?:\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[\(\)]?\s?[\(\)]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})[^\)\(])+)|((?:\)\s\())/g ) {
    if ($1.length > 0) {  //
      pgnTextEdited += $1;
    }
    else if ($2.length > 0) {
      pgnTextEdited += '{' + $2 + '}';
    }
    else if ($3.length > 0) {
      pgnTextEdited += $3;
    }
  }

  pgnOutputArea.innerHTML = pgnTextEdited;
}

Question 2: Regarding the =~ in the while statement

while ( str =~

I got the =~ from helpful code in my original thread, but it was written in Perl. I don't quite understand how the =~ operator works. Can I use this same operator in Javascript, or should I be using something else?

Question 3: Can I use .length the way I am, when I say

if ($1.length > 0) 

to see if the first capture group had a match?

Thank you in advance for any help. (If the regex101 link doesn't work for you, you can get a sample pgn to test on from the original thread).

2 Answers


I corrected your javascript code and got the following:

http://jsfiddle.net/ZXG2H/

  1. Personally I think the matching (group) problems are related to http://regex101.com/. Your expression definitely works in JavaScript (see the fiddle) and in Java (with escaping corrections). I simplified your JavaScript slightly and passed the PGN data in as a parameter instead of reading it from a text input.

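  2. JavaScript has no =~ operator (in your code, str =~ /.../g actually parses as str = ~/.../g, which overwrites str with -1). In JavaScript you loop through the matches with something like this; note that the pattern needs the g flag, otherwise exec() starts from the beginning on every call and the loop never ends:

    var pattern = /myregexp/g;
    var match;
    while ((match = pattern.exec(mytext)) != null) {
        // do something with match[1], match[2], ...
    }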

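  3. If a group did not take part in the match, its entry in the match array is undefined, so calling .length on it throws a TypeError. Address the groups through the match variable from above with an index: match[2] is capture group 2. A loose != null check catches both null and undefined.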
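Points 2 and 3 together can be sketched as a small runnable loop. The pattern and the classify() function are hypothetical toys (digits vs. lowercase words), not the PGN regex; they only show the exec() loop with the g flag and how an unmatched group comes back as undefined.

```javascript
// Hypothetical toy pattern: group 1 = digits, group 2 = lowercase words.
function classify(text) {
  // The g flag matters: exec() resumes at pattern.lastIndex; without g it
  // would return the first match again on every call and never end.
  var pattern = /([0-9]+)|([a-z]+)/g;
  var parts = [];
  var match;
  while ((match = pattern.exec(text)) !== null) {
    // A group that took no part in the match is undefined, not "",
    // so check it before using .length (a loose != null test also works).
    if (match[1] !== undefined) {
      parts.push("number:" + match[1]);
    } else if (match[2] !== undefined) {
      parts.push("word:" + match[2]);
    }
  }
  return parts;
}

console.log(classify("e4 and 12 moves"));
// → ["word:e", "number:4", "word:and", "number:12", "word:moves"]

var m = /(a)|(b)/.exec("b");
console.log(m[1]); // undefined, so m[1].length would throw a TypeError
```

The same if/else-if chain on match[1], match[2], match[3] is what replaces the $1/$2/$3 checks in the question.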


1 Comment

Thank you so much! Works perfectly. Glad to know that the problem was not the regex, but was instead (as I kind of suspected) my bad while statement.

I was looking at your new regex, and it's not quite right, even though it appears to work with @wumpz's JS code.
You can't just exclude parentheses with [^)(] in the comments section, because outside the moves group,
the only parentheses you match are the literal ) ( sequence (in capture group 3).
This could leave some parentheses unmatched by every group, so they never become part of the new string
being constructed. It's not likely, because the moves group also matches parentheses.
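The dropped-parentheses problem can be made visible with a small check: exec() with the g flag silently skips any character that no alternative matches, so comparing each match's index with where the previous match ended reveals the gaps. The helper findGaps() and the toy pattern are hypothetical simplifications (the toy assumes no flags other than g), not the real PGN regex.

```javascript
// Hypothetical helper: list every stretch of `text` that no alternative of
// `pattern` consumed while exec() walked through it with the g flag.
function findGaps(pattern, text) {
  var re = new RegExp(pattern.source, "g"); // fresh copy so lastIndex starts at 0
  var gaps = [];
  var end = 0; // index just past the previous match
  var match;
  while ((match = re.exec(text)) !== null) {
    if (match.index > end) {
      gaps.push(text.slice(end, match.index)); // skipped, never captured
    }
    end = match.index + match[0].length;
    if (match[0].length === 0) {
      re.lastIndex++; // avoid looping forever on an empty match
    }
  }
  if (end < text.length) {
    gaps.push(text.slice(end)); // unmatched tail
  }
  return gaps;
}

// Toy pattern (an assumption, much simpler than the PGN regex): moves,
// comments that exclude parentheses, and a literal ") (".
var toy = /(\d+\.\s\w+\s?)|([^()0-9]+)|(\)\s\()/g;
console.log(findGaps(toy, "1. e4 (good) 2. d4")); // → ["(", ")"]
```

Both parentheses around "good" fall between matches, so a loop that only concatenates captured groups would lose them.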

To fix that, just exclude the ) ( sequence from comments, then match it first (group 1).
Also, I left some notes on the changes made to your new regex.
Try it out. I think @wumpz deserves the credit.

    #  /(\)\s*\()|((?:\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-])(?!\)\s*\()[\S\s])+)/


    ( \) \s* \( )              # (1), 'Special Comment' configuration (must match first)
 |                           # OR,
    (                          # (2 start), 'Moves' configuration
         (?:
              \s? 
              [()]? \s? [()]? 
              \s? 
              [0-9]{1,3} \.{1,3}
              \s 
              [NBRQK]? [a-h1-8]? x? [a-hO] [1-8-] [O-]{0,3} [!?+#=]{0,2} [NBRQ]? 
              [!?+#]{0,2} 
              (?:
                   \s 
                   [NBRQK]? [a-h1-8]? x? [a-hO] [1-8-] [O-]{0,3} [!?+#=]{0,2} [NBRQ]? [!?+#]{0,2} 
              )?
              \s? 
              [()]? \s? [()]? 
              \s? 
         )+
    )                          # (2 end)
 |                           # OR,  
    (                          # (3 start), 'Normal Comment' configuration
         (?:
              (?!                        # Not the 'Moves configuration'
                   \s? 
                   [()]? \s? [()]? 
                   \s? 
                   [0-9]{1,3} \.{1,3}
                   \s 
                   [NBRQK]? [a-h1-8]? x? [a-hO] [1-8-] 

                   # ---- 
                   # Next line is not needed
                   # because all its items are
                   # optional
                   # ---- 
                   ### [O-]{0,3} [!?+#=]{0,2} [NBRQ]? [!?+#]{0,2}  <-  not needed
              )
              ### [^)(]    <- replaced by   '[\S\s]'  below
              # ---- 
              # The above line is replaced by any char.
              # because it excludes all ()'s and is not appropriate

              (?! \) \s* \( )            # Also, not the 'Special Comment' configuration

              [\S\s]                     # Consume any char
         )+
    )                          # (3 end)
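The commented layout above uses Perl's x (free-spacing) modifier, which JavaScript regexes do not support. One way to keep the pieces readable in JavaScript is to write each fragment as a regex literal (so backslashes need no double escaping) and join their .source strings. This is a sketch; the fragment names LEAD, MOVE, TRAIL, MOVE_START, SPECIAL and PGN_PATTERN are hypothetical, and the assembled pattern is intended to be character-for-character the one-line regex used in the JS code below.

```javascript
// One half-move: piece, disambiguation, capture, square or castling,
// annotations and promotion (same character classes as the regex above).
var MOVE = /[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}/.source;
// Optional parentheses/spaces, then a move number such as "12." or "12...".
var LEAD = /\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s/.source;
// Optional spaces/parentheses after a move pair.
var TRAIL = /\s?[()]?\s?[()]?\s?/.source;
// Just enough of a move to recognise where a move list starts.
var MOVE_START = /[NBRQK]?[a-h1-8]?x?[a-hO][1-8-]/.source;
// The ") (" special comment configuration.
var SPECIAL = /\)\s*\(/.source;

var PGN_PATTERN = new RegExp(
  "(" + SPECIAL + ")" +                                                   // group 1
  "|((?:" + LEAD + MOVE + "(?:\\s" + MOVE + ")?" + TRAIL + ")+)" +        // group 2: moves
  "|((?:(?!" + LEAD + MOVE_START + ")(?!" + SPECIAL + ")[\\S\\s])+)",    // group 3: comments
  "g"
);

console.log(PGN_PATTERN.source);
```

Building from named fragments also means the move description is written once and reused in both the moves group and the "not a move" lookahead.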

Modifying @wumpz's JS code, it would look like this with the modified regex:

 function CleanPgnText(pgn) {
     var pgnTextEdited = '';
     var str;
     var pgnOutputArea = document.getElementById("pgnOutputText");
     str = pgn;
     str = str.replace(/\[/g, "("); //sometimes he uses [ incorrectly for variations
     str = str.replace(/\]/g, ")");
     str = str.replace(/[\n¬]*/g, ""); // remove newlines and that weird character that MS Word sticks in
     str = str.replace(/\s{2,}/g, " "); // turn more than one space into one space

     //Start regexp processing
     var pattern = /(\)\s*\()|((?:\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-])(?!\)\s*\()[\S\s])+)/g;

     var match;
     while ((match = pattern.exec(str)) != null) {
         if (match[1] != null) {           // Special Comment configuration, don't add '{}'
             pgnTextEdited += match[1];
         } else if (match[2] != null) {    // Moves configuration  
             pgnTextEdited += match[2];
         } else if (match[3] != null) {    // Normal Comment configuration, add '{}'
             pgnTextEdited += '{' + match[3] + '}';
         }
     }
     //end regexp processing

     pgnOutputArea.innerHTML = pgnTextEdited;
 }
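To try the same logic outside a browser page, a DOM-free variant can return the string instead of writing it to #pgnOutputText. The name cleanPgn and the sample input are hypothetical; the preprocessing and the pattern are copied from the function above (with [\n¬]+ instead of [\n¬]*, which removes the same characters).

```javascript
// Hypothetical DOM-free variant of CleanPgnText: same steps, returns a string.
function cleanPgn(pgn) {
  var str = pgn
    .replace(/\[/g, "(")        // "[" sometimes used for variations
    .replace(/\]/g, ")")
    .replace(/[\n¬]+/g, "")     // drop newlines and the stray MS Word character
    .replace(/\s{2,}/g, " ");   // collapse runs of whitespace into one space
  var pattern = /(\)\s*\()|((?:\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2}(?:\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-][O-]{0,3}[!?+#=]{0,2}[NBRQ]?[!?+#]{0,2})?\s?[()]?\s?[()]?\s?)+)|((?:(?!\s?[()]?\s?[()]?\s?[0-9]{1,3}\.{1,3}\s[NBRQK]?[a-h1-8]?x?[a-hO][1-8-])(?!\)\s*\()[\S\s])+)/g;
  var out = "";
  var match;
  while ((match = pattern.exec(str)) !== null) {
    if (match[1] !== undefined) {
      out += match[1];                 // ") (" kept as-is
    } else if (match[2] !== undefined) {
      out += match[2];                 // move list kept as-is
    } else {
      out += "{" + match[3] + "}";     // comment wrapped in braces
    }
  }
  return out;
}

console.log(cleanPgn("A sharp game. 1. e4 c5 2. Nf3 d6 Best here."));
// → "{A sharp game.} 1. e4 c5 2. Nf3 d6 {Best here.}"
```

Keeping the string processing separate from the DOM update makes it easy to test against sample PGN text before wiring it to the page.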

Running the same regex in a Perl program, the output is:

{Khabarovsk is the capital of Far East of Russia. My 16-year-old opponent was a promising local prodigy. Now he is a very strong FM with a FIDE rating of 2437 and lives... in the USA, too! A small world.} 1. e4 c5 2. Nf3 e6 3. c3 Nf6 4. e5 Nd5 5. d4 cxd4 6. cxd4 d6 7. Nc3 Nc6 8. Bd3!? Nxc3 9. bxc3 dxe5 10. dxe5 Qa5 11. O-O Be7 12. Qb3 Nxe5 13. Nxe5 Qxe5 14. Bb5+ Kf8 15. Ba3 Qc7 16. Rad1 g6 17. c4! Bxa3 18. Qxa3+ Kg7 19. Rd6 Rd8 20. c5 Bd7 21. Bc4 Bc6 22. Rfd1 Rd7 23. Qg3 Rad8 {Finally with accurate, solid play Black has consolidated yet White still keeps some pressure and has some compensation for the pawn.} 24. h4 {A typical march in such positions, simply nothing else to do better.} 24... h5?! ( 24... h6 {would be a more careful response. }) ({ But the best defense was} 24... Rd6! 25. cd6 Qa5 ) 25. Qe5+ Kh7 26. Bd3 {Very natural} 26... Kh6? ( {Missing} 26... Ba4! 27. Qxh5+ Kg7 28. Qe5+ Kg8! {and now Black has many own threats. White would have to force a perpetual after} 29. h5! Bxd1 30. h6 f6 31. Qxf6 Bh5 32. Qxe6+ Kh7 33. Bxg6+ Bxg6 34. Qxg6+ Kh8 35. Qf6+ {Now, after 26...Kh6 everything is ready for preparing a decisive blow.} ) 27. Qf6! Kh7 ( {There is no} 27... Rxd6 28. cxd6 Rxd6? {due to} 29. Qh8# ) 28. g4! hxg4 29. h5 Rxd6 30. cxd6 Rxd6 31. hxg6+ Kg8 32. g7! {This pawn is the vital factor until the end now. With any other move, White loses.} 32... Qd8! {The only defense against Qh6 and Qh8 checkmating or queening.} 33. Qh6 f5 34. Rd2!! {The idea is the white rook cannot be taken with a check anymore. The bishop will be easily unpinned with the crushing Bxf5 or Bc4. The Black pin on d file was an illusion! In fact it's Black's rook that is pinned and cannot leave d file.} 34... Bd5 ( {The best try - to close d file with protecting more e6 pawn. No help is} 34... Rd7 35. Bf5 ef5 36. Qh8 Kf7 37. Rd7 ) ( {But maybe the best practical chance was} 34... g3!? {and now} 35. Bxf5 {doesn't win because of} 35... gxf2+ 36. Kh2 f1=N+! 37. Kh3 Bg2+! 38. Rxg2 Rd3+! 39. 
Bxd3 Qxd3+ {with an amazing perpetual} 40. Kh4 Qe4+ 41. Rg4 Qh1+ 42. Kg5 Qd5+ 43. Kf6 Qd8+ 44. Kg6 Qd3+ ) ( {But after} 34... g3!? {White wins using another wing tactic:} 35. Bc4! Bd5 36. Bxd5 exd5 37. Qh8+ Kf7 38. Rc2 gxf2+ 39. Kf1! {and there is no defense against Rc8. Now after 35...Bd5 again everything looks well protected.} ) 35. Qh8 Kf7 36. Bb5! {The bishop still makes his way breaking through. The coming Be8 is a killer.} 36... Qg8 37. Be8+! Qxe8 38. Qe8+ Kxe8 39. g8=Q+ Kd7 40. Qg7+ {It was White's 40th move Which means time control was over for me. I was short on time. A piece and three pawns for a queen is not enough. Black resigned. 1-0 }

1 Comment

aha, that makes sense. I will make the change in my code :)
