I want to recognize football matches with a regex in JavaScript. The input looks like this:

1
15/06 16:00
Brasília

Brasilien
3:0 (1:0)
Japan

2
23/06 16:00
Recife 

Uruguay
-
Tahiti

This text contains:

  • The date and time of the match
  • The place where the match is played
  • The two teams
  • The score if the match has already been played, otherwise a "-"

I built a regex using the http://regex101.com/ site:

(\d\d\/\d\d)\s(\d\d:\d\d)\s(.+)\s\s\s(.+)\s(?:-|(\d):(\d)\s\(\d:\d\))\s(.+)

This regex should capture both alternatives (with and without a score). Here is a link to the full test setup: http://regex101.com/r/bF3lU4

My code in JavaScript with Node.js:

function CreateMatchesFromString(data)
{
    var re = /(\d\d\/\d\d)\s(\d\d:\d\d)\s(.+)\s\s\s(.+)\s(?:-|(\d):(\d)\s\(\d:\d\))\s(.+)/g;
    var myArray;

    while ((myArray = re.exec(data)) !== null)
    {
        console.log("date:"+ myArray[1]);
        console.log("time:"+ myArray[2]);
        console.log("place:"+ myArray[3]);
        console.log("Home:"+ myArray[4]);
        console.log("Away:"+ myArray[5]);
    }
}

But I do not get the away team, which I expect in capture group 5! My output:

date:26/06
time:22:00
place:Curitiba
Home:Algerien
Away:undefined

I only get it when I do not use an alternation with "|":

(\d\d\/\d\d)\s(\d\d:\d\d)\s(.+)\s\s\s(.+)\s-\s(.+)

Or when I use "[" and "]" instead of "(" and ")" to group the alternatives.

What is the problem? Is it a bug in the Node.js regex engine, which ignores the last capture group? Or is the regex wrong?

Best regards, Michael

1 Answer

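Your problem could be just the capture groups.
The two score groups inside the alternation are numbered 5 and 6, so the away team ends up in group 7, not group 5.
This regex doesn't change your original (I don't know enough about your data);
it just alters the capture groups.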
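
To see the numbering in isolation, here is a minimal sketch on a simplified, single-line pattern. The pattern and the sample strings are made up for illustration and are not the regex from the question; the point is only that groups are numbered by their opening parenthesis and that groups in an alternative that did not take part in the match come back as undefined.

```javascript
// Simplified stand-in for the question's pattern (illustrative only).
// Group 1 = home, groups 2 and 3 = score (optional), group 4 = away.
var re = /(\w+) (?:-|(\d):(\d)) (\w+)/;

var played  = re.exec("Brasilien 3:0 Japan"); // the score alternative matches
var pending = re.exec("Uruguay - Tahiti");    // the "-" alternative matches

// The score groups still occupy indexes 2 and 3 in both cases,
// so the away team is always at index 4, never at index 2.
console.log(played[2], played[3], played[4]);
console.log(pending[2], pending[3], pending[4]);
```

In the question's regex the same thing happens two positions later: indexes 5 and 6 hold the score (or undefined for a match that has not been played), and the away team is at index 7.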

Edit: this works with your test data. It's the same regex with some added whitespace handling.

 #  /[^\S\r\n]*(?:\r?\n)(\d\d\/\d\d)[^\S\r\n]+(\d\d:\d\d)[^\S\r\n]*(?:\r?\n)(.+)(?:\r?\n)[^\S\r\n]*(?:\r?\n)(.+)(?:\r?\n)(?:-|(\d):(\d)[^\S\r\n]+\(\d:\d\))[^\S\r\n]*(?:\r?\n)(.+)/

 [^\S\r\n]*                         
 (?: \r? \n )                       # linebreak 
 # ---------------
 ( \d\d / \d\d )                    # (1), Date
 [^\S\r\n]+ 
 ( \d\d : \d\d )                    # (2), Time
 [^\S\r\n]* 
 (?: \r? \n )                       # linebreak 
 # ---------------
 ( .+ )                             # (3), Place
 (?: \r? \n )                       # linebreak 
 # ---------------
 [^\S\r\n]*                         # blank line
 (?: \r? \n )                       # linebreak 
 # ---------------
 ( .+ )                             # (4), Home
 (?: \r? \n )                       # linebreak 
 # ---------------
 (?:
      -                             # No score
   |                                # or,
      ( \d )                        # (5), Score home
      :                             # :
      ( \d )                        # (6), Score away
      [^\S\r\n]+ 
      \( \d : \d \)
 )
 [^\S\r\n]* 
 (?: \r? \n )                       # linebreak 
 # ---------------
 ( .+ )                             # (7), Away

Untested JS code

 // `data` is the text to scan; it must start with a line break for the first match to be found.
 var pattern = /[^\S\r\n]*(?:\r?\n)(\d\d\/\d\d)[^\S\r\n]+(\d\d:\d\d)[^\S\r\n]*(?:\r?\n)(.+)(?:\r?\n)[^\S\r\n]*(?:\r?\n)(.+)(?:\r?\n)(?:-|(\d):(\d)[^\S\r\n]+\(\d:\d\))[^\S\r\n]*(?:\r?\n)(.+)/g;
 var match;
 while ((match = pattern.exec( data )) != null)
 {
      // console.log appends its own line break, so no "\n" is needed.
      console.log( "" );
      console.log( "Date:  " + match[1] );
      console.log( "Time:  " + match[2] );
      console.log( "Place: " + match[3] );
      console.log( "Home:  " + match[4] );
      console.log( "Away:  " + match[7] );

      // Groups 5 and 6 are undefined when the "-" alternative matched.
      if (match[5] != null) {
          console.log( "Score: " + match[5] + " to " + match[6] );
      }
      else {
          console.log( "Score: no info" );
      }
 }
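
As a possible variation, assuming a runtime that supports named capture groups (ES2018, for example Node.js 10 or later), the same pattern can be written so that no index has to be counted by hand. The function name parseMatches, the group names, and the shape of the returned objects are hypothetical and not part of the original answer; treat this as a sketch rather than a definitive implementation.

```javascript
// Hypothetical helper: same pattern as above, with named groups.
function parseMatches(data) {
    var pattern = /[^\S\r\n]*(?:\r?\n)(?<date>\d\d\/\d\d)[^\S\r\n]+(?<time>\d\d:\d\d)[^\S\r\n]*(?:\r?\n)(?<place>.+)(?:\r?\n)[^\S\r\n]*(?:\r?\n)(?<home>.+)(?:\r?\n)(?:-|(?<homeGoals>\d):(?<awayGoals>\d)[^\S\r\n]+\(\d:\d\))[^\S\r\n]*(?:\r?\n)(?<away>.+)/g;
    var results = [];
    var match;
    while ((match = pattern.exec(data)) !== null) {
        var g = match.groups;
        results.push({
            date: g.date,
            time: g.time,
            place: g.place.trim(), // the sample data has a trailing space after "Recife"
            home: g.home,
            away: g.away,
            // The score groups are undefined when the "-" alternative matched.
            score: g.homeGoals !== undefined ? g.homeGoals + ":" + g.awayGoals : null
        });
    }
    return results;
}

// Sample input copied from the question, with a leading line break.
var sample = "\n1\n15/06 16:00\nBrasília\n\nBrasilien\n3:0 (1:0)\nJapan\n\n2\n23/06 16:00\nRecife \n\nUruguay\n-\nTahiti\n";
var parsed = parseMatches(sample);
console.log(parsed);
```

With names in place, inserting or removing a group inside the alternation no longer shifts the position of the away team.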

Perl test case

$/ = undef;
$str = <DATA>;

while ( $str =~ /[^\S\r\n]*(?:\r?\n)(\d\d\/\d\d)[^\S\r\n]+(\d\d:\d\d)[^\S\r\n]*(?:\r?\n)(.+)(?:\r?\n)[^\S\r\n]*(?:\r?\n)(.+)(?:\r?\n)(?:-|(\d):(\d)[^\S\r\n]+\(\d:\d\))[^\S\r\n]*(?:\r?\n)(.+)/g )
{
    print "\n";
    print "Date:  $1\n";
    print "Time:  $2\n";
    print "Place: $3\n";
    print "Home:  $4\n";
    print "Away:  $7\n";

    print "Score: ";
    if  ( defined $5 ) {
       print "$5 to $6\n";
    }
    else {
       print "no info\n";
    }
}

__DATA__

1
15/06 16:00
Brasília

Brasilien
3:0 (1:0)
Japan

2
23/06 16:00
Recife 

Uruguay
-
Tahiti

Output >>

Date:  15/06
Time:  16:00
Place: Brasília
Home:  Brasilien
Away:  Japan
Score: 3 to 0

Date:  23/06
Time:  16:00
Place: Recife
Home:  Uruguay
Away:  Tahiti
Score: no info

3 Comments

But I need these groups to get the data!
@rubiktubik - Here you go.
Now it works! Thank you! But I do not understand why this works and the simpler regex does not.
