With this pattern:

(how is\s)?(the\s)?(weather)\s?((on)\s)?(today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|sunday|this week)?(\s(in)\s(.*)\s?(on)?\s?(today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|sunday|this week)?)?

This is what I'm trying to capture:

Input : how is the weather on tuesday in vienna

output :

array(10
0   =>  how is the weather on tuesday in vienna
1   =>  how is 
2   =>  the 
3   =>  weather
4   =>  on 
5   =>  on
6   =>  tuesday
7   =>   in vienna
8   =>  in
9   =>  vienna
)

Here, I can extract the day and the location from array[6] and array[9].

Input : how is the weather in vienna on tuesday

output :

array(10
0   =>  how is the weather in vienna on tuesday
1   =>  how is 
2   =>  the 
3   =>  weather
4   =>  
5   =>  
6   =>  
7   =>  in vienna on tuesday
8   =>  in
9   =>  vienna on tuesday
)

But here, the location and day are captured together in array[9]. I want the day and the location captured in separate elements. Is there anything wrong with the grouping in my regex pattern?

  • I didn't understand your question. Can you give some sample data and your expected output? Commented May 30, 2016 at 2:42
  • I don't understand what you are trying to achieve, but I made a small modification and it produces the output. Writing such a long regex with so many optional groups is not a good idea. Commented May 30, 2016 at 2:46
  • I edited the question. Commented May 30, 2016 at 2:57
  • There isn't anything wrong with capturing timeframe and location in a grouping regex, but it would seem more valuable to capture the timeframe in one capture group, and the location in a separate capture group. This way the values can be used immediately in your program. Commented May 30, 2016 at 3:02

2 Answers

Description

I recommend using optional lookaheads to seek out and find the location or timeframe if they exist.

^(?=(?:.*?on\s(today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|sunday|this week))?)(?=(?:.*?in\s([a-z]+))?)

This regular expression will do the following:

  • capture group 1 always gets the timeframe if it exists in the string
  • capture group 2 always gets the location if it exists in the string
  • allows the location and timeframe to appear in any order in the string

Example

Live Demo

https://regex101.com/r/rN9hG2/1

Sample text

weather on sunday
weather on sunday in vienna
weather in vienna
weather in vienna on sunday

Sample Matches

[0][1] = sunday
[0][2] = 

[1][1] = sunday
[1][2] = vienna

[2][1] = 
[2][2] = vienna

[3][1] = sunday
[3][2] = vienna
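
The sample matches above can be reproduced with a short script. This is a minimal sketch in Python, assuming the pattern is applied to one input line at a time; the names `DAYS`, `PATTERN`, and `extract` are made up for illustration, and the duplicated `sunday` alternative is dropped because it has no effect on the result.

```python
import re

# Alternatives from the answer's pattern (second "sunday" omitted; it is redundant).
DAYS = r"today|tomorrow|sunday|monday|tuesday|wednesday|thursday|friday|saturday|this week"

# Two optional lookaheads anchored at the start of the string:
# group 1 = timeframe after "on ", group 2 = location after "in ".
PATTERN = re.compile(
    r"^(?=(?:.*?on\s(" + DAYS + r"))?)(?=(?:.*?in\s([a-z]+))?)"
)

def extract(text):
    """Return (timeframe, location); each element is None when absent."""
    # The lookaheads consume nothing, so the match itself is always the
    # empty string at position 0; only the capture groups carry information.
    m = PATTERN.match(text)
    return m.group(1), m.group(2)

for line in (
    "weather on sunday",
    "weather on sunday in vienna",
    "weather in vienna",
    "weather in vienna on sunday",
):
    print(line, "->", extract(line))
```

Because both groups sit inside lookaheads that start from the same position, the order of the two phrases in the input does not matter. Note that `on\s` and `in\s` are not anchored to word boundaries, so a word that merely ends in "on" or "in" could also trigger them; adding `\b` in front would be one way to tighten this.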

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      .*?                      any character except \n (0 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
      on                       'on'
----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
      (                        group and capture to \1:
----------------------------------------------------------------------
        today                    'today'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tomorrow                 'tomorrow'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        sunday                   'sunday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        monday                   'monday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        tuesday                  'tuesday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        wednesday                'wednesday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        thursday                 'thursday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        friday                   'friday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        saturday                 'saturday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        sunday                   'sunday'
----------------------------------------------------------------------
       |                        OR
----------------------------------------------------------------------
        this week                'this week'
----------------------------------------------------------------------
      )                        end of \1
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      .*?                      any character except \n (0 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
      in                       'in'
----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
      (                        group and capture to \2:
----------------------------------------------------------------------
        [a-z]+                   any character of: 'a' to 'z' (1 or
                                 more times (matching the most amount
                                 possible))
----------------------------------------------------------------------
      )                        end of \2
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------

Comments

This works perfectly. I'm still trying to understand the pattern though. Thank you so much.
Think of it like this: the (?=.*?......) plants its feet and uses binoculars to look ahead into the string to see whether it contains the desired text.
Capture group zero is always populated. When you add the (?:....), you are asking the regular expression engine to match a value and return it if it exists. I see that you added ? to the end of them, but that only makes the group optional.
Of course this would fail to match the string "In vienna on saturday what will the weather be like". So I'd just add (?=.*?weather), which looks forward into the string to verify that the word weather is contained in it somewhere; the lookahead must not be wrapped in an optional group here, because an optional lookahead always succeeds. This way the word weather, the time period, and the location can appear in any order.
The regular expression engine will attempt to step through your source string one character at a time. When it encounters a lookahead, the engine looks forward to see if the following string matches a pattern. This page has a much better explanation than I can provide here.
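
A lookahead whose entire body is optional can never fail, so it can capture a value but cannot enforce that a word is present; a lookahead without the optional wrapper can. The following Python sketch illustrates that difference for the word weather. The names `OPTIONAL`, `REQUIRED`, and `mentions_weather` are hypothetical and only serve the example.

```python
import re

# Optional body: the lookahead succeeds whether or not "weather" occurs.
OPTIONAL = re.compile(r"^(?=(?:.*?weather)?)")

# Required body: the lookahead fails unless "weather" occurs somewhere ahead.
REQUIRED = re.compile(r"^(?=.*?weather)")

def mentions_weather(text):
    """True only when the required lookahead finds 'weather' in text."""
    return REQUIRED.match(text) is not None

without = "in vienna on saturday"
with_word = "in vienna on saturday what will the weather be like"

print(OPTIONAL.match(without) is not None)   # the optional form accepts this line
print(mentions_weather(without))             # the required form rejects it
print(mentions_weather(with_word))           # and accepts this one
```

The optional form is still the right choice for the timeframe and location groups, since those are allowed to be missing.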

Capture all words

If I understood your question correctly, you can capture those words with a regex like this:

(\w+)\s+(\w+)\s+(\w+)(?:\s+(\w+)\s(\w+))?

Match information

MATCH 1
1.  [3-10]  `weather`
2.  [11-13] `on`
3.  [14-20] `sunday`
MATCH 2
1.  [25-32] `weather`
2.  [33-35] `on`
3.  [36-42] `sunday`
4.  [43-45] `in`
5.  [46-52] `vienna`
MATCH 3
1.  [57-64] `weather`
2.  [65-67] `in`
3.  [68-74] `vienna`
MATCH 4
1.  [79-86] `weather`
2.  [87-89] `in`
3.  [90-96] `vienna`
4.  [97-99] `on`
5.  [100-106]   `sunday`
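
Since the groups of this regex are positional, the same group holds the day in one line and the location in another. One way to turn them into named values is to look at the preposition captured just before each value. This is a rough sketch in Python, assuming each input line has the shape of the sample text (it starts at weather); `WORDS` and `label_fields` are names invented for the example.

```python
import re

# The "capture all words" regex from this answer, unchanged.
WORDS = re.compile(r"(\w+)\s+(\w+)\s+(\w+)(?:\s+(\w+)\s(\w+))?")

def label_fields(line):
    """Map the captured (preposition, value) pairs to 'day' / 'location'."""
    m = WORDS.search(line)
    if m is None:
        return {}
    fields = {}
    # Groups (2, 3) are always present; groups (4, 5) are None when the
    # optional second phrase is missing.
    for prep, value in ((m.group(2), m.group(3)), (m.group(4), m.group(5))):
        if prep == "on":
            fields["day"] = value
        elif prep == "in":
            fields["location"] = value
    return fields

for line in (
    "weather on sunday",
    "weather on sunday in vienna",
    "weather in vienna",
    "weather in vienna on sunday",
):
    print(line, "->", label_fields(line))
```

The regex is applied per line here because `\s+` also matches newlines, so running it over a multi-line block could pair words from adjacent lines. Input with leading words, such as "how is the weather ...", shifts the groups and yields no labeled fields with this sketch.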

Capture only your words in bold

On the other hand, if you want to capture only your words in bold, you can drop some of the capturing groups, as in the regex below:

\w+\s+\w+\s+(\w+)(?:\s+\w+\s(\w+))?

Match information

MATCH 1
1.  [14-20] `sunday`
MATCH 2
1.  [36-42] `sunday`
2.  [46-52] `vienna`
MATCH 3
1.  [68-74] `vienna`
MATCH 4
1.  [90-96] `vienna`
2.  [100-106]   `sunday`
