2

This is the PCRE2 regexp:

(?<=hello )(?:[^_]\w++)++

Its intended use is against strings like the following:

Hello Bob (Marius) Smith. -> Match "Bob"

Hello Bob Jr. (Joseph) White -> Match "Bob Jr."

Hello Bob Jr. IInd (Paul) Jobs -> Match "Bob Jr. IInd"

You get the point.

Essentially there is a magic word, in this case "hello", followed by a first name, followed by a second name which is always between parens. First names could be anything really: a single word, a list of words followed by punctuation, and so on. Heck, look at the name of Elon Musk's kid (X Æ A-Xii) to see how weird names can get :)

Let's only assume ASCII, though. Æ is not in my targets :)

I'm at a loss on how to convert this regexp to JS. The only viable solution I found was to use PCRE2-wasm on Node, which spins up a wasm virtual machine and sucks up 1 GB of resources just for that. That's insane.

JavaScript does not support possessive quantifiers. Commented Mar 18, 2021 at 21:03

3 Answers

3

This would match your cases in ECMAscript.

(?<=[Hh]ello )(?:[^_][\w.]+)+

You need to match a capital H, which [Hh] does in place of h, since your test cases start with a capital H. Each ++ must become a single + to be valid in ECMAScript, which has no possessive quantifiers. You also need to include a . alongside \w, since some names contain one.
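As a quick check, the pattern above can be run against the three strings from the question. This is a minimal sketch, assuming a runtime with lookbehind support (ES2018 or later); the names `nameAfterHello`, `samples`, `firstName`, and `results` are made up for illustration.

```javascript
// ECMAScript version of the pattern: [Hh] for the capital H,
// single + quantifiers, and . allowed inside names.
const nameAfterHello = /(?<=[Hh]ello )(?:[^_][\w.]+)+/;

// The three test strings from the question.
const samples = [
  "Hello Bob (Marius) Smith.",
  "Hello Bob Jr. (Joseph) White",
  "Hello Bob Jr. IInd (Paul) Jobs",
];

// Hypothetical helper: returns the matched first name,
// or null when the magic word is not present.
function firstName(text) {
  const m = text.match(nameAfterHello);
  return m ? m[0] : null;
}

const results = samples.map(firstName);
console.log(results); // [ 'Bob', 'Bob Jr.', 'Bob Jr. IInd' ]
```

The match stops before each opening paren because neither `\w` nor `.` in the character class can consume `(`.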

https://regex101.com/r/lkZK7w/1

-- thanks "D M" for pointing out the missing . in the testcase.

3 Comments

I don't believe this matches case 2 or case 3. Expected "Bob Jr." and got "Bob Jr". Expected "Bob Jr. IInd" and got "Bob Jr".
Actually that's a bug with the PCRE regexp as well. It seems it doesn't match the period or anything after it. argh :D
@DM you are perfectly right, thanks for pointing that out. @DLeonardi, I have updated the answer to include . in the name.
1

@Nils has the correct answer.

If you do need to expand your acceptable character set, you can use the following regex. Check it out. The g, m, and i flags are set.

(?<=hello ).*(?=\([^\)]*?\))

Hello Bob (Marius) Smith.
Hello Bob Jr. (Joseph) White
Hello Bob Jr. IInd (Paul) Jobs
Hello X Æ A-Xii (Not Elon) Musk
Hello Bob ()) Jr. ( (Darrell) Black

Match Number   Characters   Matched Text
Match 1        6-10         Bob
Match 2        32-40        Bob Jr.
Match 3        61-74        Bob Jr. IInd
Match 4        92-102       X Æ A-Xii
Match 5        124-138      Bob ()) Jr. (

The idea is pretty simple:

  1. Look behind for your keyword: (?<=hello ).
  2. Look ahead for your middle name: (?=\([^\)]*?\)) (anything inside a set of parentheses that is not a closing parenthesis, lazily so you don't take part of the first name).
  3. Take everything between as your first name: .*.
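The three steps above can be sketched in JavaScript as follows. This is an illustrative example rather than a definitive implementation: the names `beforeParens`, `text`, and `names` are made up, and the `trim()` call is an added assumption, since `.*` also consumes the space that precedes the opening parenthesis (the character ranges in the table above include it).

```javascript
// Lookbehind for the keyword, lookahead for the parenthesized name,
// and .* for everything in between. Flags: g, i, and m as in the answer.
const beforeParens = /(?<=hello ).*(?=\([^\)]*?\))/gim;

// A few of the test strings, joined into one multi-line input.
const text = [
  "Hello Bob (Marius) Smith.",
  "Hello Bob Jr. (Joseph) White",
  "Hello X Æ A-Xii (Not Elon) Musk",
].join("\n");

// matchAll requires the g flag; each raw match ends with a space,
// so trim() is applied to get the bare name.
const names = [...text.matchAll(beforeParens)].map((m) => m[0].trim());
console.log(names); // [ 'Bob', 'Bob Jr.', 'X Æ A-Xii' ]
```

Because `.` does not cross line breaks, each line of the input yields its own match.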

0

The ++ does not work because JavaScript does not support possessive quantifiers.
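To see the failure concretely, the sketch below builds the original pattern with the RegExp constructor so the parse error can be caught. It also shows one commonly used workaround, offered here as an assumption and not as part of this answer: a lookahead with a capture group followed by a backreference, `(?=(\w+))\1`, behaves like the possessive `\w++` because the engine does not backtrack into a lookahead once it has matched. Only the inner quantifier is emulated; the outer one is left as a plain `+`. The names `pcreSource`, `rejected`, and `emulated` are made up.

```javascript
// The PCRE2 source from the question, kept as a string so that
// the parse failure can be caught at run time.
const pcreSource = "(?<=hello )(?:[^_]\\w++)++";

let rejected = false;
try {
  new RegExp(pcreSource);
} catch (err) {
  // JavaScript treats the second + as a quantifier with nothing to repeat.
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true

// Emulating the inner possessive \w++ with a lookahead capture plus backreference.
const emulated = /(?<=[Hh]ello )(?:[^_](?=(\w+))\1)+/;

console.log("Hello Bob (Marius) Smith.".match(emulated)[0]);    // Bob
console.log("Hello Bob Jr. (Joseph) White".match(emulated)[0]); // Bob Jr
```

Like the original PCRE2 pattern, this emulation stops at the period, so it returns "Bob Jr" without the trailing dot.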

Since the first name is followed by a second name that is always between parens, you might also use a capture group with a match instead of a lookbehind.

\b[Hh]ello (\w+.*?)\s*\([^()\s]+\)
  • \b[Hh]ello Match hello or Hello
  • ( Capture group 1
    • \w+.*? Match 1+ word chars followed by any chars, as few as possible
  • ) Close group 1
  • \s*\([^()\s]+\) Match optional whitespace chars, then ( followed by 1+ chars other than parentheses or whitespace, then )

Regex demo

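// Group 1 captures the first name; the parenthesized second name is matched but not captured.
const regex = /\b[Hh]ello (\w+.*?)\s*\([^()\s]+\)/;
["Hello Bob (Marius) Smith.",
  "Hello Bob Jr. (Joseph) White",
  "Hello Bob Jr. IInd (Paul) Jobs"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // Logs "Bob", "Bob Jr." and "Bob Jr. IInd" in turn.
    console.log(m[1]);
  }
});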

With the lookbehind, you might also match word characters followed by an optionally repeated non-capturing group that matches whitespace chars followed by word characters or dots.

(?<=[Hh]ello )\w+(?:\s+[\w.]+)*
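A short sketch of this lookbehind variant in use, assuming a runtime with lookbehind support (ES2018 or later). The names `wordsAfterHello` and `extractName` are hypothetical.

```javascript
// One word, then any number of whitespace-separated chunks
// made of word characters or dots.
const wordsAfterHello = /(?<=[Hh]ello )\w+(?:\s+[\w.]+)*/;

// Hypothetical helper: returns the first name or null if nothing matches.
function extractName(text) {
  const m = text.match(wordsAfterHello);
  return m ? m[0] : null;
}

console.log(extractName("Hello Bob (Marius) Smith."));      // Bob
console.log(extractName("Hello Bob Jr. (Joseph) White"));   // Bob Jr.
console.log(extractName("Hello Bob Jr. IInd (Paul) Jobs")); // Bob Jr. IInd
console.log(extractName("Goodbye Bob (Marius) Smith."));    // null
```

No trimming is needed here: the repeated group only succeeds when the whitespace is followed by a word character or a dot, so the space before the opening paren is never part of the match.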

Regex demo
