I'm trying to extract the following patterns from a text using RegEx:
John Doe
JOHN DOE
Sam John Watson
Sam John Lilly Watson
SAM JOHN WATSON
SAM JOHN LILLY WATSON
The input data is a single line, and I need to find the patterns above within it.
More about the pattern:
- Each word starts with an uppercase letter, followed by uppercase or lowercase letters
- Minimum 2 words
- Maximum 4 words
- Words contain only A-Z or a-z characters
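The rules above translate fairly directly into one pattern: a first word, then one to three more words, each preceded by a single space. The following is a minimal sketch, assuming words are separated by exactly one space and that `\b` word boundaries are enough to keep a match from starting or ending in the middle of another word. `NAME_RE`, `samples`, and `line` are illustrative names, not from the original question.

```python
import re

# A "word" is one uppercase letter followed by one or more letters.
# The first word is required; the non-capturing group adds 1 to 3 more,
# giving 2 to 4 words in total.
NAME_RE = re.compile(r"\b[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+){1,3}\b")

samples = [
    "John Doe",
    "JOHN DOE",
    "Sam John Watson",
    "Sam John Lilly Watson",
    "SAM JOHN WATSON",
    "SAM JOHN LILLY WATSON",
]

# Every example from the question matches as a whole string.
for s in samples:
    print(s, "->", NAME_RE.fullmatch(s) is not None)

# On a single line of text, findall returns each run of 2 to 4 such words.
line = "met John Doe and SAM JOHN WATSON today"
print(NAME_RE.findall(line))  # ['John Doe', 'SAM JOHN WATSON']
```

Because the group is non-capturing, `findall` returns the full matches rather than just the last repeated word. Note that the rules alone cannot tell a name from any other capitalized words, so a capitalized word directly before a name (for example at the start of a sentence) would be included in the match.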
What I Tried:
```python
import re

text = "Sam Peters John Doe"  # sample single-line input
print(re.findall(r"[A-Z][A-Za-z]+ [A-Z][A-Za-z]+ [A-Za-z]* [A-Za-z]*", text))
```
This correctly matches input like:
Sam Peters John Doe
SAM WINCH DAN BROWN
but it fails on input with fewer than 4 words.
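The failure comes from the three literal spaces in the attempt: every match needs at least three spaces, and a 2- or 3-word input only has one or two. In addition, the last two words use `[A-Za-z]*`, which neither requires an uppercase first letter nor a non-empty word. A minimal fix that keeps the structure of the attempt is to make words 3 and 4 optional non-capturing groups that each carry their own leading space. This is a sketch; `ATTEMPT` and `FIXED` are hypothetical names used only for the comparison.

```python
import re

# The original attempt: two literal spaces after word 2 are always required,
# so inputs with only 2 or 3 words can never match.
ATTEMPT = r"[A-Z][A-Za-z]+ [A-Z][A-Za-z]+ [A-Za-z]* [A-Za-z]*"

# Minimal change: words 1 and 2 stay required; words 3 and 4 become optional
# groups, each bringing its own leading space and uppercase first letter.
FIXED = r"[A-Z][A-Za-z]+ [A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)?(?: [A-Z][A-Za-z]+)?"

for text in ["John Doe", "Sam John Watson", "SAM WINCH DAN BROWN"]:
    print(text, "|", re.findall(ATTEMPT, text), "|", re.findall(FIXED, text))
```

Unlike the `\b`-anchored pattern, this version has no word boundaries, so it could still match the capitalized tail of a longer word; adding `\b` at both ends is a cheap safeguard.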