
I want to extract proper nouns (e.g. Micheal Jackson) from a text with a PHP regex, but my regex is not right.

The text:

My friend Micheal Jackson was the King of Pop. The Game Album sold little.

What I want:

A regex that can extract proper nouns made up of multiple words, e.g. Micheal Jackson or The Game Album.

My regex:

/(?<=\s)([A-Z][a-z]+).*(?=\s)/

Thanks.

P.S. Posted via a mobile device. Apologies if format is poor.

2
  • Regex doesn't know what a proper name is. How do you define/find them? Commented Sep 30, 2011 at 15:51
  • Michael. Just in case spelling matters to your algorithm. Commented Aug 12, 2017 at 21:58

2 Answers

3

Try using word boundaries instead of your lookbehind/lookahead:

/\b([A-Z][a-z]+)\b/

I don't understand your .* part: it will match anything after the first word up to the last whitespace, so I removed it from my regex.
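
To make the difference concrete, here is a minimal sketch that runs both patterns against the sample text from the question. The variable names are only illustrative, and it assumes a stock PHP installation with the PCRE functions available.

```php
<?php
$text = 'My friend Micheal Jackson was the King of Pop. The Game Album sold little.';

// Original pattern: after the first capitalized word, the greedy .* keeps
// going and only backtracks to the last position followed by whitespace.
preg_match('/(?<=\s)([A-Z][a-z]+).*(?=\s)/', $text, $m);
$greedy = $m[0];
// $greedy is "Micheal Jackson was the King of Pop. The Game Album sold"

// Word-boundary pattern: every capitalized word is matched on its own.
preg_match_all('/\b([A-Z][a-z]+)\b/', $text, $matches);
$words = $matches[1];
// $words is ["My", "Micheal", "Jackson", "King", "Pop", "The", "Game", "Album"]

print_r($greedy);
print_r($words);
```

Note that "My" is picked up as well: the pattern only knows about capitalization, not about what is actually a proper noun.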

If you want to match multiple words at once (maybe that is what you were trying to achieve with your .*), try this:

(?:\s*\b([A-Z][a-z]+)\b)+

See it here on Regexr
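
As a rough sketch of how this could be used from PHP (assuming preg_match_all and the sample text from the question; the trimming and the multi-word filter are my additions, not part of the pattern itself):

```php
<?php
$text = 'My friend Micheal Jackson was the King of Pop. The Game Album sold little.';

// Each full match is a run of consecutive capitalized words.
preg_match_all('/(?:\s*\b([A-Z][a-z]+)\b)+/', $text, $matches);

// The \s* at the start of the group can capture leading whitespace, so trim it.
$phrases = array_map('trim', $matches[0]);
// $phrases is ["My", "Micheal Jackson", "King", "Pop", "The Game Album"]

// Optional extra step: keep only the runs that contain more than one word.
$multiWord = array_values(array_filter($phrases, function ($phrase) {
    return preg_match('/\s/', $phrase) === 1;
}));
// $multiWord is ["Micheal Jackson", "The Game Album"]

print_r($phrases);
print_r($multiWord);
```

Single capitalized words such as "My" or "King" are still matched by the pattern; whether to keep them depends on how you define a proper noun.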


3 Comments

+1 for the succinct (?:\s*\b([A-Z][a-z]+)\b)+ regex. What about proper nouns that have capitals in the middle of a single word (for example, a company name like 'CompuServe')? Maybe you should consider using (?:\s*\b([A-Z][A-Za-z]+)\b)+ instead.
For proper nouns like "iPhone" that start with a lowercase letter but have a capital in them, I use: (?:\s*\b([a-z]*[A-Z][A-Za-z]+)\b)+
FYI: That will also match the whitespace preceding a single capitalized word.
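
The variants from these comments can be compared with a short sketch. The second sample sentence and the helper function extractPhrases are made up for illustration, and the last pattern is just one possible way to avoid the leading whitespace, not something proposed in the thread.

```php
<?php
// Hypothetical helper: run a pattern and return the trimmed full matches.
function extractPhrases(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return array_map('trim', $matches[0]);
}

// Made-up sample sentence containing mixed-case names.
$sample = 'She moved from CompuServe to an iPhone made by Apple.';

// Allows capitals inside a word, e.g. "CompuServe".
$inner = extractPhrases('/(?:\s*\b([A-Z][A-Za-z]+)\b)+/', $sample);
// $inner is ["She", "CompuServe", "Apple"]

// Also allows a lowercase prefix before the first capital, e.g. "iPhone".
$prefixed = extractPhrases('/(?:\s*\b([a-z]*[A-Z][A-Za-z]+)\b)+/', $sample);
// $prefixed is ["She", "CompuServe", "iPhone", "Apple"]

// One possible alternative that never starts a match on whitespace:
// whitespace is only consumed between two capitalized words.
$text = 'My friend Micheal Jackson was the King of Pop. The Game Album sold little.';
preg_match_all('/\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b/', $text, $m);
$noLeadingSpace = $m[0];
// $noLeadingSpace is ["My", "Micheal Jackson", "King", "Pop", "The Game Album"]

print_r($inner);
print_r($prefixed);
print_r($noLeadingSpace);
```

With the last pattern no trimming is needed, because a match always begins at the first capital letter.
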
1

The Stanford Parser can help you here. It will tokenize your phrase and extract proper nouns and all other pieces according to the sentence structure.

It's available as a jar download or you can try it out online here: http://nlp.stanford.edu:8080/parser/
