1

How can I extract proper nouns and numeric values from a string using PHP or JavaScript? For example, there's a string like

Xyz visited this page this page 53 mins ago.

I want to be able to recognize "Xyz" and "53" as a proper noun and a numeric value, respectively.

There is no easy way to do this. You would have to look into the broad field of "Natural Language Processing/Recognition". Commented Jun 26, 2009 at 9:39

5 Answers

1

The one obvious way is to have a dictionary of proper nouns and some good indexing to search through it quickly, if such a dictionary exists.

But I get the feeling you are looking for a way to grammatically infer that a word is a proper noun.

I can't think of any perfect way to do this, but if you created a series of rules, you could use these to parse a passage.

Rules might include:
* Words that end with "ly" are not proper nouns.
* Noise words such as "and", "to", "but", etc. are not proper nouns.
* Words that have capital letters but don't start a sentence are proper nouns.

To improve it, you could use these rules to build a dictionary of proper nouns. Every time a word matches one of these rules, it is either added to or deleted from the proper nouns dictionary.
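As a rough illustration, the rules above could be sketched in JavaScript as follows. The function name guessProperNouns and the noise-word list are hypothetical choices made for this example, and the heuristic is far from complete.

```javascript
// Hypothetical helper: the rule set and noise-word list are illustrative only.
const NOISE_WORDS = new Set(["and", "to", "but", "the", "a", "an", "of", "in"]);

function guessProperNouns(text) {
  const found = new Set(); // acts as the growing dictionary of proper nouns
  // Split into sentences so the first word of each one can be skipped.
  const sentences = text.split(/[.!?]+\s*/).filter(Boolean);
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter(Boolean);
    words.forEach((raw, index) => {
      // Strip surrounding punctuation and digits such as commas or quotes.
      const word = raw.replace(/^[^A-Za-z]+|[^A-Za-z]+$/g, "");
      if (word === "") return;
      if (NOISE_WORDS.has(word.toLowerCase())) return; // rule: noise words
      if (/ly$/i.test(word)) return;                   // rule: words ending in "ly"
      // Rule: capitalized and not the first word of the sentence.
      if (index > 0 && /^[A-Z]/.test(word)) found.add(word);
    });
  }
  return [...found];
}

console.log(guessProperNouns("Yesterday we thought Bim visited Paris quickly. Then Bim left."));
// → [ 'Bim', 'Paris' ]
```

Note the limits of these rules: a name that opens a sentence (like "Xyz" in the question) is never picked up, and the "ly" rule wrongly rejects names such as "Emily".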

This is very rough - if this is on the right track, then perhaps I can be more specific.

2 Comments

I was hoping to achieve this with a regex or something, e.g. /([^.])(\s)+([A-Z]{1}[a-z]+)/ But this regular expression doesn't match two consecutive proper nouns, e.g. "name is Abb Bayer".
There's no simple way to achieve this. I did not get around to resolving it, but I'm still thinking about it. I accept that one needs to do a lot to do this.
0

If there's always exactly one proper noun in the sentence, then you could find it by looking for the word beginning with a capital letter. If no word is capitalized except the first one, then the first word is it. Problems arise if Xyz is named Bim de Verdier or if the name isn't actually capitalized.

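// Get the number with JavaScript and RegExp.
// Use a regex literal: in new RegExp("\d+") the string escape turns "\d" into "d".
var regex = /\d+/;
var match = regex.exec("Xyz visited this page this page 53 mins ago.");
if (match === null) {
  alert("No match");
} else {
  // match[0] is the full match; any further entries would be capture groups.
  var s = "";
  for (var i = 0; i < match.length; i++) {
    s = s + match[i] + "\n";
  }
  alert(s);
}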

A capitalized word can be matched with "[A-Z][a-z]+[ ]".
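A possible extension of that pattern, sketched here under the assumption that a name is a run of capitalized words separated by single spaces, uses the global flag to collect every match. The helper names extractCapitalizedRuns and extractNumbers are made up for this example.

```javascript
// Sketch only: extractCapitalizedRuns and extractNumbers are hypothetical helper names.
function extractCapitalizedRuns(text) {
  // One capitalized word, optionally followed by more capitalized words
  // separated by single spaces, e.g. "Abb Bayer".
  const pattern = /\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b/g;
  return text.match(pattern) || [];
}

function extractNumbers(text) {
  // Runs of digits, converted from strings to numbers.
  return (text.match(/\d+/g) || []).map(Number);
}

const sentence = "Xyz visited this page this page 53 mins ago.";
console.log(extractCapitalizedRuns(sentence));            // → [ 'Xyz' ]
console.log(extractNumbers(sentence));                    // → [ 53 ]
console.log(extractCapitalizedRuns("name is Abb Bayer")); // → [ 'Abb Bayer' ]
```

This still treats any capitalized sentence opener as a name, and it splits "Bim de Verdier" into "Bim" and "Verdier", so it only helps when the input is fairly predictable.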

Comments

0

The PHP functions is_numeric and ucfirst may help recognize the words:

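function parse_name_and_number($sentence) {
    $words = explode(' ', $sentence);
    $name = array();
    $number = null; // stays null when the sentence contains no number
    foreach ($words as $word) {
        if (is_numeric($word)) {
            $number = $word;
        } elseif ($word == ucfirst($word)) {
            // ucfirst() changed nothing, so the word does not start with a lowercase letter
            $name[] = $word;
        }
    }
    $name = implode(' ', $name);
    return array('name' => $name, 'number' => $number);
}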

print_r(parse_name_and_number('Xyz visited this page 53 minutes ago'));
// output:  Array ( [name] => Xyz [number] => 53 )

print_r(parse_name_and_number('we thought Bim de Verdier visited the page 5 seconds ago'));
// output:  Array ( [name] => Bim Verdier [number] => 5 )

print_r(parse_name_and_number('Weirder input messes up the results'));
// output:  Array ( [name] => Weirder [number] => )

Comments

0

The best option is to use a link grammar parser: parse the sentence and extract the proper nouns.

www.link.cs.cmu.edu/link

Comments

0
Xyz visited this page this page 53 mins ago.

Now, just get the position of "visited this page" or whatever follows the name; that position tells you where the name ends, measured from the beginning of the sentence. If, for instance, "Person " is always at the beginning, then just set the starting point to 7 and subtract 7 from that position to get the length. Here's a quick JS example:

var str = "Person Xyz visited this page this page 53 mins ago.";
alert(str.substr(7, str.indexOf(" visited") - 7));

Which should return "Xyz". Hope that helps. Of course, this only works if you know the structure of your sentence, which would be the case in the example given.
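When the sentence structure really is fixed, one regular expression with capture groups can pull out both values in a single step. The sketch below assumes a template of the form "<name> visited this page <number> mins ago."; the function name parseVisit and the exact template are assumptions made for illustration.

```javascript
// Assumes every sentence follows the hypothetical template
// "<name> visited this page <number> mins ago."
function parseVisit(sentence) {
  // (?:this page )+ also tolerates the repeated "this page" in the question's example.
  const match = /^(.+?) visited (?:this page )+(\d+) mins ago\.?$/.exec(sentence);
  if (match === null) return null; // sentence does not follow the template
  return { name: match[1], minutes: Number(match[2]) };
}

console.log(parseVisit("Xyz visited this page this page 53 mins ago."));
// → { name: 'Xyz', minutes: 53 }
console.log(parseVisit("Weirder input messes up the results"));
// → null
```

Because the name is whatever precedes " visited", multi-word or lowercase names such as "Bim de Verdier" are captured whole, which the capitalization-based approaches cannot do.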

P.S. I know I'm two years late, but this might help someone in the future.

Comments
