In the following POS-tagged sentence (and similar sentences), what regular expression should I use to capture only two-word noun-noun compounds (i.e. \p{Alnum}+_NN[PS]? \p{Alnum}+_NN[PS]?) while avoiding two-word matches that are part of a longer run of nouns?
I_PRP will_MD never_RB go_VB to_IN sun_NN devil_NN auto_NN again_RB but_CC my_PRP$ family_NN members_NNS will_MD ._.
In particular, I would like to capture family_NN members_NNS but not sun_NN devil_NN or devil_NN auto_NN, since both of those are part of the three-noun sequence sun_NN devil_NN auto_NN.
Currently I use the following regex with a positive lookahead:
"(?=\\b([\\p{Alnum}]+)_(NN[SP]?)\\s([\\p{Alnum}]+)_(NN[SP]?)\\b)."
The problem is that, in addition to family_NN members_NNS, it also captures sun_NN devil_NN and devil_NN auto_NN.
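A lookahead on its own is zero-width and is tried at every position, so it reports every adjacent noun pair (including overlapping ones) and never inspects the tokens on either side. One possible approach, sketched below in Java since the regex above is a Java string literal, is to match the pair normally and bracket it with a negative lookbehind (the previous token is not a noun) and a negative lookahead (the next token is not a noun). This is a sketch under assumptions: tokens are separated by a single whitespace character, every token has the form word_TAG, and only NN, NNS and NNP count as nouns, mirroring NN[SP]? in the question. The class and method names (NounCompounds, findCompounds) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NounCompounds {
    // Matches exactly two adjacent noun tokens (NN, NNS or NNP, as with NN[SP]?)
    // that are neither preceded nor followed by another noun token.
    // Assumes tokens are separated by a single whitespace character.
    private static final Pattern TWO_NOUNS = Pattern.compile(
        "(?<!\\S)"                               // start at a token boundary
      + "(?<!_NN[SP]?\\s)"                       // previous token is not a noun
      + "(\\p{Alnum}+)_(NN[SP]?)"                // first noun: word and tag
      + "\\s"
      + "(\\p{Alnum}+)_(NN[SP]?)(?!\\S)"         // second noun, ending at a token boundary
      + "(?!\\s\\p{Alnum}+_NN[SP]?(?!\\S))");    // next token is not a noun

    // Returns every isolated two-noun compound found in a tagged sentence.
    public static List<String> findCompounds(String tagged) {
        List<String> result = new ArrayList<>();
        Matcher m = TWO_NOUNS.matcher(tagged);
        while (m.find()) {
            result.add(m.group()); // group(1)..group(4) hold the words and tags
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "I_PRP will_MD never_RB go_VB to_IN sun_NN devil_NN auto_NN again_RB "
                 + "but_CC my_PRP$ family_NN members_NNS will_MD ._.";
        System.out.println(findCompounds(s)); // prints [family_NN members_NNS]
    }
}
```

The lookbehind only checks the previous token's tag suffix (_NN, _NNS or _NNP followed by whitespace) because Java lookbehind needs an obvious maximum length; a check on the whole previous token would require an unbounded \p{Alnum}+ inside the lookbehind. A pair at the very start or end of the sentence still matches, since the lookarounds only reject a neighbouring noun, not a missing neighbour.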