
I am trying to extract JPA named parameters in JavaScript. This is the algorithm I came up with:

const notStrRegex = /(?<![\S"'])([^"'\s]+)(?![\S"'])/gm;
const namedParamCharsRegex = /[a-zA-Z0-9_]/;

/**
 * @returns array of named parameters which,
 * 1. always begins with :
 * 2. the remaining characters are guaranteed to match {@link namedParamCharsRegex}
 *
 * @example
 * 1. "select * from a where id = :myId3;" -> [':myId3']
 * 2. "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')" -> [':FROM_DATE']
 * 3. "TO_CHAR(ep.CHANGEDT,'yyyy=mm-dd hh24:mi:ss')" -> []
 */
export function extractNamedParam(query: string): string[] {
  return (query.match(notStrRegex) ?? [])
    .filter((word) => word.includes(':'))
    .map((splittedWord) => splittedWord.substring(splittedWord.indexOf(':')))
    .filter((splittedWord) => splittedWord.length > 1) // ignore ":"
    .map((word) => {
      // i starts from 1 because word[0] is :
      for (let i = 1; i < word.length; i++) {
        const isAlphaNum = namedParamCharsRegex.test(word[i]);
        if (!isAlphaNum) return word.substring(0, i);
      }
      return word;
    });
}

I got inspired by the solution in https://stackoverflow.com/a/11324894/12924700 to filter out all characters that are enclosed in single/double quotes.

The code above handles the three examples in the doc comment. However, it fails when a user inputs a string like this:

const testStr = '"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"'
extractNamedParam(testStr) // should return [] but it returns [":shouldIgnoreThisNamedParam"] instead
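
A quick way to see why is to print the tokens notStrRegex produces for testStr. The lookarounds only inspect the single character on either side of a token, so a token separated from the quotes by spaces passes the filter even though it sits inside a quoted string. This is plain JavaScript; the regex lookbehind needs an ES2018+ engine.

```javascript
// Same pattern as notStrRegex in the question
const notStrRegex = /(?<![\S"'])([^"'\s]+)(?![\S"'])/gm;
const testStr = '"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"';

// Only the immediate neighbours of each token are checked, so a token padded
// with spaces inside a quoted region is still treated as "not in a string"
const tokens = testStr.match(notStrRegex) ?? [];
console.log(tokens);
// → ["input", "invalid", "string", ":shouldIgnoreThisNamedParam", "in", "a"]
```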

I looked through the Hibernate source code to see how it extracts named parameters, but I couldn't find the code that does the work. Please help.

1 Answer

You can use

/"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g

Keep only the Group 1 values. The regex matches whole strings between single or double quotes (including backslash-escaped characters inside them) so they are skipped, and captures : plus one or more word chars in all other contexts.

See the JavaScript demo:

const re = /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;
const text = "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')";
const matches = [];
let m;
while ((m = re.exec(text)) !== null) {
  // Quoted literals match too, but leave Group 1 undefined; keep only :params
  if (m[1]) {
    matches.push(m[1]);
  }
}
console.log(matches); // → [":FROM_DATE"]
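
Wrapped as a drop-in replacement for the question's extractNamedParam, the same regex might look like the sketch below. It is plain JavaScript rather than the question's exported TypeScript, assumes an ES2020 runtime for String.prototype.matchAll, and has only been checked against the examples from the question.

```javascript
// Same regex as above: quoted literals are consumed, bare :params are captured
const NAMED_PARAM_RE = /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;

/**
 * Returns every :name that appears outside single- or double-quoted literals.
 * @param {string} query
 * @returns {string[]}
 */
function extractNamedParam(query) {
  const params = [];
  for (const m of query.matchAll(NAMED_PARAM_RE)) {
    // m[1] is undefined when the match was a quoted literal
    if (m[1] !== undefined) params.push(m[1]);
  }
  return params;
}

console.log(extractNamedParam("select * from a where id = :myId3;")); // → [":myId3"]
console.log(extractNamedParam("TO_CHAR(ep.CHANGEDT,'yyyy=mm-dd hh24:mi:ss')")); // → []
console.log(extractNamedParam('"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"')); // → []
```

matchAll iterates over an internal copy of the regex, so the shared global NAMED_PARAM_RE keeps its lastIndex at 0 between calls. With a hand-written exec loop that exits early, a leftover lastIndex would make the next call start scanning mid-string.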

Details:

  • "[^\\"]*(?:\\[\w\W][^\\"]*)*" - a ", then zero or more chars other than " and \ ([^"\\]*), and then zero or more repetitions of any escaped char (\\[\w\W]) followed by zero or more chars other than " and \, and then a "
  • | - or
  • '[^\\']*(?:\\[\w\W][^\\']*)*' - a ', then zero or more chars other than ' and \ ([^'\\]*), and then zero or more repetitions of any escaped char (\\[\w\W]) followed by zero or more chars other than ' and \, and then a '
  • | - or
  • (:\w+) - Group 1 (this is the value we need to get, the rest is just used to consume some text where matches must be ignored): a colon and one or more word chars.
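
To see what alternative 1 accepts on its own, the sketch below anchors it with ^ and $ and compares it with the simpler alternation "(?:[^\\"]|\\[\w\W])*", which should accept the same strings. The sample inputs are made up for illustration. The longer shape is usually called "unrolling the loop": runs of ordinary characters are consumed in one [^\\"]* step, and the (?:...)* group only kicks in at a backslash, which is generally considered more efficient in backtracking regex engines than trying the alternation at every character.

```javascript
// Alternative 1 on its own, anchored so .test() checks the whole input
const unrolled = /^"[^\\"]*(?:\\[\w\W][^\\"]*)*"$/;
// The same set of strings written as a plain per-character alternation
const naive = /^"(?:[^\\"]|\\[\w\W])*"$/;

// [input, expected result]
const samples = [
  ['"plain"', true],                       // no escapes
  ['"with \\"escaped\\" quotes"', true],   // \" does not close the string
  ['"ends with backslash \\\\"', true],    // \\ is an escaped backslash; the next " closes it
  ['"a"b"', false],                        // cannot run past the first unescaped "
  ['"unterminated', false],                // no closing quote
];

for (const [s, expected] of samples) {
  // Both patterns should agree with each other and with the expectation
  console.log(s, unrolled.test(s), naive.test(s), expected);
}
```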

1 Comment

Thanks for your answer. It works like a charm. However, I would like to understand how this regex works. With the help of the explanations on regex101 and MDN, I managed to understand that there are three alternatives here and that we are only interested in the parenthesized substring match. Can you please explain how alternatives 1 and 2 work, i.e. "[^\\"]*(?:\\[\w\W][^\\"]*)*"?
