
Does the following look like acceptable code for replacing escape sequences in a string literal?

  export function parseString(s: string): string {
    let r : string;
    if ( s[0] === 'r' || s[0] === 'R' ) {
      r = s.substring(1); // no escape
    } else {
      r = s.substring(1, s.length - 1);
      r= r.replace('\\n', '\n');
      r= r.replace('\\r', '\r');
      r= r.replace('\"', '"');
      r= r.replace('\'', '\'');
      r= r.replace('\\a', '\a');
      r= r.replace('\\b', '\b');
      r= r.replace('\\r', '\r');
      r= r.replace('\\t', '\v');
      r= r.replace('\\\\', '\\');
      r= r.replace('\\?', '?');
      r= r.replace('\\`', '\`');
      const reg = /\\x../gi;
      const regReseult = r.matchAll(reg);
      let iter = regReseult.next();
      while (!iter.done) {
        r=r.replace(iter.value[0], escape_hex(iter.value[0]));
        iter = regReseult.next();
      }
    }

    return r;
  }

  function escape_hex( s: string ) : string {
    let ss= s.toLowerCase();
    if ( ss[0]!== '\\' || ss[1]!== 'x') {
      throw new Error('Invalid hexadecimal escape sequence');
    }
    if ( ss.length !== 4) {
      throw new Error('Invalid hexadecimal escape sequence');
    }
    const reg = /[0-9a-f][0-9a-f]/i;
    if ( ss.match(reg) === null ) {
      throw new Error('Invalid hexadecimal escape sequence');
    }
    ss= ss.replace('\\x', '0x');

    return String.fromCharCode(_parseInt(ss));
  }

Or, would a better pattern be to capture the various cases as a single regex such as \\[nr"'abrt\\?`] and then do a substitution based on the capture char? What might be the cleanest way to write the above function?


The goal of the above function is to parse a literal string as defined in BigQuery string-literals, which may accept a string in various forms such as "hello", 'hello', and r'hello\nthere'.

  • I'm not sure I understand the specification clearly. Can you share a few inputs/outputs? Why are we taking substrings? Is this browser or Node code, or environment agnostic? Commented Sep 30, 2022 at 17:24
  • @ggorlen I added some details here. This is a function that receives a token from a lexer and should normalize it to a "normal string". It should be env-agnostic, but for now I'm running it in Node. Commented Sep 30, 2022 at 18:02
  • replace only replaces the first occurrence of the pattern if the pattern is a string. Are you sure this is working as you'd expect? Commented Oct 13, 2022 at 7:37
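
The point raised in the last comment is easy to check. A small sketch, using a made-up sample string, that compares a string pattern with a global regex:

```typescript
// Made-up sample: two literal backslash-n pairs inside the text.
const sample = 'a\\nb\\nc';

// With a string pattern, replace stops after the first match.
const first = sample.replace('\\n', '\n');

// With a regex carrying the g flag, every match is replaced.
const every = sample.replace(/\\n/g, '\n');
```

Here `first` still contains one unconverted backslash-n pair, while `every` contains none.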

1 Answer


I think you can refactor the code to avoid the long run of replace statements by keeping the replacements in a lookup table and looping over it:

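const replaceWith: Record<string, string> = {
    '\\n': '\n',
    '\\r': '\r',
    '\\"': '"'
}; // you can always add more replacements here

// replaceAll (ES2021) replaces every occurrence; replace with a string pattern stops after the first one
for (const [key, value] of Object.entries(replaceWith)) {
    r = r.replaceAll(key, value);
}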
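
One caveat with chained replacements is that each pass rescans text that an earlier pass already decoded, so an escaped backslash followed by n can end up as a newline depending on the order of the entries. The single-regex idea from the question avoids this by decoding everything in one pass. What follows is only a minimal sketch under assumptions: it covers the escapes listed in the question plus \xhh, it has not been checked against the full BigQuery specification, and the names `SIMPLE_ESCAPES` and `unescapeBody` are hypothetical.

```typescript
// Hypothetical lookup table: character after the backslash -> decoded character.
const SIMPLE_ESCAPES: Record<string, string> = {
  n: '\n',
  r: '\r',
  t: '\t',
  a: '\x07', // JavaScript has no '\a' escape, so BEL is spelled out
  b: '\b',
  '\\': '\\',
  '"': '"',
  "'": "'",
  '?': '?',
  '`': '`',
};

// Decodes all escapes in one left-to-right pass, so decoded output is never rescanned.
// Assumes `body` is the literal's content with the surrounding quotes already removed.
function unescapeBody(body: string): string {
  return body.replace(
    /\\(?:x([0-9a-fA-F]{2})|([\s\S]))/g,
    (_match: string, hex: string | undefined, ch: string | undefined): string => {
      if (hex !== undefined) {
        // \xhh: two hex digits giving a character code
        return String.fromCharCode(parseInt(hex, 16));
      }
      const decoded = ch !== undefined ? SIMPLE_ESCAPES[ch] : undefined;
      if (decoded === undefined) {
        throw new Error(`Invalid escape sequence: \\${ch}`);
      }
      return decoded;
    },
  );
}
```

With this version, `unescapeBody('\\\\n')` yields a backslash followed by the letter n rather than a newline, and an unknown escape such as `\q` throws. A lone backslash at the very end of the input is left untouched, which a caller may prefer to treat as an error.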
