
I am trying to derive a normalized URI from each incoming HTTP request so that it can be written to the logs. This lets us compute stats and other data per normalized URI.

To normalize, I am using a regex replacement on the request URI that substitutes x for every numeric or alphanumeric path segment except the version (e.g. v1):

String str = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
str = str.replaceAll("/([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)", "/x");

This results in

/x/profile/x/x/x/x/x/HELLO/test/random/x

I want this result instead (v1 is not replaced):

/v1/profile/x/x/x/x/x/HELLO/test/random/x

I tried using a negative lookahead:

str.replaceAll("/(?!v1)([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)", "/x");

But that does not help. Any pointers are appreciated.

Thanks

  • Is this a "guess my original String" question? Commented Feb 17, 2022 at 1:23
  • do not replace v1-v4 - no idea what this means. Commented Feb 17, 2022 at 1:24
  • Please show the Input String and the expected result. And don't use code to explain what you are trying to do. Write it out. Commented Feb 17, 2022 at 1:25
  • Apologies. Updated the question with original String. Commented Feb 17, 2022 at 1:26
  • Why are you punishing the poor soul (possibly yourself) who will have to read this code in the future, by using such an unwieldy regular expression? I would suggest a simple non-regex alternative, but honestly, I can’t tell what you’re trying to do. Do you always want to preserve the first two path components, and any subsequent components which contain no digits? Commented Feb 17, 2022 at 3:09

3 Answers

3

Use

/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      v                        'v'
--------------------------------------------------------------------------------
      [1-4]                    any character of: '1' to '4'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                             (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [0-9_-]+                 any character of: '0' to '9', '_', '-'
                             (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                             (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of grouping
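
As a usage sketch (the class, constant, and method names are illustrative and not from the original post), the expression can be compiled once and applied per request. The second pattern is the attempt from the question: its top-level | appears to leave [0-9]+ without a leading /, so the 1 in v1 still gets replaced. Grouping both alternatives behind the slash avoids that.

```java
import java.util.regex.Pattern;

class RegexUriNormalizer {
    // The grouped expression from this answer, compiled once for reuse.
    private static final Pattern ID_SEGMENT =
        Pattern.compile("/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)");

    // The attempt from the question, kept only for comparison.
    private static final Pattern QUESTION_ATTEMPT =
        Pattern.compile("/(?!v1)([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)");

    static String normalize(String requestUri) {
        return ID_SEGMENT.matcher(requestUri).replaceAll("/x");
    }

    static String questionAttempt(String requestUri) {
        return QUESTION_ATTEMPT.matcher(requestUri).replaceAll("/x");
    }

    public static void main(String[] args) {
        String str = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
        System.out.println(normalize(str));
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(questionAttempt(str));
        // prints /v/x/profile/x/x/x/x/x/HELLO/test/random/x
    }
}
```

Note that the lookahead (?!v[1-4]) exempts any segment that merely starts with v1 to v4 from the first alternative, so a segment such as v1-beta would also be left alone.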

1

With the added explanation, here is how I would approach it.

  • Create a list of the path elements by dropping the leading / and splitting on /.
  • Initialize a StringBuilder with the first element, which is kept unchanged.
  • Then iterate over a sublist starting with the second element, using String.matches to decide whether to replace each element with an x.

// Requires java.util.Arrays and java.util.List.
List<String> pathElements = Arrays.asList(str.substring(1).split("/"));
// Keep the first element (the version, e.g. v1) as it is.
StringBuilder sb = new StringBuilder("/" + pathElements.get(0));
for (String pe : pathElements.subList(1, pathElements.size())) {
    // Replace any element that contains a digit, '_' or '-' with x.
    sb.append("/").append(pe.matches(".*[\\d_-].*") ? "x" : pe);
}

System.out.println(sb);

prints

/v1/profile/x/x/x/x/x/HELLO/test/random/x
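
Wrapped in a method, the same steps might look like the sketch below. The class name FirstSegmentKeeper and the second sample path are invented for illustration. It also makes one property of this approach visible: the first element is kept whatever it contains, so this assumes the version always comes first.

```java
import java.util.Arrays;
import java.util.List;

class FirstSegmentKeeper {
    // Assumes str starts with '/' and has at least one path element.
    static String normalize(String str) {
        List<String> pathElements = Arrays.asList(str.substring(1).split("/"));
        // The first element is copied through without any check.
        StringBuilder sb = new StringBuilder("/" + pathElements.get(0));
        for (String pe : pathElements.subList(1, pathElements.size())) {
            sb.append("/").append(pe.matches(".*[\\d_-].*") ? "x" : pe);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize(
            "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234"));
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(normalize("/api-2/users/42"));
        // prints /api-2/users/x (the first element is kept even though it has a digit)
    }
}
```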

1

Rather than using one large regular expression that will be quite difficult for people to understand and maintain in the future (including yourself, probably), I would opt for using a few lines, which make your logic more apparent:

List<String> parts = Arrays.asList(path.split("/"));
parts.replaceAll(
    p -> !p.matches("v\\d+") && p.matches(".*[-_\\d].*") ? "x" : p);
path = String.join("/", parts);
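
For reference, here is a self-contained version of that snippet, as a sketch: the class name SegmentNormalizer and the second sample path are made up for illustration. Unlike a check tied to the first position, the v\d+ test exempts version-like segments wherever they occur in the path.

```java
import java.util.Arrays;
import java.util.List;

class SegmentNormalizer {
    static String normalize(String path) {
        // Arrays.asList returns a fixed-size list, which still allows replacing elements in place.
        List<String> parts = Arrays.asList(path.split("/"));
        parts.replaceAll(
            p -> !p.matches("v\\d+") && p.matches(".*[-_\\d].*") ? "x" : p);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        System.out.println(normalize(
            "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234"));
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(normalize("/v2/items/v3/9"));
        // prints /v2/items/v3/x
    }
}
```

Because String.split drops trailing empty strings, a trailing slash in the input is not preserved in the result.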

3 Comments

I like your approach. But the OP said alphanumeric or numeric, so you need to include, at a minimum, the - and _ characters in your regex.
@WJS Are - and _ considered numeric? I assumed “numeric” meant digits.
It's a reasonable question. I had to look it up to be certain based on the OP's expected answer. merriam-webster.com/dictionary/alphanumeric. This is just one thing that should have been clarified in the question. Another was v1. Does that simply represent the root of the path and can be anything or is it literally v1 or v followed by a digit?
