Java String replace all using regex with lookahead

Question

I am trying to get a normalized URI from the incoming HTTP Request to print in the logs. This will help us to compute stats & other data by this normalized URI.

To normalize, I'm trying to use a regex String replace on the requestURI that substitutes x for every numeric or alphanumeric path segment except the version (e.g. v1):

String str = "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
str.replaceAll("/([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)","/x");

This results in

/x/profile/x/x/x/x/x/HELLO/test/random/x

I want to get the result as (do not replace v1)

/v1/profile/x/x/x/x/x/HELLO/test/random/x

I tried using a negative lookahead:

str.replaceAll("/(?!v1)([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)","/x");

But it does not help. Any clue is appreciated.

Thanks

Please show the input string and the expected result. And don't use code to explain what you are trying to do. Write it out.
– WJS, Commented Feb 17, 2022 at 1:25
Why are you punishing the poor soul (possibly yourself) who will have to read this code in the future, by using such an unwieldy regular expression? I would suggest a simple non-regex alternative, but honestly, I can’t tell what you’re trying to do. Do you always want to preserve the first two path components, and any subsequent components which contain no digits?
– VGR, Commented Feb 17, 2022 at 3:09

Ryszard Czech · Accepted Answer · 2022-02-17 01:30:20Z

Use

/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      v                        'v'
--------------------------------------------------------------------------------
      [1-4]                    any character of: '1' to '4'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                             (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [0-9_-]+                 any character of: '0' to '9', '_', '-'
                             (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
    [a-zA-Z]*                any character of: 'a' to 'z', 'A' to 'Z'
                             (0 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of grouping
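
As a rough check of the pattern above, here is a minimal sketch that runs it against the sample URI from the question, next to the question's own attempt. The class and method names (UriNormalizer, normalize, attempt) are made up for illustration and are not part of the original answer. In the attempt, the top-level | splits the pattern into two independent alternatives, so the lookahead guards only the first one and the bare [0-9]+ alternative still matches the 1 in v1.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class UriNormalizer {
    // Pattern from the answer: the alternation sits inside a non-capturing
    // group, so both alternatives must follow a '/'.
    private static final Pattern SEGMENT = Pattern.compile(
        "/(?:(?!v[1-4])[a-zA-Z]*[0-9_-]+[a-zA-Z]*|[0-9]+)");

    // The question's attempt, with backslashes doubled for a Java literal.
    // Here the '|' is at the top level, so "[0-9]+" can match anywhere.
    private static final Pattern ATTEMPT = Pattern.compile(
        "/(?!v1)([a-zA-Z]*[\\d|\\-|_]+[a-zA-Z]*)|([0-9]+)");

    static String normalize(String uri) {
        return SEGMENT.matcher(uri).replaceAll("/x");
    }

    static String attempt(String uri) {
        return ATTEMPT.matcher(uri).replaceAll("/x");
    }

    public static void main(String[] args) {
        String uri =
            "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(normalize(uri));
        // prints /v/x/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(attempt(uri));
    }
}
```

Note that the lookahead (?!v[1-4]) exempts any segment that begins with v1 through v4 from the first alternative; a different version range would need a different character class.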

WJS · Accepted Answer · 2022-02-17 22:11:01Z


With the added explanation, here is how I would approach it.

Create a list of the path elements by splitting on / after skipping the leading slash.
Initialize a StringBuilder with the first element.
Then iterate over the sublist starting at the second element, using String.matches to decide whether to replace each element with an x.

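import java.util.Arrays;
import java.util.List;

// str is the request URI from the question; drop the leading "/" before splitting.
List<String> pathElements = Arrays.asList(str.substring(1).split("/"));
// Keep the first element (the version, e.g. v1) unchanged.
StringBuilder sb = new StringBuilder("/" + pathElements.get(0));
for (String pe : pathElements.subList(1, pathElements.size())) {
    // Replace any element containing a digit, '-' or '_' with "x".
    sb.append("/").append(pe.matches(".*[\\d_-].*") ? "x" : pe);
}

System.out.println(sb);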

prints

/v1/profile/x/x/x/x/x/HELLO/test/random/x

edited Feb 17, 2022 at 22:11

answered Feb 17, 2022 at 1:48


VGR · Accepted Answer · 2022-02-17 23:20:12Z


Rather than using one large regular expression that will be quite difficult for people to understand and maintain in the future (including yourself, probably), I would opt for using a few lines, which make your logic more apparent:

List<String> parts = Arrays.asList(path.split("/"));
parts.replaceAll(
    p -> !p.matches("v\\d+") && p.matches(".*[-_\\d].*") ? "x" : p);
path = String.join("/", parts);
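
To try the snippet above on the question's sample URI, it can be wrapped in a small class along these lines. This is only a sketch: PathNormalizer and normalize are hypothetical names chosen for the example, and the sample string is the one from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper class, named here only for illustration.
class PathNormalizer {
    static String normalize(String path) {
        // Arrays.asList returns a fixed-size list backed by the array;
        // replaceAll is fine because it only overwrites elements in place.
        List<String> parts = Arrays.asList(path.split("/"));
        parts.replaceAll(
            // Keep version segments (v followed by digits); replace any other
            // segment containing a digit, '-' or '_' with "x".
            p -> !p.matches("v\\d+") && p.matches(".*[-_\\d].*") ? "x" : p);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        String uri =
            "/v1/profile/abc13abc/13abc/cDe12/abc-bla/text_tw/HELLO/test/random/2234";
        // prints /v1/profile/x/x/x/x/x/HELLO/test/random/x
        System.out.println(normalize(uri));
    }
}
```

The leading slash survives because split("/") yields an empty first element, which String.join puts back. A trailing slash would be dropped, since String.split discards trailing empty strings.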

edited Feb 17, 2022 at 23:20

answered Feb 17, 2022 at 20:36


3 Comments

WJS Over a year ago

I like your approach. But the OP said alphanumeric or numeric, so you need to include, at a minimum, the - and _ characters in your regex.

VGR Over a year ago

@WJS Are - and _ considered numeric? I assumed “numeric” meant digits.

WJS Over a year ago

It's a reasonable question. I had to look it up to be certain based on the OP's expected answer. merriam-webster.com/dictionary/alphanumeric. This is just one thing that should have been clarified in the question. Another was v1. Does that simply represent the root of the path and can be anything or is it literally v1 or v followed by a digit?
