
I want to surround all tokens in a text with tags in the following manner:

Input: " abc fg asd "

Desired output: " <token>abc</token> <token>fg</token> <token>asd</token> "

This is the code I tried so far:

String regex = "(\\s)([a-zA-Z]+)(\\s)";
String text = " abc fg asd ";
text = text.replaceAll(regex, "$1<token>$2</token>$3");
System.out.println(text);

Actual output (fg is left untagged): " <token>abc</token> fg <token>asd</token> "

Note: for simplicity, we can assume that the input starts and ends with whitespace.
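One way to see why fg is skipped is to list the matches the original pattern actually finds. The sketch below uses java.util.regex.Matcher.find(); the class name OverlapDemo and the method matches are made up for illustration. Each match consumes the whitespace on both sides of the word, so the space after abc is already used up when the engine reaches fg, and the next match can only begin at the space before asd.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // The pattern from the question: one whitespace char, a run of letters, one whitespace char.
    static final Pattern ORIGINAL = Pattern.compile("(\\s)([a-zA-Z]+)(\\s)");

    // Collects every match in the order Matcher.find() reports them.
    static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            found.add(m.group()); // group() is the whole match, including both spaces
        }
        return found;
    }

    public static void main(String[] args) {
        // Only " abc " and " asd " are found: the space after "abc" belongs to the
        // first match, so "fg" has no unconsumed leading space left.
        System.out.println(matches(" abc fg asd "));
    }
}
```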

3 Answers


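Use lookarounds, which check for the surrounding whitespace without consuming it, so one space can serve as the boundary of two neighbouring tokens: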

String regex = "(?<=\\s)([a-zA-Z]+)(?=\\s)";
...
text = text.replaceAll(regex, "<token>$1</token>");
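A complete, runnable version of the lookaround approach might look like this. It is a minimal sketch; the class name LookaroundTagger and the helper tag are hypothetical, and the input is the one from the question.

```java
public class LookaroundTagger {
    // (?<=\\s) and (?=\\s) only assert that whitespace is there; they do not
    // consume it, so the space after "abc" is still available before "fg".
    static String tag(String text) {
        return text.replaceAll("(?<=\\s)([a-zA-Z]+)(?=\\s)", "<token>$1</token>");
    }

    public static void main(String[] args) {
        // Tags abc, fg and asd, leaving every space in place.
        System.out.println(tag(" abc fg asd "));
    }
}
```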

1 Comment

And the replacement should then be text = text.replaceAll(regex, "<token>$1</token>");. In fact, why not simply text = text.replaceAll("\\w+", "<token>$0</token>");?
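The \\w+ suggestion is shorter, but it is not identical to [a-zA-Z]+: in Java, \\w is by default [a-zA-Z_0-9], so digits and underscores also count as token characters. The sketch below uses a made-up input containing x_1 to show where the two patterns differ; the class and method names are illustrative only.

```java
public class WordClassComparison {
    // \\w is [a-zA-Z_0-9] by default, so "x_1" stays one token.
    static String tagWordChars(String text) {
        return text.replaceAll("\\w+", "<token>$0</token>");
    }

    // [a-zA-Z]+ stops at the first non-letter, so "x_1" is split into "x" and "_1".
    static String tagLetters(String text) {
        return text.replaceAll("[a-zA-Z]+", "<token>$0</token>");
    }

    public static void main(String[] args) {
        String input = " abc x_1 fg ";
        System.out.println(tagWordChars(input)); // x_1 is tagged as a whole
        System.out.println(tagLetters(input));   // only x is tagged; _1 is left outside
    }
}
```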

If your tokens are defined only by a character class, you don't need to describe the characters around them. This should suffice, since the regex engine scans from left to right and the + quantifier is greedy, so each match always extends over the whole run of letters:

String regex = "[a-zA-Z]+";
text = text.replaceAll(regex, "<token>$0</token>");
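Because this pattern does not look at the surrounding characters at all, it also works without the note's assumption about leading and trailing whitespace, and with repeated spaces. A minimal sketch (LetterRunTagger is a hypothetical name) with two made-up inputs:

```java
public class LetterRunTagger {
    // Greedy [a-zA-Z]+ always runs to the end of a sequence of letters, so no
    // boundary checks are needed; $0 is the whole match.
    static String tag(String text) {
        return text.replaceAll("[a-zA-Z]+", "<token>$0</token>");
    }

    public static void main(String[] args) {
        // No leading/trailing whitespace and several spaces between tokens.
        System.out.println(tag("abc fg      asd"));
        // Punctuation is not a letter, so it stays outside the tags.
        System.out.println(tag("abc, fg."));
    }
}
```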

// [^\\s]+ means "not a whitespace character, one or more times"
String result = input.replaceAll("([^\\s]+)", "<token>$1</token>");

This matches every run of non-whitespace characters, which is probably the best fit for what you need. The + quantifier is also greedy, so a match never stops partway through a token: it will never match just "as" inside "asd" when the next character can also be matched.
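To illustrate the greediness point, the sketch below contrasts the greedy [^\\s]+ with its reluctant counterpart [^\\s]+?, which matches as few characters as possible (one). The class name is made up; the reluctant version is shown only as a counterexample, not as a suggested solution.

```java
public class GreedyVsReluctant {
    // Greedy: takes as many non-whitespace characters as it can.
    static String greedy(String text) {
        return text.replaceAll("([^\\s]+)", "<token>$1</token>");
    }

    // Reluctant: takes as few as possible, so every letter becomes its own match.
    static String reluctant(String text) {
        return text.replaceAll("([^\\s]+?)", "<token>$1</token>");
    }

    public static void main(String[] args) {
        System.out.println(greedy(" asd "));    // asd is tagged as one token
        System.out.println(reluctant(" asd ")); // a, s and d are tagged separately
    }
}
```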

1 Comment

\\S is equivalent to [^\\s]. Also, you don't need to wrap the entire regex in parentheses to create a group matching the whole match, because group 0 already does that for you. So try replaceAll("\\S+", "<token>$0</token>");
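A quick check that both forms give the same result, sketched with a hypothetical class name and a made-up input that contains punctuation. Note that, unlike [a-zA-Z]+, the non-whitespace versions keep punctuation such as the comma and the period inside the tags.

```java
public class NonWhitespaceTagger {
    // Form from the answer: an explicit capturing group, referenced as $1.
    static String withGroup(String text) {
        return text.replaceAll("([^\\s]+)", "<token>$1</token>");
    }

    // Form from the comment: \\S is the predefined class for [^\\s],
    // and $0 is the whole match, so no capturing group is needed.
    static String withGroupZero(String text) {
        return text.replaceAll("\\S+", "<token>$0</token>");
    }

    public static void main(String[] args) {
        String input = " abc, fg asd. ";
        System.out.println(withGroupZero(input)); // "abc," and "asd." are tagged with their punctuation
        System.out.println(withGroup(input).equals(withGroupZero(input))); // true
    }
}
```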
