Multiple regex for replacing characters in java

Question

I have the following string:

String str = "Klaße, STRAßE, FUß";

Using a combination of regexes, I want to replace the German letter ß with ss or SS, depending on the case of the surrounding letters. To do this I have:

String replaceUml = str
        .replaceAll("ß", "ss")
        .replaceAll("A-Z|ss$", "SS")
        .replaceAll("^(?=^A-Z)(?=.*A-Z$)(?=.*ss).*$", "SS");

Expected result:

Klasse, STRASSE, FUSS

Actual result:

Klasse, STRAssE, FUSS

Where am I going wrong?

Not sure I understand what you think your expressions do. The first one replaces the ß with ss; the second one takes ss at the end of the string (or the literal text A-Z) and replaces it with SS (which is how FUSS happens to be right); but I cannot figure out what you think the third one is supposed to do... Can you clarify?
– Floris, Commented Aug 20, 2013 at 15:31
@Floris The third one is supposed to find a string that starts and ends with upper-case letters and has ss in between. If all of these conditions hold, the ss should be replaced with SS.
– bofanda, Commented Aug 20, 2013 at 15:51
If I understand correctly, you are trying to replace ß with lower-case ss if the string contains only lower-case characters, and with upper-case SS if it is all upper-case. I still can't figure out what the third replace is meant to achieve, and you are probably confusing character class ranges with literal character matching.
– Ibrahim Najjar, Commented Aug 20, 2013 at 15:54

ajb · Accepted Answer · 2013-08-20 15:55:37Z

4

First of all, if you're trying to match some character in the range A-Z, you need to put it in square brackets. This

.replaceAll("A-Z|ss$", "SS")

will look for the three characters A-Z in the source, which isn't what you want. Second, I think you're confused about what | means. If you say this:

.replaceAll("[A-Z]|ss$", "SS")

it will replace every upper-case letter, as well as any ss at the end of the string, with SS, because | means look for this or that.

A third problem with your approach is that the second and third replaceAll's will look for any ss that was in the original string, even if it didn't come from a ß. This may or may not be what you want.
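The first two points can be checked directly. The following is a minimal sketch; the class and method names (AlternationDemo, withoutBrackets, withBrackets) are made up for illustration, and the input is the question's string after its first replaceAll has already run.

```java
class AlternationDemo {
    // Pattern exactly as in the question: "A-Z" is the literal text A-Z, not a range.
    static String withoutBrackets(String s) {
        return s.replaceAll("A-Z|ss$", "SS");
    }

    // With a real character class, the left branch matches every upper-case ASCII letter.
    static String withBrackets(String s) {
        return s.replaceAll("[A-Z]|ss$", "SS");
    }

    public static void main(String[] args) {
        String s = "Klasse, STRAssE, FUss";
        // Only the ss at the very end of the input is replaced.
        System.out.println(withoutBrackets(s)); // Klasse, STRAssE, FUSS
        // Every upper-case letter becomes SS as well, which is clearly not wanted.
        System.out.println(withBrackets(s));
    }
}
```

Neither variant touches the ss inside STRAssE, which is why the question's output keeps the lower-case pair there.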

Here's what I'd do:

String replaceUml = str
    .replaceAll("(?<=[A-Z])ß", "SS")
    .replaceAll("ß", "ss");

This will first replace all ß by SS if the character before the ß is an upper-case letter; then if there are any ß's left over, they get replaced by ss. Actually, this won't work if the character before ß is an umlaut like Ä, so you probably should change this to

String replaceUml = str
    .replaceAll("(?<=[A-ZÄÖÜ])ß", "SS")
    .replaceAll("ß", "ss");

(There may be a better way to specify an "upper-case Unicode letter"; I'll look for it.)

EDIT:

String replaceUml = str
    .replaceAll("(?<=\\p{Lu})ß", "SS")
    .replaceAll("ß", "ss");

A problem is that it won't work if ß is the second character in the text, and the first letter of the word is upper-cased but the rest of the word isn't. In that case you probably want lower-case "ss".

String replaceUml = str
    .replaceAll("(?<=\\b\\p{Lu})ß(?=\\P{Lu})", "ss")
    .replaceAll("(?<=\\p{Lu})ß", "SS")
    .replaceAll("ß", "ss");

Now the first one will replace ß by ss if it's preceded by an upper-case letter that is the first letter of the word but followed by a character that isn't an upper-case letter. \P{Lu} with an upper-case P will match any character other than an upper-case letter (it's the negative of \p{Lu} with a lower-case p). I also included \b to test for the first character of a word.
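As a quick check, the three-step chain can be wrapped in a small runnable class. This is only a sketch: EszettDemo, replaceEszett and ESZETT are names invented here, and ß is written as the escape \u00DF so the file compiles whatever the source encoding is.

```java
class EszettDemo {
    // The letter sharp s (U+00DF), written as an escape to avoid source-encoding problems.
    static final String ESZETT = "\u00DF";

    static String replaceEszett(String str) {
        return str
            // Capitalised word: an upper-case first letter, then sharp s,
            // then a character that is not upper-case. Use lower-case ss.
            .replaceAll("(?<=\\b\\p{Lu})" + ESZETT + "(?=\\P{Lu})", "ss")
            // Any other sharp s directly after an upper-case letter becomes SS.
            .replaceAll("(?<=\\p{Lu})" + ESZETT, "SS")
            // Whatever is left becomes lower-case ss.
            .replaceAll(ESZETT, "ss");
    }

    public static void main(String[] args) {
        System.out.println(replaceEszett("Kla\u00DFe, STRA\u00DFE, FU\u00DF")); // Klasse, STRASSE, FUSS
        System.out.println(replaceEszett("A\u00DFbc")); // Assbc
    }
}
```

The order of the calls matters: the most specific pattern runs first, so the later, more general ones only see the ß characters that are still left.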

edited Aug 20, 2013 at 15:55

answered Aug 20, 2013 at 15:40

ajb

2 Comments

Bohemian Over a year ago

+1 but I think you need a look ahead too, in case of Aßbc you want Assbc (don't know if German words exist with ß second letter, but from a pattern aspect, it's an edge case)

ajb Over a year ago

Yep, I was thinking about it at the same time you were. And I suspect that words like that do exist where the ß is followed by "t" or "p", but I don't remember for sure, and just to make things interesting the German government recently changed the rules for when you're supposed to use ß, so whatever I learned in high-school German is wrong now.

Joop Eggen · Accepted Answer · 2013-08-20 15:45:07Z

2

String replaceUml = str
    .replaceAll("(?<=\\p{Lu})ß", "SS")
    .replace("ß", "ss");

This uses a regex that requires a preceding Unicode upper-case letter (as in "SÜß") before replacing ß with capital "SS"; any remaining ß becomes "ss".

The (?<= ... ) is a look-behind, a kind of context matching. You could also do

    .replaceAll("(\\p{Lu})ß", "$1SS")

as ß will not occur at the beginning.

Your main trouble was not using brackets [A-Z].
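To compare the two forms side by side, here is a small sketch (GroupVsLookbehind and its method names are invented for this example; ß is written as \u00DF to stay independent of the source encoding). The look-behind only inspects the preceding letter, while the capturing group consumes it, so the group version has to put the letter back with $1.

```java
class GroupVsLookbehind {
    // Sharp s (U+00DF) as an escape, so the source encoding does not matter.
    static final String ESZETT = "\u00DF";

    // Look-behind: the upper-case letter is checked but not part of the match.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<=\\p{Lu})" + ESZETT, "SS")
                .replace(ESZETT, "ss");
    }

    // Capturing group: the upper-case letter is matched, so $1 re-inserts it.
    static String withGroup(String s) {
        return s.replaceAll("(\\p{Lu})" + ESZETT, "$1SS")
                .replace(ESZETT, "ss");
    }

    public static void main(String[] args) {
        String input = "Kla\u00DFe, STRA\u00DFE, FU\u00DF";
        System.out.println(withLookbehind(input)); // Klasse, STRASSE, FUSS
        System.out.println(withGroup(input));      // Klasse, STRASSE, FUSS
    }
}
```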

edited Aug 20, 2013 at 15:45

answered Aug 20, 2013 at 15:38

Joop Eggen

Comments

Community · Accepted Answer · 2017-02-08 14:43:45Z

0

Breaking your regex into parts:

Regex 101 Demo

Regex

/ß/g

Description

ß Literal ß
g modifier: global. All matches (don't return on first match)

Regex 101 Demo

Regex

/([A-Z])ss$/g

Description

1st Capturing group ([A-Z]) 
    Char class [A-Z]  matches:
        A-Z A character range between Literal A and Literal Z
ss Literal ss
$ End of string
g modifier: global. All matches (don't return on first match)

Regex 101 Demo

Regex

/([A-Z]+)ss([A-Z]+)/g

Description

1st Capturing group ([A-Z]+) 
    Char class [A-Z] 1 to infinite times [greedy] matches:
        A-Z A character range between Literal A and Literal Z
ss Literal ss
2nd Capturing group ([A-Z]+) 
    Char class [A-Z] 1 to infinite times [greedy] matches:
        A-Z A character range between Literal A and Literal Z
g modifier: global. All matches (don't return on first match)

Specifically for you

String replaceUml = str
    .replaceAll("ß", "ss")
    .replaceAll("([A-Z])ss$", "$1SS")
    .replaceAll("([A-Z]+)ss([A-Z]+)", "$1SS$2");
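Wrapped in a runnable class, the chain above can be tried on the question's input. This is a sketch with invented names (ThreeStepDemo, threeStep); ß is written as \u00DF so the source encoding does not matter. Note that $ anchors at the end of the whole string, so the second step only helps when the upper-case word is the last thing in the input.

```java
class ThreeStepDemo {
    // Sharp s (U+00DF) as an escape, so the source encoding does not matter.
    static final String ESZETT = "\u00DF";

    static String threeStep(String str) {
        return str
            // Step 1: every sharp s becomes lower-case ss.
            .replaceAll(ESZETT, "ss")
            // Step 2: ss at the end of the input, after an upper-case letter, becomes SS.
            .replaceAll("([A-Z])ss$", "$1SS")
            // Step 3: ss between runs of upper-case letters becomes SS.
            .replaceAll("([A-Z]+)ss([A-Z]+)", "$1SS$2");
    }

    public static void main(String[] args) {
        System.out.println(threeStep("Kla\u00DFe, STRA\u00DFE, FU\u00DF")); // Klasse, STRASSE, FUSS
        // The upper-case word is no longer last, so step 2 does not apply to it.
        System.out.println(threeStep("FU\u00DF, Kla\u00DFe")); // FUss, Klasse
    }
}
```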

edited Feb 8, 2017 at 14:43

CommunityBot

answered Aug 20, 2013 at 15:38

abc123

5 Comments

ajb Over a year ago

Ummm, isn't that going to delete the character before the ss?

abc123 Over a year ago

Yeah, if you click the links, I used the capture groups there; I just didn't in the code for the solution. Editing.

ajb Over a year ago

you need to double-backslash \1 and \2

Bohemian Over a year ago

This is Java, not JavaScript.

ajb Over a year ago

Yes, I didn't pick up on that earlier. (Also, it isn't Perl.) It should be $1 and $2. \\1 and \\2 don't work.
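The difference between the two replacement syntaxes is easy to see in a short sketch (BackrefDemo and its method names are invented here). In a Java replacement string, $1 refers to the first capturing group, whereas a backslash simply escapes the next character, so \\1 in source code produces a literal 1.

```java
class BackrefDemo {
    // $1 inserts the text captured by group 1.
    static String dollar(String s) {
        return s.replaceAll("([A-Z])ss$", "$1SS");
    }

    // The replacement string here is \1SS: the backslash escapes '1', giving a literal "1".
    static String backslash(String s) {
        return s.replaceAll("([A-Z])ss$", "\\1SS");
    }

    public static void main(String[] args) {
        System.out.println(dollar("FUss"));    // FUSS
        System.out.println(backslash("FUss")); // F1SS
    }
}
```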

Ankur Shanbhag · Accepted Answer · 2013-08-20 15:38:25Z

-1

Use String.replaceFirst() instead of String.replaceAll().

replaceAll("ß", "ss")

This will replace all the occurrences of "ß". Hence the output after this statement becomes something like this :

Klasse, STRAssE, FUss

Now replaceAll("A-Z|ss$", "SS") replaces the "ss" at the end of the string with "SS", hence your final result looks like this :

Klasse, STRAssE, FUSS

To get your expected result try this out :

String replaceUml = str.replaceFirst("ß", "ss").replaceAll("ß", "SS");
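A small sketch of this line in action (ReplaceFirstDemo and firstThenRest are names invented for illustration; ß is written as \u00DF to avoid source-encoding issues). It gives the expected output for the question's exact string, but the result depends only on the position of each ß, not on the case of the surrounding letters, so a different word order gives a different outcome.

```java
class ReplaceFirstDemo {
    // Sharp s (U+00DF) as an escape, so the source encoding does not matter.
    static final String ESZETT = "\u00DF";

    // The first sharp s becomes ss; every later one becomes SS, regardless of context.
    static String firstThenRest(String str) {
        return str.replaceFirst(ESZETT, "ss").replaceAll(ESZETT, "SS");
    }

    public static void main(String[] args) {
        System.out.println(firstThenRest("Kla\u00DFe, STRA\u00DFE, FU\u00DF")); // Klasse, STRASSE, FUSS
        // Same words in a different order: the casing no longer matches the context.
        System.out.println(firstThenRest("STRA\u00DFE, Kla\u00DFe")); // STRAssE, KlaSSe
    }
}
```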

edited Aug 20, 2013 at 15:38

answered Aug 20, 2013 at 15:29

Ankur Shanbhag

4 Comments

bofanda Over a year ago

With your suggestion I now get the following result: Klasse, STRAssE, FUss

Ankur Shanbhag Over a year ago

try using replaceFirst(). It will help. :-)

bofanda Over a year ago

Following your answer, the result is Klasse, STRAssE, FUSS, but I want Klasse, STRASSE, FUSS

Ankur Shanbhag Over a year ago

see the last part of my answer. I have pasted the code to get your expected result. Hope this will help.
