
I am currently working on a tool that helps me analyze a constantly growing String, which can look like this: String s = "AAAAAAABBCCCDDABQ". What I want to do is find a sequence of A's and B's, do something with it, and then remove that sequence from the original String.

My code looks like this:

while (someBoolean){

    if(Pattern.matches("A+B+", s)) {
        //Do stuff
        //Remove the found pattern
    }

    if(Pattern.matches("C+D+", s)) {
        //Do other stuff
        //Remove the found pattern
    }

}
return s;

Also, how could I remove the three sequences so that s contains just "Q" at the end of the calculation, without an endless loop?

  • Yes, but you can create a copy of the string and store it in a mutable local variable. Commented Jul 21, 2017 at 0:11
  • I expressed myself unclearly, sorry. Somewhere in my code I have a function that periodically appends characters to the end of my String with "+="; that's what I meant by "growing". Commented Jul 21, 2017 at 0:15
  • @schande Is there a pattern to this String or does it just add random letters? Commented Jul 21, 2017 at 0:17
  • @Tommy There are patterns, and I try to describe them with regex syntax. I am looking for patterns like AB, AAB, AAAB, ABB, ABBB, ..., which I try to describe with "A+B+". Another pattern would be "A+O+C+". Commented Jul 21, 2017 at 0:22
  • You should probably check for the particular character when adding it to the string. Do you do anything with strings that contain multiple 'AA's? Commented Jul 21, 2017 at 0:22

3 Answers


You should use a regex replacement loop, i.e. the Matcher methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb). (Java 9 added StringBuilder overloads of both.)

To find any one of several patterns, combine them with the | (alternation) operator and put each pattern in its own capturing group.

You can then use group(int group) to get the matched string for each capture group (first group is group 1), which returns null if that group didn't match. For better performance, to simply check whether the group matched, use start(int group), which returns -1 if that group didn't match.
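A minimal illustration of the difference between group(int) and start(int) when a group did not take part in the match. The class and method names here are hypothetical, chosen only for the demo:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCheckDemo {
    // Returns which capture group took part in the current match: 1, 2, or 0 if neither.
    static int matchedGroup(Matcher m) {
        if (m.start(1) != -1) {   // start(int) is -1 when group 1 did not participate
            return 1;
        }
        if (m.start(2) != -1) {
            return 2;
        }
        return 0;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(A+B+)|(C+D+)").matcher("CCDD");
        if (m.find()) {
            System.out.println(matchedGroup(m)); // 2
            System.out.println(m.group(1));      // null: group 1 did not match
            System.out.println(m.start(1));      // -1
            System.out.println(m.group(2));      // CCDD
        }
    }
}
```

start(int) only reads an index that the matcher already holds, while group(int) builds a new String, which is why start is the cheaper check.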

Example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
    if (m.start(1) != -1) { // Group 1 found
        System.out.println("Found AB: " + m.group(1));
        m.appendReplacement(buf, ""); // Replace matched substring with ""
    } else if (m.start(2) != -1) { // Group 2 found
        System.out.println("Found CD: " + m.group(2));
        m.appendReplacement(buf, ""); // Replace matched substring with ""
    }
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);

Output

Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q
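On Java 9 or later (an assumption about your JDK), the same single pass can be written with Matcher.replaceAll(Function<MatchResult, String>), which calls the function once per match and uses its return value as the replacement. A minimal sketch; the class name, method name and the found list are hypothetical stand-ins for the "do stuff" part:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllDemo {
    private static final Pattern SEQUENCES = Pattern.compile("(A+B+)|(C+D+)");

    // Removes every A+B+ and C+D+ run in one pass and records what was removed.
    // Requires Java 9+, which added Matcher.replaceAll(Function<MatchResult, String>).
    static String removeSequences(String s, List<String> found) {
        Matcher m = SEQUENCES.matcher(s);
        return m.replaceAll(r -> {
            if (r.start(1) != -1) {
                found.add("AB:" + r.group(1)); // "Do stuff" for an A+B+ run
            } else {
                found.add("CD:" + r.group(2)); // "Do other stuff" for a C+D+ run
            }
            return "";                         // replace the run with nothing
        });
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>();
        String remain = removeSequences("AAAAAAABBCCCDDABQ", found);
        System.out.println(found);  // [AB:AAAAAAABB, CD:CCCDD, AB:AB]
        System.out.println(remain); // Q
    }
}
```

Returning "" from the function removes each run exactly as appendReplacement(buf, "") does in the loop version; the StringBuffer bookkeeping simply moves inside the library call.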

2 Comments

Great answer. I wish I could upvote it three times, because that's what it deserves.
Thanks Andreas, that helped me a lot. :)

This solution assumes that the string always ends in Q.

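import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "AAAAAAABBCCCDDABQ";

Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");

// Keep looping while more than one character is left AND the previous
// pass removed something; without the second check, input such as
// "AQB" would loop forever.
boolean removed = true;
while (s.length() > 1 && removed) {
    removed = false;

    Matcher abMatcher = abPattern.matcher(s);
    if (abMatcher.find()) {
        s = abMatcher.replaceFirst("");
        removed = true;
        //Do stuff
    }

    Matcher cdMatcher = cdPattern.matcher(s);
    if (cdMatcher.find()) {
        s = cdMatcher.replaceFirst("");
        removed = true;
        //Do other stuff
    }
}
System.out.println(s);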
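The key difference from the question's code is Matcher.find() instead of Pattern.matches(...): matches() only succeeds if the entire string fits the pattern, so Pattern.matches("A+B+", s) is false for "AAAAAAABBCCCDDABQ", while find() looks for the pattern anywhere. A small sketch of the difference (class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern AB = Pattern.compile("A+B+");

    // true only if the ENTIRE input is A+B+ (same as Pattern.matches("A+B+", s))
    static boolean wholeStringMatches(String s) {
        return AB.matcher(s).matches();
    }

    // true if an A+B+ run occurs anywhere in the input
    static boolean containsRun(String s) {
        return AB.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "AAAAAAABBCCCDDABQ";
        System.out.println(wholeStringMatches(s));     // false
        System.out.println(containsRun(s));            // true
        System.out.println(wholeStringMatches("AAB")); // true
    }
}
```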

1 Comment

If you assume the String always ends in Q, you can just do s = s.substring(s.length()-1); same effect.

You are probably looking for something like this:

String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace

for (String ch : chars) {
    if (result.contains(ch)) {
        String pattern = "[" + ch + "]+";
        result = result.replaceAll(pattern, ch);
    }
}

System.out.println(input); //"AAAAAAABBCCCDDABQ"
System.out.println(result); //"ABCDABQ"

This replaces each run of a repeated character with a single occurrence of that character.

If you want to remove the runs completely, just pass "" instead of ch as the second argument of replaceAll inside the if body.
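The per-character loop can also be expressed as a single replaceAll by using a backreference: ([ABCD]) captures one of the letters, \1+ matches one or more further copies of that same letter, and $1 writes the captured letter back once. A minimal sketch with hypothetical method names, restricted to A to D like the chars array above:

```java
class CollapseRuns {
    // Collapses each run of a repeated A, B, C or D to one occurrence.
    // A single letter (no repeat) is left alone because \1+ needs at least one copy.
    static String collapseRuns(String s) {
        return s.replaceAll("([ABCD])\\1+", "$1");
    }

    // Removes every run of A, B, C or D entirely.
    static String removeRuns(String s) {
        return s.replaceAll("[ABCD]+", "");
    }

    public static void main(String[] args) {
        String input = "AAAAAAABBCCCDDABQ";
        System.out.println(collapseRuns(input)); // ABCDABQ
        System.out.println(removeRuns(input));   // Q
    }
}
```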
