6

How can I find a repeated pattern in a string? For example, if the input file were

AAAAAAAAA
ABABAB
ABCAB
ABAb

it would output:

A
AB
ABCAB
ABAb
4
  • What if the string is "AABB". What's the expected output? Commented Feb 22, 2014 at 22:55
  • AABB. The repeat has to go all the way from the start to end, otherwise, return the whole string. Commented Feb 22, 2014 at 22:56
  • 2
    @BelgianMyWaffle I don't think this is a duplicate. If I understood the question well, its about repeated pattern in string, not character. Commented Feb 22, 2014 at 22:58
  • If the string is AABB, then the output would be AABB Commented Feb 22, 2014 at 23:00

5 Answers 5

10

If you use regex, you only need one line:

String repeated = str.replaceAll("(.+?)\\1+", "$1");

Breaking down the regex (.+?)\1:

  • (.+?) means "at least one character, but as few as possible, captured as group 1"
  • \1 means "the same character(s) as group 1

Here's some test code:

String[] strs = {"AAAAAAAAA", "ABABAB", "ABCAB", "ABAb"};
for (String str : strs) {
    String repeated = str.replaceAll("(.+?)\\1+", "$1");
    System.out.println(repeated);
}

Output:

A
AB
ABCAB
ABAb
Sign up to request clarification or add additional context in comments.

5 Comments

@santhoshkumar depends what you mean by "works". What result do you expect from AAAAAAB?
Oops.., Sorry for not being clear. For example.., "AAAAAAB" i need output as "AAAAAAB" but this is returning "AB". How i can tweak this?
@Bohemian What would be the time complexity of this? Not sure how regex works with something like this, but I imagine multiple scans?
@TheRealFakeNews time complexity is approx O(n log n): There is only one pass, but there is some re-checking of following chars along the way as the quantity of .+? grows. If long sequences were expected to be repeated, the greedier .+ may be faster.
@Bohemian How is it you came to nlogn?
1

This outputs what you ask for - the regex can probably be improved to avoid the loop but I can't manage to fix it...

public static void main(String[] args) {
    List<String> inputs = Arrays.asList("AAAAAAAAA", "ABABAB", "ABCAB", "ABAb");
    for (String s : inputs) System.out.println(findPattern(s));
}

private static String findPattern(String s) {
    String output = s;
    String temp;
    while (true) {
        temp = output.replaceAll("(.+)\\1", "$1");
        if (temp.equals(output)) break;
        output = temp;
    }
    return output;
}

2 Comments

@Bohemian Can someone please explain me the regex part alone? ("(.+)\\1", "$1")
@santhoshkumar see my answer for an explanation of the regex
1

Written in C#, but translation should be trivial.

public static string FindPattern(string s)
{
    for (int length = 1; length <= s.Length / 2; length++)
    {
        string pattern = s.Substring(0, length);
        if(MatchesPattern(s, pattern))
        {
            return pattern;
        }
    }
    return s;
}

public static bool MatchesPattern(string s, string pattern)
{
    for (int i = 0; i < s.Length; i++)
    {
        if(!s[i].Equals(pattern[i%pattern.Length]))
        {
            return false;
        }
    }
    return true;
}

Comments

0

If you might have spaces between the repeated segment:

(.+?)(\\ ?\\1)+

Comments

0

Yon don't need reexp to find pattern. Knuth-Morris-Pratt KMP Algorithm can do this much faster.

func getPattern(s string) string {
   res := make([]int, len(s)+1)

   i := 0
   j := -1
   res[0] = -1

   var patternLength int
   for i < len(s) {
       if j == -1 || s[i] == s[j] {
           i++
           j++
           res[i] = j
           if res[i] == 0 {
               patternLength++
           } else {
               break
           }
       } else {
           j = res[j]
       }
   }
   if patternLength == len(s) {
       patternLength = 0
   }
   return s[:patternLength]
}

Unit Tests

func Test_getPattern(t *testing.T) {
    testCases := []struct {
        str1     string
        expected string
    }{
        {"AAAAAAAAA", "A"},
        {"ABCABC", "ABC"},
        {"ABABAB", "AB"},
        {"LEET", ""},
    }

    for _, tc := range testCases {
        actual := getPattern(tc.str1)
        if tc.expected != actual {
            t.Errorf("Source: s1:%s\n   Expected:%s\n   Actual:  %s",
                tc.str1,
                tc.expected,
                actual)
        }
    }
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.