How can I find a repeated pattern in a string? For example, if the input file were
AAAAAAAAA
ABABAB
ABCAB
ABAb
it would output:
A
AB
ABCAB
ABAb
If you use regex, you only need one line:
String repeated = str.replaceAll("(.+?)\\1+", "$1");
Breaking down the regex (.+?)\1:
(.+?) means "at least one character, but as few as possible, captured as group 1"\1 means "the same character(s) as group 1Here's some test code:
String[] strs = {"AAAAAAAAA", "ABABAB", "ABCAB", "ABAb"};
for (String str : strs) {
String repeated = str.replaceAll("(.+?)\\1+", "$1");
System.out.println(repeated);
}
Output:
A
AB
ABCAB
ABAb
AAAAAAB?.+? grows. If long sequences were expected to be repeated, the greedier .+ may be faster.This outputs what you ask for - the regex can probably be improved to avoid the loop but I can't manage to fix it...
public static void main(String[] args) {
List<String> inputs = Arrays.asList("AAAAAAAAA", "ABABAB", "ABCAB", "ABAb");
for (String s : inputs) System.out.println(findPattern(s));
}
private static String findPattern(String s) {
String output = s;
String temp;
while (true) {
temp = output.replaceAll("(.+)\\1", "$1");
if (temp.equals(output)) break;
output = temp;
}
return output;
}
Written in C#, but translation should be trivial.
public static string FindPattern(string s)
{
for (int length = 1; length <= s.Length / 2; length++)
{
string pattern = s.Substring(0, length);
if(MatchesPattern(s, pattern))
{
return pattern;
}
}
return s;
}
public static bool MatchesPattern(string s, string pattern)
{
for (int i = 0; i < s.Length; i++)
{
if(!s[i].Equals(pattern[i%pattern.Length]))
{
return false;
}
}
return true;
}
Yon don't need reexp to find pattern. Knuth-Morris-Pratt KMP Algorithm can do this much faster.
func getPattern(s string) string {
res := make([]int, len(s)+1)
i := 0
j := -1
res[0] = -1
var patternLength int
for i < len(s) {
if j == -1 || s[i] == s[j] {
i++
j++
res[i] = j
if res[i] == 0 {
patternLength++
} else {
break
}
} else {
j = res[j]
}
}
if patternLength == len(s) {
patternLength = 0
}
return s[:patternLength]
}
Unit Tests
func Test_getPattern(t *testing.T) {
testCases := []struct {
str1 string
expected string
}{
{"AAAAAAAAA", "A"},
{"ABCABC", "ABC"},
{"ABABAB", "AB"},
{"LEET", ""},
}
for _, tc := range testCases {
actual := getPattern(tc.str1)
if tc.expected != actual {
t.Errorf("Source: s1:%s\n Expected:%s\n Actual: %s",
tc.str1,
tc.expected,
actual)
}
}
}
"AABB". What's the expected output?AABB. The repeat has to go all the way from the start to end, otherwise, return the whole string.