Regex Pattern To validate count of duplicate character

Question

I have a collection of strings, and I need a regex pattern that keeps only the strings in which at least one character appears exactly twice.

Eg: Arrays.asList("abcdef","bababc","abbcde","abcccd","aabcdd","abcdee","ababab");

Here, I want to end up with the result ["bababc","abbcde","aabcdd","abcdee"].

The duplicated character may appear consecutively or be separated by other characters. A character that occurs exactly twice takes precedence over any other repetition count.

E.g. in "bababc", 'a' occurs twice and 'b' occurs three times; because 'a' occurs twice, the string qualifies.

I tried the different patterns mentioned:

here, which works only partially: it handles non-consecutive duplicates, but it also accepts strings without any duplicates
A variation of this here, which works partially for consecutive characters after sorting the string

Can someone help me?

Why is 'bababc' in the output? 'b' has a count of 3. Does that mean that the 'a' count of 2 here takes precedence?
– kerwei, Commented Dec 3, 2018 at 9:40
Yes, a character count of 2 takes precedence. Apologies, I updated the question.
– edwin, Commented Dec 3, 2018 at 9:41
I can't imagine a pure regex approach here because you need to check for dupe chars before the currently checked char.
– Wiktor Stribiżew, Commented Dec 3, 2018 at 9:46
I think the second option you posted works, if you're willing to sort the string beforehand. You just have to set the count to {2} instead of {2,}. But then, if you're going to sort it first, you may as well just write a function to parse it. Edit: On second thought, this doesn't work, as strings with duplicates of 3 and above, but without a duplicate of 2, would still be caught.
– kerwei, Commented Dec 3, 2018 at 9:46
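
For reference, the sort-then-match idea from the comments can be made to reject runs of three or more by adding lookarounds. This is only a sketch, not something a commenter proposed: the class name SortedPairRegex, its method names, and the pattern itself are assumptions made for illustration. It sorts the characters so that equal ones become adjacent, then searches for a run of exactly two.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original thread.
class SortedPairRegex {
    // Matches a run of exactly two equal characters in a sorted string:
    //   (?:^|(.)(?!\1))  start of input, or the last character of the previous run
    //   (.)\2            two equal characters
    //   (?!\2)           not followed by a third copy
    private static final Pattern EXACT_PAIR =
            Pattern.compile("(?:^|(.)(?!\\1))(.)\\2(?!\\2)", Pattern.DOTALL);

    // Sort the characters so that duplicates sit next to each other.
    static String sortChars(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    static boolean hasExactPair(String s) {
        return EXACT_PAIR.matcher(sortChars(s)).find();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab");
        List<String> result = words.stream()
                .filter(SortedPairRegex::hasExactPair)
                .collect(Collectors.toList());
        System.out.println(result); // [bababc, abbcde, aabcdd, abcdee]
    }
}
```

As the comment notes, once the string has to be sorted anyway, plain counting code is arguably easier to read than this pattern.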

Kent · Accepted Answer · 2018-12-03 09:58:22Z

1

If it is Java, I suggest solving this problem in Java instead of with a regex. It is straightforward, and you can extend it very easily if new requirements come up:

// wordList is your list of strings
List<String> newList = wordList.stream()
        .filter(s -> Arrays.stream(s.split(""))
                           // count how often each character occurs
                           .collect(groupingBy(identity(), counting()))
                           .values().stream()
                           // keep the string if any character occurs exactly twice
                           .anyMatch(c -> c == 2))
        .collect(Collectors.toList());

This uses some static imports:

import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

A quick test that just prints the result:

List<String> wordList = Arrays.asList("abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab");
wordList.stream()
        .filter(s -> Arrays.stream(s.split(""))
                           .collect(groupingBy(identity(), counting())).values().stream().anyMatch(c -> c == 2))
        .forEach(System.out::println);

We have:

bababc
abbcde
aabcdd
abcdee

answered Dec 3, 2018 at 9:58


1 Comment

LuCio Over a year ago

Or, as an alternative to Arrays.stream(s.split("")), use s.chars().boxed().
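
A minimal sketch of what that variant might look like, assuming the same grouping and counting as in the answer. The class name CharsBoxedFilter and its method names are made up for this example. Note that s.chars() streams UTF-16 code units as int values, so the map keys are Integer rather than String.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class CharsBoxedFilter {
    // True if at least one character occurs exactly twice in s.
    static boolean hasCharOccurringExactlyTwice(String s) {
        return s.chars()   // IntStream of UTF-16 code units
                .boxed()   // Stream<Integer>, so it can be grouped
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .anyMatch(count -> count == 2);
    }

    static List<String> filter(List<String> words) {
        return words.stream()
                .filter(CharsBoxedFilter::hasCharOccurringExactlyTwice)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> wordList = Arrays.asList(
                "abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab");
        System.out.println(filter(wordList)); // [bababc, abbcde, aabcdd, abcdee]
    }
}
```

This avoids building a String array per word with split(""), at the cost of slightly less readable keys in the intermediate map.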

Kubator · Accepted Answer · 2018-12-03 09:56:48Z

0

Will this regexp help?

'^[^a]*a[^a]*a[^a]*$|^[^b]*b[^b]*b[^b]*$|^[^c]*c[^c]*c[^c]*$|^[^d]*d[^d]*d[^d]*$|^[^e]*e[^e]*e[^e]*$'

Test:

$ cat abcde.txt
abcdef
bababc
abbcde
abcccd
aabcdd
abcdee
ababab

$ egrep '^[^a]*a[^a]*a[^a]*$|^[^b]*b[^b]*b[^b]*$|^[^c]*c[^c]*c[^c]*$|^[^d]*d[^d]*d[^d]*$|^[^e]*e[^e]*e[^e]*$' abcde.txt
bababc
abbcde
aabcdd
abcdee
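
The pattern above lists one alternative per letter and only covers a through e. As a rough sketch of how the same alternation could be generated for a wider range in Java, assuming the range holds only plain letters or digits that need no escaping inside a character class (the class GeneratedAlternation and the method buildPattern are hypothetical names, not part of this answer):

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, shown only to illustrate generating the alternation.
class GeneratedAlternation {
    // Builds "^[^a]*a[^a]*a[^a]*$|^[^b]*b[^b]*b[^b]*$|..." for first..last.
    // Assumes the characters need no escaping inside [...] (e.g. 'a'..'z').
    static Pattern buildPattern(char first, char last) {
        StringJoiner alternatives = new StringJoiner("|");
        for (char c = first; c <= last; c++) {
            String notC = "[^" + c + "]*";
            // exactly two occurrences of c, anything else around them
            alternatives.add("^" + notC + c + notC + c + notC + "$");
        }
        return Pattern.compile(alternatives.toString());
    }

    public static void main(String[] args) {
        Pattern exactlyTwice = buildPattern('a', 'z');
        List<String> words = Arrays.asList(
                "abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab");
        List<String> result = words.stream()
                .filter(w -> exactlyTwice.matcher(w).find())
                .collect(Collectors.toList());
        System.out.println(result); // [bababc, abbcde, aabcdd, abcdee]
    }
}
```

Calling buildPattern('a', 'e') yields the same pattern string as the one passed to egrep above.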

answered Dec 3, 2018 at 9:56
