I have a collection of strings, and I need a regex pattern that keeps only the strings in which at least one character appears exactly twice.

Example: Arrays.asList("abcdef","bababc","abbcde","abcccd","aabcdd","abcdee","ababab");

Here, I want to end up with the result ["bababc","abbcde","aabcdd","abcdee"].

The duplicated character may occur consecutively or be separated by other characters. A character that occurs exactly twice takes precedence over any other repetition count.

Example: in "bababc", 'a' occurs twice and 'b' occurs three times. Because 'a' occurs exactly twice, the string qualifies.

I tried the different patterns mentioned:

  • here: this works only partially, for non-consecutive characters, but it also matches strings without any duplicates
  • a variation of this here: this works partially, for consecutive characters, after sorting the string

Can someone help me?

  • Yes, I need to exclude 'abcdef' from the list. Commented Dec 3, 2018 at 9:35
  • Why is 'bababc' in the output? 'b' has a count of 3. Does that mean that the 'a' count of 2 here takes precedence? Commented Dec 3, 2018 at 9:40
  • Yes, a character count of 2 takes precedence. Apologies, I updated the question. Commented Dec 3, 2018 at 9:41
  • I can't imagine a pure regex approach here because you need to check for dupe chars before the currently checked char. Commented Dec 3, 2018 at 9:46
  • I think the second option you posted works if you're willing to sort the string beforehand. You just have to set the count to {2} instead of {2,}. But then, if you're going to sort it first, you may as well just write a function to parse it. Edit: On second thought, this doesn't work, because strings with duplicates of 3 and above, but without a duplicate of 2, would still be caught. Commented Dec 3, 2018 at 9:46

2 Answers

If it is Java, I suggest using Java to solve this problem instead of regex. It is straightforward, and you can extend it very easily if new requirements come up:

// wordList is your string list
List<String> newList = wordList.stream()
        // keep a string if any of its characters occurs exactly twice
        .filter(s -> Arrays.stream(s.split(""))
                           .collect(groupingBy(identity(), counting())).values().stream().anyMatch(c -> c == 2))
        .collect(Collectors.toList());

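The imports, including some static imports:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;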

If we do a little test and just print out the result:

List<String> wordList = Arrays.asList("abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab");
wordList.stream()
        .filter(s -> Arrays.stream(s.split(""))
                           .collect(groupingBy(identity(), counting())).values().stream().anyMatch(c -> c == 2))
        .forEach(System.out::println);

We have:

bababc
abbcde
aabcdd
abcdee
1 Comment

Or, as an alternative to Arrays.stream(s.split("")), use s.chars().boxed().
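
The alternative mentioned in this comment could be sketched roughly as follows. The class name ExactlyTwiceFilter and the helper hasCharExactlyTwice are hypothetical names chosen for illustration and do not appear in the answer. The sketch counts char values via s.chars(), so it assumes the input contains no surrogate pairs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class ExactlyTwiceFilter {

    // Hypothetical helper: true if at least one character occurs exactly twice in s.
    public static boolean hasCharExactlyTwice(String s) {
        return s.chars()                 // IntStream of char values
                .boxed()                 // Stream<Integer>, so it can be grouped
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values()
                .stream()
                .anyMatch(count -> count == 2);
    }

    public static void main(String[] args) {
        List<String> wordList = Arrays.asList(
                "abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab");

        List<String> result = wordList.stream()
                .filter(ExactlyTwiceFilter::hasCharExactlyTwice)
                .collect(Collectors.toList());

        System.out.println(result); // prints [bababc, abbcde, aabcdd, abcdee]
    }
}
```

Pulling the check into a named method keeps the filter readable and makes it easy to change the rule later, for example to a different count.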
Will this regexp help?

'^[^a]*a[^a]*a[^a]*$|^[^b]*b[^b]*b[^b]*$|^[^c]*c[^c]*c[^c]*$|^[^d]*d[^d]*d[^d]*$|^[^e]*e[^e]*e[^e]*$'

Test:

$ cat abcde.txt
abcdef
bababc
abbcde
abcccd
aabcdd
abcdee
ababab

$ egrep '^[^a]*a[^a]*a[^a]*$|^[^b]*b[^b]*b[^b]*$|^[^c]*c[^c]*c[^c]*$|^[^d]*d[^d]*d[^d]*$|^[^e]*e[^e]*e[^e]*$' abcde.txt
bababc
abbcde
aabcdd
abcdee
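
The pattern above lists the letters a to e by hand. Since the question is about Java, a possible way to use the same idea there is to generate one alternative per letter. This is only a sketch: the class ExactlyTwiceRegex and its methods are hypothetical names, and it assumes the strings contain only the lowercase letters a to z.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ExactlyTwiceRegex {

    // Builds "^[^a]*a[^a]*a[^a]*$|^[^b]*b[^b]*b[^b]*$|..." with one alternative per letter a-z.
    public static Pattern buildPattern() {
        String regex = IntStream.rangeClosed('a', 'z')
                .mapToObj(c -> String.valueOf((char) c))
                .map(ch -> "^[^" + ch + "]*" + ch + "[^" + ch + "]*" + ch + "[^" + ch + "]*$")
                .collect(Collectors.joining("|"));
        return Pattern.compile(regex);
    }

    // Keeps the strings in which some letter a-z occurs exactly twice.
    public static List<String> filter(List<String> words) {
        Pattern pattern = buildPattern();
        return words.stream()
                .filter(w -> pattern.matcher(w).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> wordList = Arrays.asList(
                "abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab");
        System.out.println(filter(wordList)); // prints [bababc, abbcde, aabcdd, abcdee]
    }
}
```

Characters that are special inside a character class or in a regex would need escaping, for example with Pattern.quote, before this approach could cover a wider alphabet.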
