1

I want to print duplicate characters from a string using collections(Set) only.

I have written code but it will show the correct result if String is "ashish" but fails if the String is "ashish java" because of occurrence of char 'a' is three times.

public class DuplicateStringMethod {
    public static void duplicateString(String str) {
        char[] cArray = str.toCharArray();
        Set<Character> set = new HashSet<Character>();

        for(char c:cArray) {
            if(set.add(c)==false) {
                System.out.println(c);
            }
        }
    }

    public static void main(String[] args) {
        duplicateString("Java ashishj ");
    }
}

It will print a a s h. But I want a s h using only Set interface.

2
  • 1
    a. Make use of Map and count the occurrence. b. Use another set where data is populated for the first time and you look into only when there is no character in other set c. Make use of char array O(1) space and O(1) time complexity Commented Sep 24, 2019 at 13:43
  • 1
    if(set.add(c)==false) should be written as if(!set.add(c)); just like if(set.add(c)==true) should be if(set.add(c)). Commented Sep 24, 2019 at 14:28

7 Answers 7

1

Use one more set to store the duplicate element and print the element. Try like this:

public static void duplicateString(String str) {
        str=str.replaceAll(" ","");
        char[] cArray = str.toCharArray();
        Set<Character> set = new HashSet<Character>();
        Set<Character> set1 = new HashSet<Character>();
        for(char c:cArray) {
            if(set.add(c)==false) {
                if(set1.add(c) == true)
                    System.out.println(c);
            }
        }
    }
Sign up to request clarification or add additional context in comments.

Comments

1

Check this program

public static void duplicateString(String str) {

        char[] cArray = str.replaceAll("\\s+", "").toCharArray();

        Set<Character> set = new HashSet<Character>();
        Set<Character> alreadyExistingSet = new HashSet<Character>();

        for (char c : cArray) {
            if (set.add(c) == false && alreadyExistingSet.add(c) == true) {
                System.out.print(c);
            }
        }
    }

2 Comments

Can you explain me this line of code: if (set.add(c) == false && alreadyExistingSet.add(c) == true) { System.out.print(c); So at first, 'a' will be taken and check condition set.add(a) == false which is not satisfying and alreadyExistingSet.add(c) == true which is satisfying so it will check one by one all characters from string "Java ashishj" but, what will happen when it checks character 's' which is second time.
Don’t use …== false or …== true. For the first, use !, for the second, just remove the obsolete == true.
1

All you need to do is use the add() method of the Set class to tell you if the thing being inserted (added) is already part of the set. When the function returns false it means the current "thing" being added is a duplicate. Then, add that to a Set of duplicates. This way, items duplicated more than once, will only show once in the new set. Lastly, to conserve the order, you may want to use a LinkedHashSet to store duplicates.

public class TestDups {

    public static void main (String[] args) {
        String str = "Java ashishj ";
        Set<Byte> myset = new TreeSet<>();
        Set<Character> dups = new LinkedHashSet<>();
        for (byte c: str.getBytes() ) {
            if (!myset.add(c)) {
                dups.add((char)c);
            }
        }

        dups.stream().forEach(System.out::print);
    }
}

The output of the code above is "ash ". Notice the white space at the end since the original String contains two spaces (between words and at the end).

Comments

1

Try that:

public static void duplicateString(String str) {
    Set<Character> firstTime = new HashSet<Character>();
    Set<Character> reported = new HashSet<Character>();

    char[] cArray = str.toCharArray();
    for(char c:cArray) {
        if (!firstTime.contains(c)) {
          firstTime.add(c);
          continue;
        }
        if (reported.contains(c)) { continue; }
        reported.add(c);
        System.out.println(c);
    }
}

Thanks to Holger's suggestion I ran some tests:

  • add: 52443260ns for 10000000 operations
  • contains: 28209745ns for 10000000 operations

Therefore the code above although not shorter is fastest.

5 Comments

The add method of a Set will only add when the element is not already contained in the set and returns a boolean telling whether it added the element. Therefore, contains checks are obsolete. So you can simply use for(char c: str.toCharArray()) { if(!firstTime.add(c) && reported.add(c)) System.out.println(c); }
@Holger Yes if contains() and add() are computationally equal your suggestion is totally correct. However quick test shows that add() is twice slower from the contains() and hence the code is better off as it is.
Just want understand this condition if (!firstTime.contains(c)) { firstTime.add(c); So, the set 'FirstTime is empty at first then how can you use contains functions and check .add(c)?? Please explain me
@user3035566 the ! operator means “not”.
@Deian I have strong doubts about the validity of your (undocumented) test. Since you are going to add the element anyway when not contained, only the tests for already contained elements matters. Then, it doesn’t look like you considered all points of How do I write a correct micro-benchmark in Java?. And even if add was always almost two times slower than contains, for the question’s example string "ashish java", your code performs 15 contains checks and 10 add operations, compared to the 15 add operations of my solution. Do the math…
1

It's not entirely clear to me what's required by "using only the Set interface" but I'll assume that this means that the duplicate characters are to be returned in a Set. There are several ways of doing this. The first is the straightforward loop over the chars of the input string. It takes advantage of the feature of Set.add which returns true if the set was modified and false if it wasn't; that means that an add operation that returns false is a duplicate.

static Set<Character> dups0(String input) {
    Set<Character> dups = new HashSet<>();
    Set<Character> seen = new HashSet<>();
    for (char ch : input.toCharArray()) {
        if (! seen.add(ch)) {
            dups.add(ch);
        }
    }
    return dups;
}

There's a streamy way to do this, which is essentially the same thing expressed in stream form:

static Set<Character> dups1(String input) {
     Set<Character> seen = new HashSet<>();
     return input.chars()
                 .mapToObj(ch -> (char)ch)
                 .filter(ch -> !seen.add(ch))
                 .collect(toSet());
}

Some people might find this distasteful, as its filter operation performs side effects. In addition, if this is run in parallel, the result would need to be something like a ConcurrentHashMap.newKeySet.

An alternative is to generate a frequency table of characters and remove all the entries that occur just once:

static Set<Character> dups2(String input) {
     Map<Character, Long> map = input.chars()
                                     .mapToObj(i -> (char)i)
                                     .collect(groupingBy(ch -> ch, HashMap::new, counting()));
     map.values().removeIf(v -> v == 1);
     return map.keySet();
}

Note that this uses a collections bulk mutation operation on the values-collection view of the map. To ensure that the map is mutable, I've used the three-arg overload of groupingBy to specify the map's implementation type.

If you don't like mutation, there's a pure-streams way to do this:

static Set<Character> dups3(String input) {
    Map<Character, Long> map = input.chars()
                                    .mapToObj(i -> (char)i)
                                    .collect(groupingBy(ch -> ch, counting()));
    return map.entrySet().stream()
              .filter(entry -> entry.getValue() > 1)
              .map(Map.Entry::getKey)
              .collect(toSet());
}

Comments

0
public static void main(String[] args) {
        String s = "Javaashishj";
        char[] cArray = s.toCharArray();
        Set<Character> set = new HashSet<Character>();
        for (char c : cArray) {
            if (!set.contains(c)) {
                set.add(c);
                System.out.println(c);
            }
        }
    }

5 Comments

You need to check your code: Try String = "ferrari java"
Working fine. the out put is. No duplictaes. f e r a i j v
Please let me know if you have any issue with this code. I hope this should work fine.
Only 'r' and 'a' should be the output for String ferrari java". Only duplicates we need to print
I am sorry my program will remove duplicates.
0

You can make use of String.split() here. This method splits the string up into an array of strings based on the regular expression provided. We will use a "" because we want to split the string after every single character, and then input the results into a stream.

public static void duplicateString( String str ) {

    // collect all characters in a map, whose String value is the character and whose key value is the count of occurrences in the string
    Map<String,Long> charCountMap = Arrays.stream( str.split( "" ) )
            .filter( charInString -> !charInString.equals( " " ) ) // don't want spaces
            .collect( Collectors.groupingBy( Function.identity(), Collectors.counting() ) );

    charCountMap.entrySet()
            .stream()
            .filter( entrySet -> entrySet.getValue() > 1 ) // filter out occurrences that are one or less
            .forEach( entrySet -> System.out.println( String.format( "Char %s appeared %d times", entrySet.getKey(), entrySet.getValue() ) ) );
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.