How to count duplicates in an array of strings?

Question

How do I split a String to extract all the words/terms that occur in it and count how many times each one occurs? For example, given String q = "foo bar foo", I want a data structure such as {<foo,2>, <bar,1>}. This is the least verbose code I could come up with*. Are there faults in it, or less verbose alternatives?

String[] split = q.toString().split("\\s");
Map<String, Integer> terms = new HashMap<String, Integer>();

for (String term : split) {
    if(terms.containsKey(term)){
        terms.put(term, terms.get(term)+1);
    }
}

(haven't compiled it)

You're close. Just add an else (if the term is not in the map already). – Andreas Dolk, commented Aug 29, 2011 at 8:46
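A minimal sketch of the comment's suggestion, applied directly to the question's loop: the missing else branch stores 1 for a term seen for the first time. The class name WordCountWithElse and method countWords are hypothetical, chosen only for illustration, and the sketch assumes "\\s+" as the separator so that runs of whitespace do not produce empty terms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class around the question's loop.
class WordCountWithElse {
    static Map<String, Integer> countWords(String q) {
        // "\\s+" (assumed) treats a run of whitespace as a single separator.
        String[] split = q.split("\\s+");
        Map<String, Integer> terms = new HashMap<String, Integer>();
        for (String term : split) {
            if (terms.containsKey(term)) {
                // Term seen before: increment its count.
                terms.put(term, terms.get(term) + 1);
            } else {
                // First occurrence: the branch missing from the question's code.
                terms.put(term, 1);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(countWords("foo bar foo"));
    }
}
```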

Buhake Sindi · Accepted Answer · 2011-08-29 08:47:57Z

5

Modified code:

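import java.util.HashMap;
import java.util.Map;

// "\\s+" treats a run of whitespace as one separator, so no empty terms are counted
String[] split = q.split("\\s+");
Map<String, Integer> terms = new HashMap<String, Integer>();

for (String term : split) {
    int score = 0;
    if (terms.containsKey(term)) {
        score = terms.get(term); // existing count for this term
    }

    terms.put(term, score + 1); // a new term starts at 1
}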

PS: Untested.
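Because the question also asks for less verbose alternatives, here is a sketch for Java 8 and later that uses Map.merge. merge stores the given value (1) when the key is absent; otherwise it combines the old value with 1 using the supplied function (here Integer::sum). The class and method names are hypothetical, and the "\\s+" separator is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; requires Java 8+ for Map.merge and Integer::sum.
class WordCountMerge {
    static Map<String, Integer> countWords(String q) {
        Map<String, Integer> terms = new HashMap<>();
        for (String term : q.split("\\s+")) {
            // Absent key: put 1. Present key: replace with oldValue + 1.
            terms.merge(term, 1, Integer::sum);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(countWords("foo bar foo"));
    }
}
```

This replaces the containsKey/get/put sequence with one map call per term.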

answered Aug 29, 2011 at 8:47

Buhake Sindi


Ashkan Aryan · Accepted Answer · 2011-08-29 12:05:13Z

I would go with the code suggested by Elite Gentleman, but I'm putting this up as a discussion point: what about using StringTokenizer? If scalability or performance were an issue, would a tokenizer perform better? In that case you might only need to loop through the string once, instead of first doing the regex split and then traversing the resulting array.

Something like this:

import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Default delimiters are space, \t, \n, \r and \f
StringTokenizer st = new StringTokenizer(q);
Map<String, Integer> terms = new HashMap<String, Integer>();

while (st.hasMoreTokens()) {
    String term = st.nextToken();
    int score = 0;
    if (terms.containsKey(term)) {
        score = terms.get(term);
    }

    terms.put(term, score + 1);
}

I know that StringTokenizer, though not deprecated, is a legacy class according to the Javadoc, and its use is discouraged:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

However, I wonder whether it performs better in a case like this one, where the tokens are simple whitespace-separated words.

Any thoughts?
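The quoted Javadoc recommends the java.util.regex package in place of StringTokenizer. Here is a minimal sketch of a single-pass approach along those lines: a Matcher walks the input once with find(), and no intermediate String[] is built. It is not benchmarked here, so no claim is made that it is faster than split. The class name, method name and the "\\S+" token pattern are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; compiles the token pattern once and reuses it.
class WordCountMatcher {
    // A token is assumed to be any run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    static Map<String, Integer> countWords(String q) {
        Map<String, Integer> terms = new HashMap<>();
        Matcher m = TOKEN.matcher(q);
        // Each find() advances to the next token; the input is scanned once.
        while (m.find()) {
            terms.merge(m.group(), 1, Integer::sum);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(countWords("foo bar foo"));
    }
}
```

Unlike split("\\s+"), this never produces an empty term for leading whitespace, because it matches tokens rather than separators.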

anand krish · Accepted Answer · 2023-08-23 09:03:26Z

0

Using Java 8 streams (note that this example counts the characters of a string rather than the words):

    import java.util.stream.Collectors;

    String name = "anandha";
    name.chars()                    // returns IntStream
        .mapToObj(ch -> (char) ch)  // returns Stream<Character>
        .collect(Collectors.groupingBy(ch -> ch, Collectors.counting())) // returns Map<Character, Long>
        .forEach((k, v) -> {
            System.out.println(k + " : " + v);
        });

Output:

 a : 3
 d : 1
 h : 1
 n : 2
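The same groupingBy/counting collector can be applied to words, which is what the question asks for. This is a sketch assuming words are separated by whitespace; the input is trimmed first so that a leading space does not produce an empty word. The class and method names are hypothetical, and the counts come back as Long because Collectors.counting() produces Long values.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class applying the stream approach above to words instead of characters.
class WordCountStream {
    static Map<String, Long> countWords(String q) {
        return Arrays.stream(q.trim().split("\\s+"))   // Stream<String> of words
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        countWords("foo bar foo").forEach((k, v) -> System.out.println(k + " : " + v));
    }
}
```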

edited Aug 23, 2023 at 9:03

answered Aug 23, 2023 at 8:56

anand krish
