How to Count Unique Values in an ArrayList?

Question

I have to count the number of unique words in a text document using Java. First I had to get rid of the punctuation in all of the words. I used the Scanner class to scan each word in the document and put it in a String ArrayList.

The next step is where I'm having a problem: how do I create a method that can count the number of unique Strings in the list?

For example, if the array contains apple, bob, apple, jim, bob; the number of unique values in this array is 3.

public void countWords() {
    try {
        Scanner scan = new Scanner(in);
        while (scan.hasNext()) {
            String words = scan.next();
            // Strings are immutable: replace() returns a new String, so keep the result.
            // replace() leaves the word unchanged when the character is absent.
            words = words.replace(".", "");
            words = words.replace("!", "");
            words = words.replace(":", "");
            words = words.replace(",", "");
            words = words.replace("?", "");
            words = words.replace("-", "");
            words = words.replace("‘", "");
            wordStore.add(words.toLowerCase());
        }
    } catch (FileNotFoundException e) {
        System.out.println("File Not Found");
    }
    System.out.println("The total number of words is: " + wordStore.size());
}

Are there any restrictions to what you can or can't use?

– gtgaxiola, Commented Oct 4, 2012 at 3:50
No, there are no restrictions!

– user1405298, Commented Oct 4, 2012 at 3:51

kosa · Accepted Answer · 2012-10-04 03:55:21Z

25

Are you allowed to use Set? If so, HashSet may solve your problem. HashSet doesn't accept duplicates.

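Set<String> noDupSet = new HashSet<>();
noDupSet.add(yourString); // adding a string the set already holds has no effect
noDupSet.size();          // number of distinct strings added so far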

The size() method returns the number of unique words.

If you really have to use only ArrayList, one way to do it is:

1) Create a temporary ArrayList
2) Iterate over the original list and retrieve each element
3) If the temporary ArrayList doesn't contain the element, add it
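The three steps above could be sketched roughly as follows. This is only an illustration: the class name UniqueWithTempList and the method name countUnique are made up for the example, and the sample words are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class and method names, used for illustration only.
class UniqueWithTempList {

    // Counts distinct strings using nothing but ArrayList.
    static int countUnique(List<String> original) {
        List<String> temp = new ArrayList<>();   // step 1: temporary list
        for (String word : original) {           // step 2: walk the original list
            if (!temp.contains(word)) {          // step 3: add only words not seen yet
                temp.add(word);
            }
        }
        return temp.size();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(words)); // prints 3
    }
}
```

Note that contains() scans the temporary list each time, so this gets slow for large inputs; the Set-based version avoids that.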

edited Oct 4, 2012 at 3:55

answered Oct 4, 2012 at 3:50

kosa


3 Comments

user1405298 Over a year ago

Yes, I'm allowed to use HashSet. Can you please show me how to use HashSet?

user1405298 Over a year ago

I don't have to use ArrayList only; I can use anything that works. Can I instantiate a new HashSet and add all the String values from the ArrayList?

kosa Over a year ago

Yes, you can. Or you can add the elements directly to the Set, so that you don't even need the ArrayList.
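Both options from this comment thread might look something like the sketch below. The class and method names (UniqueWithHashSet, countUnique, countUniqueFromText) are invented for the example, and the Scanner reads from a String here rather than from the asker's file, purely to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical class and method names, used for illustration only.
class UniqueWithHashSet {

    // Option A: copy an existing list into a HashSet; duplicates are dropped.
    static int countUnique(List<String> words) {
        return new HashSet<>(words).size();
    }

    // Option B: skip the list and add each word to the set as it is read.
    static int countUniqueFromText(String text) {
        Set<String> seen = new HashSet<>();
        Scanner scan = new Scanner(text); // assumption: a String source instead of a file
        while (scan.hasNext()) {
            seen.add(scan.next().toLowerCase());
        }
        scan.close();
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> wordStore = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(wordStore));                             // prints 3
        System.out.println(countUniqueFromText("Apple bob apple JIM jim bob")); // prints 3
    }
}
```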

ROMANIA_engineer · Accepted Answer · 2016-02-05 12:12:50Z

19

Starting from Java 8 you can use Stream:

After you add the elements in your ArrayList:

long n = wordStore.stream().distinct().count();

It converts your ArrayList to a stream and then it counts only the distinct elements.

answered Feb 5, 2016 at 12:12

ROMANIA_engineer


Yogendra Singh · Accepted Answer · 2012-10-04 03:58:53Z

3

I would advise using HashSet. It automatically filters out duplicates when you call its add method.

answered Oct 4, 2012 at 3:58

Yogendra Singh


Eric B. · Accepted Answer · 2012-10-04 04:06:23Z

2

Although I believe a set is the easiest solution, you can still use your original solution and just add an if statement to check if value already exists in the list before you do your add.

if (!wordStore.contains(words.toLowerCase())) {
    wordStore.add(words.toLowerCase());
}

Then the number of words in your list is the total number of unique words (i.e., wordStore.size()).

answered Oct 4, 2012 at 4:06

Eric B.


2 Comments

user1405298 Over a year ago

Thanks for your help! Isn't HashSet more efficient, because it doesn't allow previous values by default?

Eric B. Over a year ago

Absolutely it should be. However, I wanted to give you an option that wouldn't cause you to change your existing code. Really, you were just missing an "if" statement.

Casmon Gordon · Accepted Answer · 2016-09-26 16:03:23Z

1

This general-purpose solution takes advantage of the fact that the Set abstract data type does not allow duplicates. The Set.add() method is especially useful because it returns a boolean flag indicating whether the element was actually added. A HashMap tracks the number of occurrences of each element. The algorithm can be adapted to variations of this type of problem, and it runs in O(n) time.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;

// The class name is illustrative; the original snippet showed only the two methods.
public class DuplicateCounter {

    public static void main(String[] args) {
        String[] strArray = {"abc", "def", "mno", "xyz", "pqr", "xyz", "def"};
        System.out.printf("RAW: %s ; PROCESSED: %s \n", Arrays.toString(strArray), duplicates(strArray));
    }

    // Maps each distinct string in arr to the number of times it occurs.
    public static HashMap<String, Integer> duplicates(String[] arr) {
        HashSet<String> distinctKeySet = new HashSet<String>();
        HashMap<String, Integer> keyCountMap = new HashMap<String, Integer>();

        for (int i = 0; i < arr.length; i++) {
            if (distinctKeySet.add(arr[i])) {
                keyCountMap.put(arr[i], 1); // add() returned true: first occurrence
            } else {
                keyCountMap.put(arr[i], keyCountMap.get(arr[i]) + 1); // repeat: bump the count
            }
        }

        return keyCountMap;
    }
}

RESULTS:

RAW: [abc, def, mno, xyz, pqr, xyz, def] ; PROCESSED: {pqr=1, abc=1, def=2, xyz=2, mno=1}

answered Sep 26, 2016 at 16:03

Casmon Gordon


3 Comments

Laurel Over a year ago

Are you actually quoting something? If you're not, don't use quote formatting. If you are quoting something, you need to properly attribute it.

walen Over a year ago

This 4 years old question already has an answer using HashSet for O(1) performance. Your algorithm for counting occurrences of words in a String array, does not answer OP's question (you're not counting unique values in an ArrayList); nor does it improve the current solution. Maybe you misunderstood the question?

Casmon Gordon Over a year ago

Thanks for the feedback. I apologize for the confusion. I simply wanted to share a solution for counting distinct elements in an array that I thought was interesting/different, and could perhaps be useful to someone else in the future who may be researching solutions to a similar problem. I probably should have added the solution to a more appropriate thread.

FSP · Accepted Answer · 2012-10-04 03:57:24Z

0

You can create a Hashtable or HashMap as well. The keys would be your input strings, and each value would be the number of times that string occurs in your input array. The number of keys is then the number of unique strings. O(N) time and space.
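A minimal sketch of the map-based idea, assuming Java 8 or later for Map.merge. The class name WordFrequencies and method name countOccurrences are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class and method names, used for illustration only.
class WordFrequencies {

    // Builds a map from each word to the number of times it appears.
    static Map<String, Integer> countOccurrences(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Store 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        Map<String, Integer> counts = countOccurrences(words);
        System.out.println(counts.size());       // prints 3 (unique words)
        System.out.println(counts.get("apple")); // prints 2
    }
}
```

If only the unique count is needed, a Set is enough; the map pays off when the per-word frequencies are also wanted.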

Solution 2:

Sort the input list. Identical strings will then be next to each other. Compare list(i) to list(i+1) and count the number of duplicates; the number of unique strings is the list size minus that count.
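The sorting approach might be sketched like this. The names UniqueBySorting and countUnique are made up for the example, and the sketch assumes the list contains no null elements, since sorting would fail on them. It counts the positions where a new value starts, which gives the same result as subtracting the adjacent duplicates from the size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class and method names, used for illustration only.
class UniqueBySorting {

    // Assumes the list holds no null elements.
    static int countUnique(List<String> words) {
        if (words.isEmpty()) {
            return 0;
        }
        List<String> sorted = new ArrayList<>(words); // copy so the caller's list is untouched
        Collections.sort(sorted);                     // equal strings end up adjacent
        int unique = 1;                               // the first element always starts a new run
        for (int i = 1; i < sorted.size(); i++) {
            if (!sorted.get(i).equals(sorted.get(i - 1))) {
                unique++;                             // value changed: a new distinct string
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(words)); // prints 3
    }
}
```

The sort makes this O(n log n) in time, so it is mainly of interest when a hash-based structure is not an option.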

edited Oct 4, 2012 at 3:57

answered Oct 4, 2012 at 3:51

FSP


namalfernandolk · Accepted Answer · 2012-10-04 04:06:11Z

0

In shorthand, you can do it as follows:

    ArrayList<String> duplicateList = new ArrayList<String>();
    duplicateList.add("one");
    duplicateList.add("two");
    duplicateList.add("one");
    duplicateList.add("three");

    System.out.println(duplicateList); // prints [one, two, one, three]

    HashSet<String> uniqueSet = new HashSet<String>();

    uniqueSet.addAll(duplicateList);
    System.out.println(uniqueSet); // prints [two, one, three]

    duplicateList.clear();
    System.out.println(duplicateList);// prints []


    duplicateList.addAll(uniqueSet);
    System.out.println(duplicateList);// prints [two, one, three]

answered Oct 4, 2012 at 4:06

namalfernandolk


2 Comments

user1405298 Over a year ago

Personally, I don't understand why I would use your shorthand method. I could just create loop to add the String values inside the HashSet; the HashSet doesn't allow previous values by default.

namalfernandolk Over a year ago

Here I have mentioned a way to extract the unique values of an ArrayList. I thought the shorthand method was handier to use, but it is up to you to select the best method... :)

ROMANIA_engineer · Accepted Answer · 2016-02-05 12:13:44Z

0

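import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueinArrayList {

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        List<String> al = new ArrayList<>();
        al.add("Stack");
        al.add("Stack");
        al.add("over");
        al.add("over");
        al.add("flow");
        al.add("flow");
        System.out.println(al);
        // LinkedHashSet drops duplicates and keeps first-insertion order
        Set<String> s = new LinkedHashSet<>(al);
        System.out.println(s);
        Iterator<String> itr = s.iterator();
        while (itr.hasNext()) {
            sb.append(itr.next()).append(" ");
        }
        System.out.println(sb.toString().trim());
    }

}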

edited Feb 5, 2016 at 12:13

ROMANIA_engineer


answered Feb 8, 2013 at 22:16

S N Prasad Rao


ChandraBhan Singh · Accepted Answer · 2017-06-06 13:06:32Z

0

Three possible solutions:

1) Use HashSet as suggested above.

2) Create a temporary ArrayList and store only unique elements, as below:

public static int getUniqueElement(List<String> data) {
    List<String> newList = new ArrayList<>();
    for (String eachWord : data) {
        if (!newList.contains(eachWord)) {
            newList.add(eachWord);
        }
    }
    return newList.size();
}

3) Java 8 solution:

long count = data.stream().distinct().count();

edited Jun 6, 2017 at 13:06

user7605325

answered May 28, 2017 at 12:04

ChandraBhan Singh


1 Comment

Matt Sgarlata Over a year ago

I strongly advise against method 2. It is very inefficient compared to methods 1 and 3, particularly as the size of the list becomes larger. Method 2 is O(n^2) versus methods 1 and 3 which are just O(n). This is because the call to newList.contains is O(n) and that call is itself within a loop which is also O(n), thus making the overall complexity O(n^2).
