PROBLEM

I have a list of arrays and I want to count the occurrences of duplicates.

For example, if I have this :

{{1,2,3},
 {1,0,3},
 {1,2,3},
 {5,2,6},
 {5,2,6},
 {5,2,6}}

I want a map (or any relevant collection) like this :

{ {1,2,3} -> 2,
  {1,0,3} -> 1,
  {5,2,6} -> 3 }

I can even discard the array values; I'm only interested in the counts (e.g. 2, 1 and 3 here).

MY SOLUTION

I use the following algorithm :

  • First hash the arrays, and check whether each hash is in a HashMap<Integer, ArrayList<int[]>>, call it distinctHash, where the key is the hash and the value is an ArrayList, call it rowList, containing the different arrays that share this hash (to handle collisions).

  • If the hash is not in distinctHash, put the array with the value 1 in another HashMap<int[], Long> that counts each occurrence, call it distinctElements.

  • If the hash is in distinctHash, check whether the corresponding array is contained in rowList. If it is, increment the value in distinctElements associated with the identical array found in rowList. (If you use the new array as a key you will create another key, since the references are different.) If it is not, add it to rowList and to distinctElements with the value 1.
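
The reference issue mentioned in the last step can be seen in isolation. Here is a minimal sketch (the class name ArrayKeyDemo is made up for illustration): arrays inherit equals and hashCode from Object, so a HashMap keyed by int[] treats two arrays with the same contents as two different keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original code.
class ArrayKeyDemo {
    // Inserts two arrays with identical contents and returns the number of keys.
    static int distinctKeysForEqualArrays() {
        Map<int[], Long> counts = new HashMap<>();
        int[] first = {1, 2, 3};
        int[] second = {1, 2, 3};
        counts.merge(first, 1L, Long::sum);
        counts.merge(second, 1L, Long::sum);
        // Keys are compared by reference, so both arrays are kept.
        return counts.size();
    }

    // Arrays.equals compares contents, which is why the question uses it.
    static boolean sameContents() {
        return Arrays.equals(new int[]{1, 2, 3}, new int[]{1, 2, 3});
    }
}
```

The first method returns 2 even though the second method returns true, which is why the count has to be incremented on the array already stored in rowList.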

Here is the code. The returned boolean tells whether a new distinct array was found. I apply this function sequentially to all of my arrays:

    HashMap<int[], Long> distinctElements;
    HashMap<Integer, ArrayList<int[]>> distinctHash;

    private boolean addRow(int[] row) {
        // The hash computation is missing from the original snippet. The question
        // uses Guava's murmur3_128; Arrays.hashCode is only a stand-in here.
        int hash = Arrays.hashCode(row);

        if (distinctHash.containsKey(hash)) {
            ArrayList<int[]> rowList = distinctHash.get(hash);
            for (int[] previousRow : rowList) {
                if (Arrays.equals(previousRow, row)) {
                    // Increment the count of the stored row that actually matches,
                    // not the first row of the bucket.
                    distinctElements.put(
                            previousRow,
                            distinctElements.get(previousRow) + 1
                    );
                    return false;
                }
            }
            // Same hash but different contents (a collision): record a new row.
            distinctElements.put(row, 1L);
            rowList.add(row);

            return true;

        } else {
            distinctElements.put(row, 1L);

            ArrayList<int[]> newValue = new ArrayList<>();
            newValue.add(row);
            distinctHash.put(hash, newValue);

            return true;
        }
    }

QUESTION

The problem is that my algorithm is too slow for my needs (40s for 5,000,000 arrays, and 2h-3h for 20,000,000 arrays). Profiling with NetBeans told me that the hashing takes 70% of runtime (using Google Guava murmur3_128 hash function).

Is there another algorithm that could be faster? As I said, I'm not interested in the array values, only in the number of their occurrences. I am ready to sacrifice precision for speed, so a probabilistic algorithm is fine.

  • What do you know about the structure of the arrays? Are they always 3-digit long as in the example? Do they always contain digits / numbers or can it be anything? Commented Oct 3, 2018 at 16:35
  • The arrays are of a fixed size (around 10 most of the time) determined in another part of my code. They contain only ints. Commented Oct 3, 2018 at 16:45
  • Did you try the simplest approach of creating a map, where the key is the integer array and the value is a single integer? You would need to use List<Integer> as the key, boxing the ints for insertion (Arrays.asList on an int[] yields a List<int[]>, not a List<Integer>). Commented Oct 3, 2018 at 16:45
  • @Markus I tried this simple approach with int[] before remembering that the equals() method compares references (hence the Arrays.equals method). I will definitely try that with Lists. Commented Oct 3, 2018 at 18:57

3 Answers

Wrap the int[] in a class that implements equals and hashCode, then build a Map from the wrapper class to the instance count.

class IntArray {
    private int[] array;
    public IntArray(int[] array) {
        this.array = array;
    }
    @Override
    public int hashCode() {
        return Arrays.hashCode(this.array);
    }
    @Override
    public boolean equals(Object obj) {
        return (obj instanceof IntArray && Arrays.equals(this.array, ((IntArray) obj).array));
    }
    @Override
    public String toString() {
        return Arrays.toString(this.array);
    }
}

Test

int[][] input = {{1,2,3},
                 {1,0,3},
                 {1,2,3},
                 {5,2,6},
                 {5,2,6},
                 {5,2,6}};
Map<IntArray, Long> map = Arrays.stream(input).map(IntArray::new)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
map.entrySet().forEach(System.out::println);

Output

[1, 2, 3]=2
[1, 0, 3]=1
[5, 2, 6]=3

Note: The above solution is faster and uses less memory than the solution by Ravindra Ranwala, but it requires an extra class, so it is debatable which is better.

For smaller arrays, use the simpler solution below by Ravindra Ranwala.
For larger arrays, the above solution is likely better.

 Map<List<Integer>, Long> map = Stream.of(input)
         .map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))
         .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
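
The wrapper idea can also be applied row by row, in the style of the addRow method from the question. The following is only a sketch under assumptions: RowCounter, Key, countOf and distinctRows are hypothetical names, and the rows are assumed not to be modified after they have been added (otherwise their hash codes would change).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, sketched for illustration only.
class RowCounter {
    // Content-based key: equals and hashCode delegate to java.util.Arrays.
    private static final class Key {
        private final int[] row;
        Key(int[] row) { this.row = row; }
        @Override public int hashCode() { return Arrays.hashCode(row); }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(row, ((Key) o).row);
        }
    }

    private final Map<Key, Long> counts = new HashMap<>();

    // Returns true if a row with these contents had not been seen before.
    boolean addRow(int[] row) {
        // merge returns the updated count; 1 means the key was just inserted.
        return counts.merge(new Key(row), 1L, Long::sum) == 1L;
    }

    // Number of times a row with these contents has been added.
    long countOf(int[] row) {
        return counts.getOrDefault(new Key(row), 0L);
    }

    int distinctRows() {
        return counts.size();
    }
}
```

A single HashMap replaces both distinctHash and distinctElements here, because the map itself resolves hash collisions through equals.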

1 Comment

This solution is twice as fast as mine. Thanks!

You may do it like so,

Map<List<Integer>, Long> result = Stream.of(source)
        .map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

And here's the output,

{[1, 2, 3]=2, [1, 0, 3]=1, [5, 2, 6]=3}

1 Comment

After some tests, this solution is slower than mine. It takes 50s when mine takes 40s for the same sample.

If duplicate arrays always have their elements in the same order and the arrays are short, you can map each array to a single integer and then reuse the last part of your method. This reduces the time spent hashing, but it relies on assumptions that might not hold in your case.
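
One possible reading of this idea, as a sketch only: if every row has the same length and every value is non-negative and fits in a known number of bits, a row can be packed into one long and counted by that key. PackedRowCounter and its methods are hypothetical names, and the bit-width limits are assumptions that do not come from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch. Assumes all rows have the same length, every value is
// in [0, 2^bitsPerValue), and row.length * bitsPerValue <= 64.
class PackedRowCounter {
    private final int bitsPerValue;
    private final Map<Long, Long> counts = new HashMap<>();

    PackedRowCounter(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 31) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..31");
        }
        this.bitsPerValue = bitsPerValue;
    }

    // Packs the row into a single long, bitsPerValue bits per element.
    long pack(int[] row) {
        if ((long) row.length * bitsPerValue > 64) {
            throw new IllegalArgumentException("row does not fit in 64 bits");
        }
        long key = 0L;
        for (int v : row) {
            // Rejects negative values and values that need more bits.
            if ((v >>> bitsPerValue) != 0) {
                throw new IllegalArgumentException("value out of range: " + v);
            }
            key = (key << bitsPerValue) | v;
        }
        return key;
    }

    // Returns true if a row with these contents had not been seen before.
    boolean addRow(int[] row) {
        return counts.merge(pack(row), 1L, Long::sum) == 1L;
    }

    long countOf(int[] row) {
        return counts.getOrDefault(pack(row), 0L);
    }
}
```

With fixed-length rows the packing is one-to-one, so no collision list is needed. For rows of about 10 ints, as described in the question's comments, this only works if each value fits in 6 bits, which may well not be the case.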
