17

I have an ArrayList with the following strings:

 List<String> e = new ArrayList<String>();
 e.add("123");
 e.add("122");
 e.add("125");
 e.add("123");

I want to check the list for duplicates and remove them from the list. In this case my list will only have two values, and in this example it would be the values 122 and 125, and the two 123s will go away.

What would be the best way to do this? I was thinking of using a Set, but that will only remove one of the duplicates.

5 Comments
  • You can use a Map<String, Integer> (representing the number of times the String is in the list), then keep only the entries with a value of 1 and collect the corresponding keys into a new list. Commented Oct 14, 2015 at 13:20
  • 1
    @3Kings He wants to remove a value together with its duplicates if it occurs more than once, so in the example above both 123s go. Commented Oct 14, 2015 at 13:20
  • A Set's add() method returns true if the value was not already present and was inserted successfully. You can use that to detect that a value you're inserting is a duplicate, then find and remove the duplicate. Commented Oct 14, 2015 at 13:21
  • You can use a MultiHashMap. Commented Oct 14, 2015 at 13:27
  • A Set would not remove items, it would prevent adding duplicate items. Commented Oct 14, 2015 at 15:23

11 Answers

25

In Java 8 you can do:

e.removeIf(s -> Collections.frequency(e, s) > 1);

If you're not on Java 8, you can create a HashMap<String, Integer>. If the String already appears in the map, increment its value by one; otherwise, add it to the map.

For example:

put("123", 1);

Now let's assume that you encounter "123" again; you should get the current count for the key and add one to it:

put("123", get("123") + 1);

Now you can easily iterate over the map and build a new ArrayList from the keys whose values are less than 2.
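A runnable sketch of this pre-Java-8 counting approach (the class and method names here are mine, for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RemoveDuplicates {
    // Returns a new list containing only the elements that occur exactly once.
    static List<String> removeAllDuplicates(List<String> input) {
        // Count how many times each string occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String s : input) {
            Integer c = counts.get(s);
            counts.put(s, c == null ? 1 : c + 1);
        }
        // Keep only the strings whose count is less than 2.
        List<String> result = new ArrayList<>();
        for (String s : input) {
            if (counts.get(s) < 2) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> e = Arrays.asList("123", "122", "125", "123");
        System.out.println(removeAllDuplicates(e)); // [122, 125]
    }
}
```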


13 Comments

The Java 8 version does work, but only because it's an ArrayList, and removeIf is overridden to do all the removals in bulk at the end. It doesn't work on a LinkedList, for example.
List::removeIf is a clean solution, but its complexity is O(n²) because of iterating over the List and calling Collections::frequency, am I right?
The complexity of finding the duplicates is O(n²); applying removeIf only adds a constant factor. So the overall complexity is indeed O(n²).
If you create a Map<String, Long> to count the occurrences and then iterate over the entry set to get the unique elements, you've got O(2n) -> O(n) complexity, or am I wrong?
@Taemyr Correct, but m is in O(n) in any implementation of hashmap I know of (I'd go so far as to say it's the only reasonable choice) since the capacity is resized to keep it larger by some fraction than the current size.
12

You can also use filter in Java 8:

e.stream().filter(s -> Collections.frequency(e, s) == 1).collect(Collectors.toList())

Comments

6

You could use a HashMap<String, Integer>.

You iterate over the list and, if the map does not contain the string, add it with a value of 1.

If, on the other hand, you already have the string, you simply increment its counter. The map for your list would then look like this:

{"123", 2}
{"122", 1}
{"125", 1}

You would then create a new list containing every key whose value is 1.
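A runnable sketch of that idea, using a LinkedHashMap so the surviving keys keep their first-seen order (the class and method names are mine):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountAndFilter {
    // Builds the count map described above, then keeps the keys with a count of 1.
    static List<String> uniques(List<String> input) {
        // LinkedHashMap keeps the keys in insertion order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : input) {
            if (counts.containsKey(s)) {
                counts.put(s, counts.get(s) + 1);
            } else {
                counts.put(s, 1);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(uniques(Arrays.asList("123", "122", "125", "123"))); // [122, 125]
    }
}
```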

Comments

4

Here is a non-Java 8 solution using a map to count occurrences:

Map <String,Integer> map = new HashMap<String, Integer>();
for (String s : list){
    if (map.get(s) == null){
      map.put(s, 1);
    } 
    else {
      map.put(s, map.get(s) + 1);
    }
}

List<String> newList = new ArrayList<String>();

// Remove from list if there are multiples of them.
for (Map.Entry<String, Integer> entry : map.entrySet())
{
  if(entry.getValue() > 1){
    newList.add(entry.getKey());
  }
}

list.removeAll(newList);

2 Comments

newList adds all the entries with 2 or more. This is a temporary list. list is the original list, so in order to "return list", I'm modifying this one by removing all the entries with count = 1.
The question is to remove the ones with count >= 2.
2

A solution using only ArrayList, via indexOf and lastIndexOf:

public static void main(String args[]) throws Exception {
      List<String> e = new ArrayList<String>();
      List<String> duplicate = new ArrayList<String>();
      e.add("123");
      e.add("122");
      e.add("125");
      e.add("123");

      for(String str : e){
          if(e.indexOf(str) != e.lastIndexOf(str)){
              duplicate.add(str);
          }
      }

      for(String str : duplicate){
          e.remove(str);              
      }

      for(String str : e){
          System.out.println(str);
      }
  }

Comments

2

The simplest solutions using streams have O(n^2) time complexity. If you try them on a List with millions of entries, you'll be waiting a very, very long time. An O(n) solution is:

list = list.stream()
           .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()))
           .entrySet()
           .stream()
           .filter(e -> e.getValue() == 1)
           .map(Map.Entry::getKey)
           .collect(Collectors.toList());

Here, I used a LinkedHashMap to maintain the order. Note that static imports can simplify the collect part.
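With the static imports applied, the pipeline reads like this (an equivalent rewrite, wrapped in a helper method for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

public class StaticImportVersion {
    // Same O(n) pipeline as above, with the collectors statically imported.
    static List<String> uniques(List<String> list) {
        return list.stream()
                   .collect(groupingBy(identity(), LinkedHashMap::new, counting()))
                   .entrySet()
                   .stream()
                   .filter(e -> e.getValue() == 1)
                   .map(Map.Entry::getKey)
                   .collect(toList());
    }

    public static void main(String[] args) {
        System.out.println(uniques(Arrays.asList("123", "122", "125", "123"))); // [122, 125]
    }
}
```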

This is so complicated that I think using for loops is the best option for this problem.

Map<String, Integer> map = new LinkedHashMap<>();
for (String s : list)
    map.merge(s, 1, Integer::sum);
list = new ArrayList<>();
for (Map.Entry<String, Integer> e : map.entrySet())
    if (e.getValue() == 1)
        list.add(e.getKey());

7 Comments

Stream complexity is also O(2*n), therefore O(n)
@Flown It says O(n)
You're saying O(n^2)
@Flown I'm not. It says the simplest solutions using streams are O(n^2). My solution is not the simplest.
you could also use .collect(groupingBy(identity(), counting()))
2
List<String> e = new ArrayList<String>();
e.add("123");
e.add("122");
e.add("125");
e.add("123");
e.add("125");
e.add("124");
List<String> result = new ArrayList<String>();
for (String current : e) {
    if (!result.contains(current)) {
        result.add(current);
    } else {
        result.remove(current);
    }
}
e.clear();
e.addAll(result);

Comments

1

I'm a fan of the Google Guava API. Using the Collections2 utility and a generic Predicate implementation it's possible to create a utility method to cover multiple data types.

This assumes that the Objects in question have a meaningful .equals implementation.

@Test
public void testTrimDupList() {
    Collection<String> dups = Lists.newArrayList("123", "122", "125", "123");
    dups = removeAll("123", dups);
    Assert.assertFalse(dups.contains("123"));

    Collection<Integer> dups2 = Lists.newArrayList(123, 122, 125, 123);
    dups2 = removeAll(123, dups2);
    Assert.assertFalse(dups2.contains(123));
}

private <T> Collection<T> removeAll(final T element, Collection<T> collection) {
    return Collections2.filter(collection, new Predicate<T>() {
        @Override
        public boolean apply(T arg0) {
            return !element.equals(arg0);
        }
    });
}

Thinking about this a bit more

Most of the other examples in this page are using the java.util.List API as the base Collection. I'm not sure if that is done with intent, but if the returned element has to be a List, another intermediary method can be used as specified below. Polymorphism ftw!

@Test
public void testTrimDupListAsCollection() {
    Collection<String> dups = Lists.newArrayList("123", "122", "125", "123");
    // List used here only to get access to the .contains method for validating behavior.
    dups = Lists.newArrayList(removeAll("123", dups));
    Assert.assertFalse(dups.contains("123"));

    Collection<Integer> dups2 = Lists.newArrayList(123, 122, 125, 123);
    // List used here only to get access to the .contains method for validating behavior.
    dups2 = Lists.newArrayList(removeAll(123, dups2));
    Assert.assertFalse(dups2.contains(123));
}

@Test
public void testTrimDupListAsList() {
    List<String> dups = Lists.newArrayList("123", "122", "125", "123");
    dups = removeAll("123", dups);
    Assert.assertFalse(dups.contains("123"));

    List<Integer> dups2 = Lists.newArrayList(123, 122, 125, 123);
    dups2 = removeAll(123, dups2);
    Assert.assertFalse(dups2.contains(123));
}

private <T> List<T> removeAll(final T element, List<T> collection) {
    return Lists.newArrayList(removeAll(element, (Collection<T>) collection));
}

private <T> Collection<T> removeAll(final T element, Collection<T> collection) {
    return Collections2.filter(collection, new Predicate<T>() {
        @Override
        public boolean apply(T arg0) {
            return !element.equals(arg0);
        }
    });
}

Comments

1

Something like this (using a Set):

Set<Object> set = new HashSet<>();
Set<Object> blackList = new HashSet<>();

public void add(Object object) {
    if (blackList.contains(object)) {
        return;
    }
    boolean added = set.add(object);
    if (!added) {
        set.remove(object);
        blackList.add(object);
    }
}

4 Comments

And what do you do if you have 3 times 123 in the list?
You will get one instance of 123; whether that's bad or good depends on what you're trying to accomplish. Technically, when you insert values one by one you handle them one by one, so after the second insert of 123 you remove 123. At that point 123 is no longer in your collection, so it's OK to reinsert it. If you want to prevent that, use a blacklist.
Whether it's bad or good is not the point: this is not what the OP wants to do. If a string appears more than once in his list, he doesn't want it in the final list at all.
You're right, my bad! I've changed the code. It's not tested, but the idea should be obvious.
1

If you are going to use sets, you can achieve this with two sets, keeping the duplicate values in the second one as follows:

List<String> duplicateList = new ArrayList<String>();

duplicateList.add("123");
duplicateList.add("122");
duplicateList.add("125");
duplicateList.add("123");
duplicateList.add("127");
duplicateList.add("127");

System.out.println(duplicateList);

Set<String> nonDuplicateList = new TreeSet<String>();
Set<String> duplicateValues = new TreeSet<String>();

if(nonDuplicateList.size()<duplicateList.size()){
    for(String s: duplicateList){
        if(!nonDuplicateList.add(s)){
            duplicateValues.add(s);
        }
    }

    duplicateList.removeAll(duplicateValues);

    System.out.println(duplicateList);
    System.out.println(duplicateValues);
}

Output:

Original list: [123, 122, 125, 123, 127, 127]
After removing duplicates: [122, 125]
Values which are duplicates: [123, 127]

Note: This solution might not be optimized. You might find a better
solution than this.

Comments

0

With the Guava library, using a multiset and streams:

e = HashMultiset.create(e).entrySet().stream()
    .filter(me -> me.getCount() > 1)
    .map(me -> me.getElement())
    .collect(toList());

This is pretty, and reasonably fast for large lists (O(n) with a rather large constant factor). But it does not preserve order (LinkedHashMultiset can be used if that is desired) and it creates a new list instance.

It is also easy to generalise, to instead remove all triplicates for example.
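For example, a sketch of that generalisation using only plain streams (no Guava; the class and method names are mine): dropping every element that occurs exactly three times just changes the filter predicate.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RemoveTriplicates {
    // Drops every element whose total count is exactly 3, keeping everything else.
    static List<String> withoutTriplicates(List<String> input) {
        Map<String, Long> counts = input.stream()
            .collect(Collectors.groupingBy(Function.identity(),
                     LinkedHashMap::new, Collectors.counting()));
        return input.stream()
                    .filter(s -> counts.get(s) != 3)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> e = Arrays.asList("a", "a", "a", "b", "b", "c");
        System.out.println(withoutTriplicates(e)); // [b, b, c]
    }
}
```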

In general, the multiset data structure is really useful to keep in one's toolbox.

Comments
