133

I am trying to remove duplicates from a List of objects based on some property.

Can we do it in a simple way using Java 8?

List<Employee> employee

Can we remove duplicates from it based on the id property of Employee? I have seen posts about removing duplicate strings from an ArrayList of String.

4
  • 10
    Why use a List for that? Use a Set instead of a List. Commented Apr 16, 2015 at 9:08
  • Do you want to search for duplicates of employee.name? Or what is your purpose? Please give more information. Commented Apr 16, 2015 at 9:16
  • 1
    @Ranjeet that only works if Employee properly implements equals and hashCode in such a way as to correctly identify duplicates. Commented Sep 27, 2017 at 21:50
  • great answer howtodoinjava.com/java8/java-stream-distinct-examples Commented Feb 23, 2020 at 11:45

9 Answers

200

You can get a stream from the List and collect it into a TreeSet constructed with a custom comparator that compares by id, so that employees with the same id count as duplicates.

Then, if you really need a list, you can copy this collection back into an ArrayList.

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

...
List<Employee> unique = employee.stream()
                                .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                                                           ArrayList::new));

Given the example:

List<Employee> employee = Arrays.asList(new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

It will output:

[Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]
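Note that the TreeSet returns the survivors sorted by the comparator (here, by id) rather than in their original list order. If encounter order matters, one alternative (a different technique from the TreeSet above) is to collect into a LinkedHashMap keyed by id. The following is only a minimal sketch: the Employee class shown is a hypothetical stand-in that is assumed to expose getId() and getName(), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class DistinctByIdInOrder {

    // Hypothetical minimal Employee; the real class is assumed to have getId() and getName().
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Keeps the first employee seen for each id and preserves encounter order.
    static List<Employee> distinctById(List<Employee> employees) {
        return new ArrayList<>(employees.stream()
                .collect(Collectors.toMap(
                        Employee::getId,          // key: the property to deduplicate on
                        Function.identity(),      // value: the employee itself
                        (first, second) -> first, // on a duplicate id, keep the earlier one
                        LinkedHashMap::new))      // map that remembers insertion order
                .values());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(2, "Alice"), new Employee(1, "John"), new Employee(1, "Bob"));
        for (Employee e : distinctById(employees)) {
            System.out.println(e.getId() + " " + e.getName());
        }
    }
}
```

With this input, Alice (id 2) stays ahead of John (id 1), whereas the TreeSet version would put id 1 first.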

Another idea could be to use a wrapper that wraps an employee and bases its equals and hashCode methods on the employee's id:

class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}

Then you wrap each instance, call distinct(), unwrap them and collect the result in a list.

List<Employee> unique = employee.stream()
                                .map(WrapperEmployee::new)
                                .distinct()
                                .map(WrapperEmployee::unwrap)
                                .collect(Collectors.toList());

In fact, I think you can make this wrapper generic by providing a function that extracts the value used for the comparison:

public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        @SuppressWarnings("unchecked")
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(equalityFunction.apply(this.t), that.equalityFunction.apply(that.t));
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}

and the mapping will be:

.map(e -> new Wrapper<>(e, Employee::getId))

14 Comments

Your first suggestion is a far better answer than the wrapper one :). The wrapper is obvious, but the first one is much better. I wasn't aware of collectingAndThen.
@Patan Hard to tell without the test cases, check if you don't have any null reference in the list.
The first example works for me but I don't understand why :)
@Bevor It uses the TreeSet constructor that takes a comparator as a parameter. In the example, every employee with the same id is considered equal, so only one of them ends up in the resulting set.
@AvijitBarua you can compare as many fields as you want. The TreeSet constructor will accept any Comparator. In Java 8 and onward the comparingInt method is just a quick way to create a Comparator that compares int fields. If you want to add another field to the comparison you can use the thenComparing chained to the original compare so it would look something like comparingInt(Employee::getId).thenComparing(Employee::getName). This seems like a good article explaining Comparators - baeldung.com/java-8-comparator-comparing.
112

The easiest way to do it directly in the list is

HashSet<Object> seen = new HashSet<>();
employee.removeIf(e -> !seen.add(e.getId()));
  • removeIf will remove an element if it meets the specified criteria
  • Set.add will return false if it did not modify the Set, i.e. the Set already contains the value
  • combining these two, it will remove every element (employee) whose id has been encountered before, so the first employee with each id is kept

Of course, it only works if the list supports removal of elements.
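To illustrate the caveat about removal support, here is a small sketch. The Employee class and the names RemoveIfDedup and removeDuplicateIds are assumptions made for the example. A list from Arrays.asList is fixed-size, so removeIf fails as soon as it tries to remove a duplicate; copying into an ArrayList first avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RemoveIfDedup {

    // Hypothetical minimal Employee; the real class is assumed to have getId() and getName().
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Removes, in place, every employee whose id already appeared earlier in the list.
    static void removeDuplicateIds(List<Employee> employees) {
        Set<Integer> seen = new HashSet<>();
        employees.removeIf(e -> !seen.add(e.getId()));
    }

    public static void main(String[] args) {
        List<Employee> fixedSize = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        try {
            // Throws once removeIf attempts to remove the first duplicate.
            removeDuplicateIds(fixedSize);
        } catch (UnsupportedOperationException ex) {
            System.out.println("Arrays.asList does not support removal");
        }

        // A mutable copy supports removal.
        List<Employee> mutable = new ArrayList<>(fixedSize);
        removeDuplicateIds(mutable);
        System.out.println(mutable.size());
    }
}
```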

4 Comments

You assume that id is unique, what if I have composite key
@user3871754: you need an object holding the composite key and having appropriate equals and hashCode implementations, e.g. yourList.removeIf(e -> !seen.add(Arrays.asList(e.getFirstKeyPart(), e.getSecondKeyPart())));. Composing the key via Arrays.asList works with an arbitrary number of components, whereas for small numbers of components a dedicated key type might be more efficient.
What do you mean by all? I need to keep at least one.
This is by far the most elegant solution! Regarding composite keys: frameworks that implement the JPA (Java Persistence API) usually provide functions to directly define a class for a composite key in the model. If that is not possible, you can define a class or record to model the key as needed.
71

If you can make use of equals, then filter the list by using distinct within a stream (see the other answers). If you cannot or don't want to override the equals method, you can filter the stream in the following way for any property, e.g. for the property name (the same works for the property id, etc.):

Set<String> nameSet = new HashSet<>();
List<Employee> employeesDistinctByName = employees.stream()
            .filter(e -> nameSet.add(e.getName()))
            .collect(Collectors.toList());

5 Comments

This was pretty neat. It takes advantage of filter, which keeps or drops each element based on a predicate, here the result of inserting the property (a String) into a set: true if newly inserted, false if it already exists. That was smart! Works great for me!
This example is good and simple.
Does it work fine in multi-threaded scenarios / parallel streams? I mean, is it thread safe kinda thing?
This is a nice solution for removing duplicate items from a list. But my question was how to get the items from 2 lists whose ids are not the same.
wow! nice and simple.
24

Another solution is to use a Predicate, then you can use this in any filter:

public static <T> Predicate<T> distinctBy(Function<? super T, ?> f) {
  Set<Object> objects = ConcurrentHashMap.newKeySet();
  return t -> objects.add(f.apply(t));
}

Then simply reuse the predicate anywhere:

List<Employee> unique = employees.stream().filter(distinctBy(Employee::getId)).collect(Collectors.toList());

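Note: the JavaDoc of filter says it takes a stateless Predicate, and this one is stateful. In practice it still works on a parallel stream because the underlying set is thread-safe, but which of several duplicates survives is then not guaranteed to be the first one.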
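For reference, a self-contained sketch of the predicate approach. It assumes ConcurrentHashMap.newKeySet() from the JDK as the thread-safe set; the Employee class and the name DistinctByDemo are hypothetical and exist only for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByDemo {

    // Hypothetical minimal Employee; the real class is assumed to have getId() and getName().
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Returns a stateful predicate that accepts an element only the first time its key is seen.
    // Keys must be non-null: ConcurrentHashMap rejects null keys with a NullPointerException.
    static <T> Predicate<T> distinctBy(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Employee> distinctById(List<Employee> employees) {
        // Create a fresh predicate per stream; reusing one instance would carry over its state.
        return employees.stream()
                .filter(distinctBy(Employee::getId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        for (Employee e : distinctById(employees)) {
            System.out.println(e.getId() + " " + e.getName());
        }
    }
}
```

On a sequential stream this keeps the first employee for each id in list order.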


About other solutions:

1) Using .collect(Collectors.toConcurrentMap(..)).values() is a good solution, but it's annoying if you want to sort and keep the order.

2) list.removeIf(e -> !seen.add(e.getId())); is another very good solution. But we need to make sure the collection supports removal; for example, it throws an UnsupportedOperationException if the list was created with Arrays.asList(..).

5 Comments

Great solution when you can't override equals method and don't want to bloat your lambda with Set/List conversions like the accepted answer. Thanks!
I wonder why this is not added in the java 8 library. Using it something like stream().distinctBy(Employee::Id) would be of great convenience
f may be null, which would throw a NullPointerException.
Nice! You could change new ConcurrentHashSet to ConcurrentHashMap.newKeySet() if you don't have a ConcurrentHashSet.
Amazing solution, really sad it hasn't found its way into the JDK yet.
18

Try this code:

Collection<Employee> nonDuplicatedEmployees = employees.stream()
   .<Map<Integer, Employee>> collect(HashMap::new,(m,e)->m.put(e.getId(), e), Map::putAll)
   .values();


16

This worked for me:

list.stream().distinct().collect(Collectors.toList());

You need to implement equals, of course

3 Comments

You need to implement equals, of course
@Andronicus I added my comment into my response.
hashCode() should also be overridden, although according to the documentation of the distinct() method of the Stream API (docs.oracle.com/javase/8/docs/api/java/util/stream/…) only equals needs to be.
11

If order does not matter and running in parallel is more performant, collect to a Map and then get the values:

employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values()
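If a List is needed rather than a Collection view, the values can be copied into an ArrayList. A minimal sketch under the same assumptions (a hypothetical Employee with getId(); the class name is made up), using parallelStream() since that is where a concurrent map is meant to help:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class ConcurrentMapDedup {

    // Hypothetical minimal Employee; the real class is assumed to have getId() and getName().
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    // Returns one employee per id. Neither the order of the result nor which
    // duplicate is kept is guaranteed when the stream runs in parallel.
    static List<Employee> distinctById(List<Employee> employees) {
        return new ArrayList<>(employees.parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Employee::getId,      // key: the id to deduplicate on
                        Function.identity(),  // value: the employee itself
                        (p, q) -> p))         // on a key clash, keep the value already in the map
                .values());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        System.out.println(distinctById(employees).size());
    }
}
```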

2 Comments

So, I guess something like this if you want a list back: employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values().stream().collect(Collectors.toList()). And, regarding parallel, can you use it here or not? I mean the parallelStream API.
@RokT.. no need to re-create a stream, just wrap it in an ArrayList. ex:- new ArrayList<>(.stream().collect()......values());
3

There are a lot of good answers here, but I didn't find one that uses the reduce method. For your case, you can apply it in the following way:

 List<Employee> employeeList = employees.stream()
      .reduce(new ArrayList<>(), (List<Employee> accumulator, Employee employee) ->
      {
        if (accumulator.stream().noneMatch(emp -> emp.getId().equals(employee.getId())))
        {
          accumulator.add(employee);
        }
        return accumulator;
      }, (acc1, acc2) ->
      {
        acc1.addAll(acc2);
        return acc1;
      });
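One caveat: Stream.reduce expects an identity value and an accumulator that do not mutate shared state, so mutating the ArrayList as above is only safe on a sequential stream. A variant that stays correct in parallel is mutable reduction with the three-argument collect, where the combiner also skips ids that are already present. This is a sketch, not the answer's original method; the Employee class and the name CollectDedup are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CollectDedup {

    // Hypothetical minimal Employee; the real class is assumed to have getId() and getName().
    static class Employee {
        private final int id;
        private final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
        int getId() { return id; }
        String getName() { return name; }
    }

    static List<Employee> distinctById(List<Employee> employees) {
        Map<Integer, Employee> byId = employees.stream()
                .<Map<Integer, Employee>>collect(
                        LinkedHashMap::new,                                  // a fresh container per partial result
                        (map, e) -> map.putIfAbsent(e.getId(), e),           // keep the first employee per id
                        (left, right) -> right.forEach(left::putIfAbsent));  // merge partial results without re-adding ids
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));
        for (Employee e : distinctById(employees)) {
            System.out.println(e.getId() + " " + e.getName());
        }
    }
}
```

Using a map keyed by id also avoids rescanning the accumulated list for every element, which the noneMatch check does.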

1 Comment

When working with parallel streams, there's a chance that the combiner will add together employees with the same id again; in that case you need to check for duplicates there as well.
0

Another version which is simple

BiFunction<TreeSet<Employee>,List<Employee> ,TreeSet<Employee>> appendTree = (y,x) -> (y.addAll(x))? y:y;

TreeSet<Employee> outputList = appendTree.apply(new TreeSet<Employee>(Comparator.comparing(p->p.getId())),personList);

1 Comment

This is an obfuscated version of TreeSet<Employee> outputList = new TreeSet<>(Comparator.comparing(p->p.getId())); outputList.addAll(personList); The straight-forward code is much simpler.
