Remove duplicates from a list of objects based on property in Java 8 [duplicate]

Question

I am trying to remove duplicates from a List of objects based on some property.

Can we do it in a simple way using Java 8?

List<Employee> employee

Can we remove duplicates from it based on the id property of Employee? I have seen posts about removing duplicate strings from an ArrayList of strings.

Do you want to search for duplicates of employee.name? Or what is your purpose? Please give more information.
– Dude, Commented Apr 16, 2015 at 9:16
@Ranjeet that only works if Employee properly implements equals and hashCode in such a way as to correctly identify duplicates.
– Madbreaks, Commented Sep 27, 2017 at 21:50
Great answer: howtodoinjava.com/java8/java-stream-distinct-examples
– Dusman, Commented Feb 23, 2020 at 11:45

shareef · Accepted Answer · 2020-11-23 16:57:39Z

200

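You can get a stream from the List and collect it into a TreeSet, to which you provide a custom comparator that compares by id, so that employees with the same id count as the same element. Note that the resulting set is ordered by id, not by the original list order.

Then, if you really need a list, you can put this collection back into an ArrayList.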

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

...
List<Employee> unique = employee.stream()
                                .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                                                           ArrayList::new));

Given the example:

List<Employee> employee = Arrays.asList(new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

It will output:

[Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]

Another idea could be to use a wrapper that wraps an employee and bases its equals and hashCode methods on the employee's id:

class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}

Then you wrap each instance, call distinct(), unwrap them and collect the result in a list.

List<Employee> unique = employee.stream()
                                .map(WrapperEmployee::new)
                                .distinct()
                                .map(WrapperEmployee::unwrap)
                                .collect(Collectors.toList());

In fact, I think you can make this wrapper generic by providing a function that will do the comparison:

public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        @SuppressWarnings("unchecked")
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(equalityFunction.apply(this.t), that.equalityFunction.apply(that.t));
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}

and the mapping will be:

.map(e -> new Wrapper<>(e, Employee::getId))

edited Nov 23, 2020 at 16:57

shareef

answered Apr 16, 2015 at 10:07

Alexis C.

14 Comments

Jatin Over a year ago

Your first suggestion is a far better answer than the wrapper one :). The wrapper is obvious, but the first one is much better. I wasn't aware of collectingAndThen.

Alexis C. Over a year ago

@Patan Hard to tell without the test cases, check if you don't have any null reference in the list.

Bevor Over a year ago

The first example works for me but I don't understand why :)

Alexis C. Over a year ago

@Bevor It uses the set constructor that takes a comparator as a parameter. In the example, every employee with the same id will be considered equal and will appear only once in the resulting set.

cbender Over a year ago

@AvijitBarua you can compare as many fields as you want. The TreeSet constructor will accept any Comparator. In Java 8 and onward the comparingInt method is just a quick way to create a Comparator that compares int fields. If you want to add another field to the comparison you can use the thenComparing chained to the original compare so it would look something like comparingInt(Employee::getId).thenComparing(Employee::getName). This seems like a good article explaining Comparators - baeldung.com/java-8-comparator-comparing.
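To make the thenComparing suggestion concrete, here is a minimal sketch. The Employee class and the method name uniqueByIdAndName are assumptions for illustration only, since the question never shows the class. Keep in mind that a TreeSet returns its elements in comparator order, not in the original list order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical minimal Employee; the original post does not show this class.
class Employee {
    private final int id;
    private final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

class CompositeComparatorDemo {
    // Two employees count as duplicates only if both id and name match.
    static List<Employee> uniqueByIdAndName(List<Employee> employees) {
        Comparator<Employee> byIdThenName =
                Comparator.comparingInt(Employee::getId).thenComparing(Employee::getName);
        return employees.stream()
                .collect(Collectors.collectingAndThen(
                        Collectors.toCollection(() -> new TreeSet<>(byIdThenName)),
                        ArrayList::new));
    }
}
```

With the input (1, John), (1, Bob), (1, John), (2, Alice), only the second John is dropped, and the survivors come back sorted by id and then by name.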


RubioRic · Accepted Answer · 2023-03-08 12:32:25Z

112

The easiest way to do it directly in the list is

HashSet<Object> seen = new HashSet<>();
employee.removeIf(e -> !seen.add(e.getID()));

removeIf removes an element if it meets the specified criterion.
Set.add returns false if it did not modify the Set, i.e. the Set already contains the value.
Combining the two removes every element (employee) whose id has been encountered before, so the first occurrence of each id is kept.

Of course, it only works if the list supports removal of elements.
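If the list might not support removal (for example, one created with Arrays.asList), one option is to work on a mutable copy. The following is only a sketch: the Employee class and the helper name dedupeById are assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal Employee; the original post does not show this class.
class Employee {
    private final int id;
    private final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

class RemoveIfDemo {
    // Returns a new list without later duplicates (by id); the input list is left untouched.
    static List<Employee> dedupeById(List<Employee> employees) {
        // ArrayList supports removal, even if the input list does not.
        List<Employee> copy = new ArrayList<>(employees);
        Set<Integer> seen = new HashSet<>();
        // seen.add returns false for an id that was already added, so that element is removed.
        copy.removeIf(e -> !seen.add(e.getId()));
        return copy;
    }
}
```

The copy costs one extra allocation, but it also avoids mutating a list that other code may still be using.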

edited Mar 8, 2023 at 12:32

RubioRic

answered Apr 16, 2015 at 10:50

Holger

4 Comments

Kamil Nękanowicz Over a year ago

You assume that id is unique, what if I have composite key

Holger Over a year ago

@user3871754: you need an object holding the composite key and having appropriate equals and hashCode implementations, e.g. yourList.removeIf(e -> !seen.add(Arrays.asList(e.getFirstKeyPart(), e.getSecondKeyPart())));. Composing the key via Arrays.asList works with an arbitrary number of components, whereas for small numbers of components a dedicated key type might be more efficient.

user25 Over a year ago

What do you mean by "all"? I need to keep at least one.

Martin del Necesario yesterday

This is by far the most elegant solution! Regarding composite keys: frameworks that implement the JPA (Java Persistence API) usually provide functions to directly define a class for a composite key in the model. If that is not possible, you can define a class or record to model the key as needed.
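The composite-key idea from these comments could look roughly like the sketch below. The Person class, its two key parts, and the method name are made up for illustration. The key is built with Arrays.asList, whose equals and hashCode are element-based.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical type whose identity is the pair (firstName, lastName).
class Person {
    private final String firstName;
    private final String lastName;
    private final String city;

    Person(String firstName, String lastName, String city) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.city = city;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
    String getCity() { return city; }
}

class CompositeKeyDemo {
    // Keeps the first person for each (firstName, lastName) pair.
    static List<Person> dedupeByFullName(List<Person> people) {
        List<Person> result = new ArrayList<>(people);
        // List.equals and List.hashCode compare element by element, so the list acts as the key.
        Set<List<String>> seen = new HashSet<>();
        result.removeIf(p -> !seen.add(Arrays.asList(p.getFirstName(), p.getLastName())));
        return result;
    }
}
```

For a key with only two or three parts, a small dedicated key class would avoid allocating a list per element, as Holger notes above.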

Rolch2015 · Accepted Answer · 2018-04-10 03:18:52Z

71

If you can make use of equals, then filter the list by using distinct within a stream (see answers above). If you can not or don't want to override the equals method, you can filter the stream in the following way for any property, e.g. for the property Name (the same for the property Id etc.):

Set<String> nameSet = new HashSet<>();
List<Employee> employeesDistinctByName = employees.stream()
            .filter(e -> nameSet.add(e.getName()))
            .collect(Collectors.toList());

answered Apr 10, 2018 at 3:18

Rolch2015

5 Comments

Sentary Over a year ago

This works well. It takes advantage of filter, which keeps or drops each element based on a predicate. Here the predicate is the insertion of the property (a String) into a set: true if newly inserted, false if it already exists. That was smart! Works great for me!

Nathani Software Over a year ago

This example is good and simple.

Arun Gowda Over a year ago

Does it work fine in multi-threaded scenarios / parallel streams? I mean, is it thread safe kinda thing?

Masi Boo Over a year ago

This is a nice solution to remove duplicate items from the list. But my question was how to get the items from two lists whose ids are not the same.

logbasex Over a year ago

wow! nice and simple.

navins · Accepted Answer · 2018-07-14 18:21:05Z

24

Another solution is to use a Predicate, which you can then use in any filter:

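public static <T> Predicate<T> distinctBy(Function<? super T, ?> f) {
  // ConcurrentHashSet is not a JDK class; it has to come from a third-party library.
  Set<Object> objects = new ConcurrentHashSet<>();
  // add returns true only the first time a key is seen
  return t -> objects.add(f.apply(t));
}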

Then simply reuse the predicate anywhere:

employees.stream().filter(distinctBy(e -> e.getId()));

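Note: the Javadoc of filter says it takes a stateless predicate, and this one is stateful. In practice it still works when the stream is parallel, because the backing set is thread-safe, although which of several duplicates survives is then not deterministic.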
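For readers without a ConcurrentHashSet on the classpath, the same predicate can be sketched with JDK classes only, using ConcurrentHashMap.newKeySet() (available since Java 8). The class name DistinctBy and the string example are assumptions for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctBy {
    // Returns a stateful predicate that accepts an element only the first time its key is seen.
    // Create a fresh predicate per stream, because it remembers every key it has seen.
    static <T> Predicate<T> distinctBy(Function<? super T, ?> keyExtractor) {
        // Thread-safe set backed by a ConcurrentHashMap; it rejects null keys with a NullPointerException.
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Example: keep the first word for each initial letter.
    static List<String> firstPerInitial(Stream<String> words) {
        return words.filter(distinctBy(w -> w.charAt(0)))
                .collect(Collectors.toList());
    }
}
```

On a sequential stream of "apple", "avocado", "banana" this keeps "apple" and "banana".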

About other solutions:

1) Using .collect(Collectors.toConcurrentMap(..)).values() is a good solution, but it's annoying if you want to sort and keep the order.

2) list.removeIf(e->!seen.add(e.getID())); is another very good solution. But we need to make sure the collection supports removal. For example, a list created with Arrays.asList(..) throws UnsupportedOperationException as soon as an element has to be removed.

answered Jul 14, 2018 at 18:21

navins

5 Comments

Ramy Arbid Over a year ago

Great solution when you can't override equals method and don't want to bloat your lambda with Set/List conversions like the accepted answer. Thanks!

Arun Gowda Over a year ago

I wonder why this is not added in the java 8 library. Using it something like stream().distinctBy(Employee::Id) would be of great convenience

Zon Over a year ago

f may be null and throw a NullPointerException.

KeKru Over a year ago

Nice! You could change new ConcurrentHashSet to ConcurrentHashMap.newKeySet() if you don't have a ConcurrentHashSet.

Marian Klühspies Over a year ago

Amazing solution, really sad it hasn't found its way into the JDK yet.

Tho · Accepted Answer · 2015-04-16 10:11:43Z

18

Try this code:

Collection<Employee> nonDuplicatedEmployees = employees.stream()
   .<Map<Integer, Employee>> collect(HashMap::new,(m,e)->m.put(e.getId(), e), Map::putAll)
   .values();
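One thing to be aware of: because put overwrites, the snippet above keeps the last employee seen for each id, and a HashMap does not preserve the original order. If the first occurrence and the encounter order matter, a variant with Collectors.toMap and a LinkedHashMap is one possibility. This is a sketch: the Employee class and the method name firstPerId are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical minimal Employee; the original post does not show this class.
class Employee {
    private final int id;
    private final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

class ToMapDemo {
    // Keeps the first employee per id and preserves the order in which ids first appear.
    static List<Employee> firstPerId(List<Employee> employees) {
        Map<Integer, Employee> byId = employees.stream()
                .collect(Collectors.toMap(
                        Employee::getId,
                        Function.identity(),
                        (first, second) -> first,   // on a duplicate id, keep the earlier one
                        LinkedHashMap::new));       // insertion-ordered map
        return new ArrayList<>(byId.values());
    }
}
```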

edited Apr 16, 2015 at 10:11

answered Apr 16, 2015 at 9:51

Tho

Sebastian D'Agostino · Accepted Answer · 2019-03-21 13:24:44Z

16

This worked for me:

list.stream().distinct().collect(Collectors.toList());

You need to implement equals, of course
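As a rough illustration of what implementing equals means here, the sketch below defines equality by id only. That choice, and the Employee class itself, are assumptions. hashCode is overridden together with equals so the two stay consistent.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Employee whose equality is defined by id alone (an assumption for this example).
class Employee {
    private final int id;
    private final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Employee)) return false;
        return id == ((Employee) o).id;
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal ids give equal hash codes.
        return Integer.hashCode(id);
    }
}

class DistinctDemo {
    // For an ordered stream, distinct() keeps the first of each group of equal elements.
    static List<Employee> unique(List<Employee> employees) {
        return employees.stream().distinct().collect(Collectors.toList());
    }
}
```

The drawback is that this fixes a single notion of equality for Employee everywhere, which is why the other answers dedupe by a key instead.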

edited Mar 21, 2019 at 13:24

answered Mar 7, 2018 at 13:00

Sebastian D'Agostino

3 Comments

Sebastian D'Agostino Over a year ago

You need to implement equals, of course

Sebastian D'Agostino Over a year ago

@Andronicus I added my comment into my response.

jhenya-d Over a year ago

hashCode() should also be overridden, but according to the documentation of the distinct() method of the Stream API (docs.oracle.com/javase/8/docs/api/java/util/stream/…), only equals needs to be.

Xiao Liu · Accepted Answer · 2017-05-24 03:54:43Z

11

If order does not matter and it is more performant to run in parallel, collect to a Map and then get the values:

employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values()

edited May 24, 2017 at 3:54

answered May 24, 2017 at 3:30

Xiao Liu

2 Comments

Rok T. Over a year ago

So, I guess something like this if you want a list back:

employee.stream().collect(Collectors.toConcurrentMap(Employee::getId, Function.identity(), (p, q) -> p)).values().stream().collect(Collectors.toList())

And regarding parallel: can you use it here or not? I mean the parallelStream API.

Tharindu Eranga Over a year ago

@RokT. No need to re-create a stream, just wrap it in an ArrayList, e.g. new ArrayList<>(.stream().collect()......values());

Alex · Accepted Answer · 2017-06-27 10:12:01Z

3

There are a lot of good answers here, but I didn't find one that uses the reduce method. For your case, you can apply it in the following way:

 List<Employee> employeeList = employees.stream()
      .reduce(new ArrayList<>(), (List<Employee> accumulator, Employee employee) ->
      {
        if (accumulator.stream().noneMatch(emp -> emp.getId().equals(employee.getId())))
        {
          accumulator.add(employee);
        }
        return accumulator;
      }, (acc1, acc2) ->
      {
        acc1.addAll(acc2);
        return acc1;
      });

answered Jun 27, 2017 at 10:12

Alex

1 Comment

Sven Dhaens Over a year ago

When working with parallel streams, there's a chance that the combiner will merge employees with the same id again. In that case you need to check for duplicates there as well.
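One way to address this, swapping reduce for the three-argument collect (a mutable reduction), is sketched below. The accumulator and the combiner both use putIfAbsent, so duplicates are checked at both stages. The Employee class and the method name uniqueById are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical minimal Employee; the original post does not show this class.
class Employee {
    private final int id;
    private final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }
}

class ParallelSafeDedupe {
    // Works for sequential and parallel streams: each thread fills its own map,
    // and the combiner merges maps without overwriting ids that are already present.
    static List<Employee> uniqueById(Stream<Employee> employees) {
        Map<Integer, Employee> byId = employees.<Map<Integer, Employee>>collect(
                LinkedHashMap::new,
                (map, e) -> map.putIfAbsent(e.getId(), e),
                (left, right) -> right.forEach(left::putIfAbsent));
        return new ArrayList<>(byId.values());
    }
}
```

Unlike the reduce version, this does not mutate a shared identity value, and the map lookup replaces the linear noneMatch scan.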

zawhtut · Accepted Answer · 2015-04-17 02:04:45Z

0

Another version, which is simple:

BiFunction<TreeSet<Employee>,List<Employee> ,TreeSet<Employee>> appendTree = (y,x) -> (y.addAll(x))? y:y;

TreeSet<Employee> outputList = appendTree.apply(new TreeSet<Employee>(Comparator.comparing(p->p.getId())),personList);

edited Apr 17, 2015 at 2:04

answered Apr 16, 2015 at 18:58

zawhtut

1 Comment

Holger Over a year ago

This is an obfuscated version of TreeSet<Employee> outputList = new TreeSet<>(Comparator.comparing(p->p.getId())); outputList.addAll(personList); The straight-forward code is much simpler.
