Remove duplicates from a list in Python

Question

The code below fetches each page with a GET request and appends the resulting rows to the list RESULT:

import pandas as pd

RESULT = []
for i in url:
    df = pd.read_html(i, header=0)[0]
    # as_matrix() was removed in pandas 1.0; .values is the equivalent
    df = df.values.tolist()
    for item in df:
        RESULT.append(item)

I use the code below to exclude duplicate entries:

def unique_items(RESULT):
    # Yield only the first row seen for each value of item[0]
    found = set()
    for item in RESULT:
        if item[0] not in found:
            yield item
            found.add(item[0])

NOT_DUBLICATE = list(unique_items(RESULT))
print(NOT_DUBLICATE)

This seems suboptimal to me, since I first have to collect all the rows in a list before I can exclude the duplicates.

How can I detect duplicates before appending a row to the list RESULT?

For example, these are the rows I write to the list RESULT:

[[55323602, 'system'],
 [55323603, 'system']]
[[55323602, 'system'],
 [55323603, 'system']]

@msanford I don't think that's a suitable dupe - the OP isn't really eliminating duplicates; they're comparing the elements by item[0]. We're gonna need an "eliminate duplicates based on a key function" sort of question.
– Aran-Fey, Commented May 7, 2018 at 14:04
He is asking for something that avoids duplicates before appending to the list. Check my answer!
– Ishara Dayarathna, Commented May 7, 2018 at 14:12
@Aran-Fey Fair observation; I'll retract. Phillip, you may wish to rephrase your title.
– msanford, Commented May 7, 2018 at 14:12
I don't understand the problem. You say it's "necessary to get a list of all the rows to exclude duplicates", but that's not even true. Instead of building a list RESULT and then removing duplicates from that list, just skip the duplicates in the "for item in df:" loop.
– Aran-Fey, Commented May 7, 2018 at 14:19
@phillipwatts344, is it not possible to call drop_duplicates on your df before mapping it into a list? It would automatically drop all duplicates.
– Erol, Commented May 7, 2018 at 14:46
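
The drop_duplicates suggestion in the last comment could look roughly like the sketch below. It assumes the first column is the key to compare on. The in-memory `frames` list and the column names `id` and `owner` are hypothetical; they stand in for the tables that pd.read_html would return for each URL.

```python
import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html for each URL
frames = [
    pd.DataFrame({"id": [55323602, 55323603], "owner": ["system", "system"]}),
    pd.DataFrame({"id": [55323602, 55323603], "owner": ["system", "system"]}),
]

# Stack all pages, then keep the first row for each value in the first column
combined = pd.concat(frames, ignore_index=True)
deduped = combined.drop_duplicates(subset=[combined.columns[0]])

RESULT = deduped.values.tolist()
print(RESULT)
```

This still loads every page before deduplicating, but the comparison happens inside pandas rather than in a Python loop.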

Ishara Dayarathna · Accepted Answer · 2018-05-07 14:04:08Z

1

Instead of using a separate function to exclude duplicate entries, append an item to RESULT only if it is not already in the list. Then you don't need unique_items() at all.

You can find duplicates before loading a row into the list RESULT using this:

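for i in url:
    df = pd.read_html(i, header=0)[0]
    # as_matrix() was removed in pandas 1.0; .values is the equivalent
    df = df.values.tolist()
    for item in df:
        # Compares the whole row, not just item[0]
        if item not in RESULT:
            RESULT.append(item)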
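
Note that a membership test against a list scans the whole list for every row and compares entire rows. If duplicates should be detected by the first element only, as unique_items() in the question does, a set of already-seen keys is one option. A minimal sketch, assuming the rows have already been fetched; `pages` and `append_unique` are hypothetical names used only for illustration:

```python
def append_unique(rows, result, seen):
    """Append rows whose first element has not been seen yet."""
    for item in rows:
        key = item[0]
        if key not in seen:
            seen.add(key)
            result.append(item)


# Hypothetical stand-in for the row lists returned for each URL
pages = [
    [[55323602, "system"], [55323603, "system"]],
    [[55323602, "system"], [55323603, "system"]],
]

RESULT = []
seen = set()
for rows in pages:
    append_unique(rows, RESULT, seen)

print(RESULT)
```

Set lookups take roughly constant time on average, so this avoids rescanning RESULT for every incoming row.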

edited May 7, 2018 at 14:04

answered May 7, 2018 at 13:55

Erol · Accepted Answer · 2018-05-07 14:49:25Z

1

Just use a set instead of a list.

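result = set()
for i in url:
    df = pd.read_html(i, header=0)[0]
    # as_matrix() was removed in pandas 1.0; .values is the equivalent
    df_list = df.values.tolist()
    for item in df_list:
        # Lists are unhashable, so store each row as a tuple
        result.add(tuple(item))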

The code above excludes any duplicate rows. The only difference from your case is that the elements of result are tuples instead of lists.

At the end, you can convert the set back to a list with:

result = list(result)
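
One caveat: a set does not preserve the order in which rows were added. If order matters, a dict can serve the same purpose, since dicts keep insertion order in Python 3.7 and later. A minimal sketch; `rows` is hypothetical sample data standing in for the rows collected from all pages:

```python
# Hypothetical sample rows, standing in for the rows gathered from every URL
rows = [
    [55323602, "system"],
    [55323603, "system"],
    [55323602, "system"],
    [55323603, "system"],
]

# dict.fromkeys keeps the first occurrence of each key in insertion order
unique_rows = list(dict.fromkeys(tuple(row) for row in rows))

# Convert back to lists if the rest of the code expects lists
result = [list(row) for row in unique_rows]
print(result)
```

Like the set version, this compares whole rows, not just the first element.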

edited May 7, 2018 at 14:49

answered May 7, 2018 at 13:57

6 Comments

Aran-Fey Over a year ago

1) Your result is a dict, not a set. 2) item isn't defined in your loop. 3) item seems to be a list, and lists can't be stored in sets.

Erol Over a year ago

@Aran-Fey Thanks for first two points, I corrected them. Regarding #3, you are wrong. A set can be updated with iterables: docs.python.org/3/library/stdtypes.html#frozenset.update.

Aran-Fey Over a year ago

Yes, a set can be updated with an iterable, but that's not what we're trying to do here. We're trying to detect duplicate rows based on the first element, i.e. item[0]. Your code doesn't do that; it just tosses all the values in a row into a set. You end up with a list of values, not a list of rows.

Erol Over a year ago

If that is the case, the last edit should work fine.

Erol Over a year ago

Given OP's example, the second element always seems to be 'system' so my code technically compares based on the first element. @phillipwatts344, correct me if I am wrong.
