
I have this array of data

data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

I remove the duplicate data with list(set(data)), which gives me

data = [20001202.05, 20001202.50, 20001215.75, 20021215.75]

But I would like to remove duplicates based on the digits before the decimal point. For instance, if the list contains both 20001202.05 and 20001202.50, I want to keep only one of them.

3 Answers


As you don't care about the order of the items you keep, you could do this (it keeps the last value seen for each integer part):

>>> list({int(d): d for d in data}.values())
[20001202.5, 20001215.75, 20021215.75]
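Because each new value for an existing key overwrites the old one, the dict comprehension keeps the *last* value per integer part. A small sketch (assuming Python 3.7+, where dicts keep insertion order) contrasting it with a `dict.setdefault` variant that keeps the *first* value instead:

```python
data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# Dict comprehension: a later value for an existing key overwrites the
# earlier one, so the LAST value seen for each integer part survives.
keep_last = list({int(d): d for d in data}.values())

# dict.setdefault only stores a value when the key is not there yet,
# so the FIRST value seen for each integer part survives.
first_by_key = {}
for d in data:
    first_by_key.setdefault(int(d), d)
keep_first = list(first_by_key.values())

print(keep_last)   # [20001202.5, 20001215.75, 20021215.75]
print(keep_first)  # [20001202.05, 20001215.75, 20021215.75]
```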

If you would like to keep the lowest item, I can't think of a one-liner.
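For the keep-the-lowest case, one possible alternative swaps in `itertools.groupby` (this technique is not part of the original answer). It relies on sorting first, so values with the same `int()` part end up next to each other with the lowest first:

```python
from itertools import groupby

data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# sorted() puts equal integer parts next to each other, lowest value first.
# groupby(..., key=int) then yields one group per integer part, and
# next(group) takes the first, i.e. lowest, value of each group.
lowest = [next(group) for _, group in groupby(sorted(data), key=int)]

print(lowest)  # [20001202.05, 20001215.75, 20021215.75]
```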

Here is a basic example for anybody who would like to add a condition on the key or value to keep.

seen = set()  # integer parts already kept
result = []
# sorting first means the lowest value for each key is the one kept
for item in sorted(data):
    key = int(item)  # or whatever condition
    if key not in seen:
        result.append(item)
        seen.add(key)
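The same loop can be wrapped into a reusable function that takes the condition as a key function. A sketch; `dedupe_by` is a hypothetical helper name, and the second call assumes the integer part is a YYYYMMDD date, which the question does not state:

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def dedupe_by(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first item for each distinct key(item), in iteration order.

    dedupe_by is a hypothetical helper used only for illustration.
    """
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# Feeding sorted data keeps the lowest value per integer part.
print(dedupe_by(sorted(data), key=int))
# [20001202.05, 20001215.75, 20021215.75]

# A different condition (assumes YYYYMMDD): one value per year/month.
print(dedupe_by(data, key=lambda d: int(d) // 100))
# [20001202.05, 20021215.75]
```

Since the function keeps the first item per key, changing the input order (sorted, reversed, or as-is) is how you pick which duplicate survives.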

6 Comments

In your updated answer, there's no reason to use sorted(data).
A set is effectively just a dict with keys but no values; both use a hash table internally.
@martineau There is, if you want to keep only the lowest item, for example with data = [1.75, 1.05].
You can do the latter in two lines, but it's awkward and not very Pythonic: seen = set(); result = [item for item in data if int(item) not in seen and seen.add(int(item)) is None].
@jonrsharpe Thank you for pointing out that set() is built on an optimized version of the hashing used for dict. Your two-liner is nice too, even if I dislike repeating the type conversion :^)

In general, with Python 3.7+, dictionaries preserve insertion order, so you can do this even when order matters:

data = list({d: None for d in data}.keys())

However, the OP's original problem is to de-duplicate on the integer part rather than on the full number, so see the top-voted answer for that. In general, though, this removes exact duplicates.
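A slightly shorter spelling of the same idea uses the built-in `dict.fromkeys`, which creates the keys with `None` values for you. A sketch, assuming Python 3.7+ for the ordering guarantee:

```python
data = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]

# dict.fromkeys builds a dict whose keys are the items (values default to None).
# Dicts keep insertion order, so list() returns each distinct item once,
# in the order it first appeared.
unique = list(dict.fromkeys(data))

print(unique)  # [20001202.05, 20001202.5, 20001215.75, 20021215.75]
```

Note that 20001202.05 and 20001202.5 both survive here, because this only removes exact duplicates.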

data1 = [20001202.05, 20001202.05, 20001202.50, 20001215.75, 20021215.75]
ls = []  # unique values, in order of first appearance
for i in data1:
    if i not in ls:  # exact-match check only
        ls.append(i)
print(ls)

2 Comments

You forgot to declare the variable ls, and this still doesn't completely solve the problem, as it doesn't account for the OP's second condition.
Yes, I apologize for this.
