
I have a program that handles big sets of data. Most of it runs quickly, except for a few lines of code that sometimes take up to a minute.

What these lines of code do is take a list of 1 to 1000 values, each ranging from 0 to 1000000. Then, for a defined number of times k, they filter the list and remove the value that occurs most often in it (if multiple values occur equally often, a random one is removed).

Is there a way to write this so it takes less time to execute?

I run Windows 10 Pro with an Intel i7-8086K, a GTX 1660, and 16 GB of DDR4 RAM.

    k = int(new1[1])

    def most_frequent(lists):
        return max(set(lists), key=lists.count)

    ite = 0
    while ite < k:
        new = list(filter(lambda a: a != most_frequent(new), new))
        ite += 1

As said before, I am getting the results I want; it's just that the program is extremely slow.

Comments

  • You are calling most_frequent for each filtering operation. You should do this: most_frequent_element = most_frequent(new), then use that value in the filter: list(filter(lambda a: a != most_frequent_element, new)). You should see a big difference. Commented Oct 6, 2019 at 11:38
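That suggestion can be sketched as follows (a minimal sketch; the sample data and k here are made up for illustration, not taken from the question):

```python
def most_frequent(lst):
    # same helper as in the question
    return max(set(lst), key=lst.count)

new = [1, 1, 2, 2, 2, 3]  # hypothetical sample data
k = 1                     # hypothetical k

for _ in range(k):
    # compute the most frequent value once per pass,
    # instead of once per element inside the lambda
    most_frequent_element = most_frequent(new)
    new = list(filter(lambda a: a != most_frequent_element, new))
```

Hoisting the call out of the lambda turns k * n calls to most_frequent into k calls, which alone removes most of the slowdown.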

1 Answer


How about this:

    # count elements in the list
    counts = {}
    for x in ls:
        counts[x] = (counts.get(x) or 0) + 1

    # sort counts by most frequent
    counts = sorted(counts.items(), key=lambda c: -c[1])

    # get top k elements
    to_remove = set(c[0] for c in counts[:k])

    # remove them
    new_list = [x for x in ls if x not in to_remove]
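As a side note, the counting and sorting steps can also be done with the standard library's collections.Counter, whose most_common method returns the k most frequent (value, count) pairs directly (the sample data and k here are made up for illustration):

```python
from collections import Counter

ls = [5, 5, 5, 7, 7, 9]  # hypothetical sample data
k = 1                    # hypothetical k

# Counter tallies occurrences; most_common(k) returns the k most
# frequent (value, count) pairs in descending order of count
to_remove = {value for value, count in Counter(ls).most_common(k)}
new_list = [x for x in ls if x not in to_remove]
```

Like the answer above, this removes the top k values in a single pass. Since removing all copies of the most frequent value doesn't change the counts of the remaining values, this matches the question's loop except for how ties are broken.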

1 Comment

Thanks so much, it now runs in under half a second!
