I am running a query that needs to exclude all previously seen items and return a fixed number of items (10 in this case). The list of seen items is sent in a querystring parameter as uuid4 strings separated by '&'.
Since the database may be very large, I don't think it makes sense to add an exclude clause to the query: most of the results won't be in the alreadySeenItems list anyway.
Which of the following methods would be faster, assuming that alreadySeenItems can be fairly large (more than 1000 items in some cases)?
Note: I would normally just use the first one, but since I need an exact match and I know where each word starts, it might make sense to do otherwise.
def getItems(request, alreadySeenItems):
    # alreadySeenItems is the raw querystring value: uuid4 strings joined by '&'
    newItems = []
    allPossibleItems = Items.objects.raw('SELECT "id" FROM items_table WHERE conditions;')

    # The three methods below are alternatives; only one of them would be kept.

    # method 1: substring search in the raw '&'-separated string
    for item in allPossibleItems:
        if item.id not in alreadySeenItems:
            newItems.append(item.id)
            if len(newItems) >= 10:
                return newItems

    # method 2: split into a list and scan it by hand
    alreadySeenItemsList = alreadySeenItems.split('&')
    for item in allPossibleItems:
        if not checkForItemInList(item.id, alreadySeenItemsList):
            newItems.append(item.id)
            if len(newItems) >= 10:
                return newItems

    # method 3: split into a set and use a membership test
    alreadySeenItemsSet = set(alreadySeenItems.split('&'))
    for item in allPossibleItems:
        if item.id not in alreadySeenItemsSet:
            newItems.append(item.id)
            if len(newItems) >= 10:
                return newItems


def checkForItemInList(item, items):
    # linear scan: True as soon as an exact match is found
    for tmp in items:
        if item == tmp:
            return True
    return False
timeit on some representative data and find out?
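A rough sketch of what such a timeit comparison could look like. It uses synthetic uuid4 strings instead of the real Django queryset, and it assumes the ids are plain strings (if the model's id is a UUIDField, item.id would be a uuid.UUID and would need str() first). The function names, the data sizes, and the repeat count are all made up for illustration; real numbers should come from your own data.

```python
import timeit
import uuid

WANTED = 10  # number of new items to return, as in the question

# Synthetic data (an assumption, not the real table): 2,000 candidate ids,
# the first 1,000 of which have already been seen, so every method has to
# skip 1,000 ids before it finds 10 new ones.
candidates = [str(uuid.uuid4()) for _ in range(2_000)]
seen_string = "&".join(candidates[:1_000])  # stand-in for the querystring value


def method1_substring(ids, seen):
    """Method 1: substring search in the raw '&'-joined string."""
    new_items = []
    for item_id in ids:
        if item_id not in seen:
            new_items.append(item_id)
            if len(new_items) >= WANTED:
                break
    return new_items


def check_for_item_in_list(item, items):
    """Hand-written linear scan, mirroring checkForItemInList."""
    for tmp in items:
        if item == tmp:
            return True
    return False


def method2_list_scan(ids, seen):
    """Method 2: split once, then scan the list for every candidate."""
    seen_list = seen.split("&")
    new_items = []
    for item_id in ids:
        if not check_for_item_in_list(item_id, seen_list):
            new_items.append(item_id)
            if len(new_items) >= WANTED:
                break
    return new_items


def method3_set(ids, seen):
    """Method 3: split once into a set, then use hash lookups."""
    seen_set = set(seen.split("&"))
    new_items = []
    for item_id in ids:
        if item_id not in seen_set:
            new_items.append(item_id)
            if len(new_items) >= WANTED:
                break
    return new_items


def run_benchmark(repeats=20):
    """Time each method on the same synthetic input and print ms per call."""
    for fn in (method1_substring, method2_list_scan, method3_set):
        seconds = timeit.timeit(lambda: fn(candidates, seen_string), number=repeats)
        print(f"{fn.__name__}: {seconds / repeats * 1000:.3f} ms per call")


if __name__ == "__main__":
    run_benchmark()
```

For reading the results: a set membership test is O(1) on average, while both the substring search and the list scan grow with the size of the seen list, so the gap should widen as alreadySeenItems gets longer. The timings here leave out the database fetch itself, which may well dominate in practice.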