The code below fetches each URL with a GET request and appends the resulting rows to the list RESULT:
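import pandas as pd
RESULT = []
for i in url:
    # Parse the first HTML table on the page; its first row is the header
    df = pd.read_html(i, header=0)[0]
    df = df.values.tolist()  # as_matrix() was removed in pandas 1.0
    for item in df:
        RESULT.append(item)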
I use the code below to exclude duplicate entries:
def unique_items(RESULT):
    found = set()
    for item in RESULT:
        if item[0] not in found:
            yield item
            found.add(item[0])

NOT_DUBLICATE = list(unique_items(RESULT))
print(NOT_DUBLICATE)
It seems to me this is not optimal, since I have to build a list of all the rows before I can exclude the duplicates.
How can I detect duplicates before loading a row into the list RESULT?
For example, these are the rows I write to the list RESULT:
[[55323602, 'system'],
 [55323603, 'system']]
[[55323602, 'system'],
 [55323603, 'system']]
... item[0]. We're gonna need an "eliminate duplicates based on a key function" sort of question.

Instead of building RESULTS and then removing duplicates from that list, just skip the duplicates in the for item in df: loop.

Why not call drop_duplicates on your df before mapping it into a list? It would automatically drop all duplicates.
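
The "skip the duplicates in the loop" suggestion could look roughly like the sketch below. It keeps a set of the keys already seen and only yields a row the first time its key (item[0] in the question) shows up, so duplicate rows never reach RESULT. The name iter_unique_rows and the hard-coded pages list are made up for illustration; in the real code each inner list would be the output of pd.read_html(i, header=0)[0].values.tolist() for one URL.

```python
def iter_unique_rows(pages, key=lambda row: row[0]):
    """Yield rows from every page, skipping rows whose key was already seen."""
    seen = set()
    for rows in pages:
        for row in rows:
            k = key(row)
            if k in seen:
                continue  # duplicate key: the row is never stored
            seen.add(k)
            yield row


# Hypothetical stand-in for the rows parsed from two pages (assumption:
# both pages return the same two rows, as in the question's example).
pages = [
    [[55323602, 'system'], [55323603, 'system']],
    [[55323602, 'system'], [55323603, 'system']],
]

RESULT = list(iter_unique_rows(pages))
print(RESULT)  # [[55323602, 'system'], [55323603, 'system']]
```

Only the set of keys is held in addition to the unique rows, so memory use follows the number of distinct keys rather than the total number of rows fetched. As for drop_duplicates: calling it on each page's DataFrame removes duplicates within that page only, so duplicates across pages would still need either a check like the one above or a single drop_duplicates(subset=...) call on the concatenated frames.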