I want to generate IDs for strings that are read from a text file. If a string is duplicated, I want its first instance to get a 6-character ID. Each later duplicate of that string should get the same ID as the original, plus two additional characters. I'm having trouble with the logic. Here's what I've done so far:
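To make the target format concrete, here is a minimal sketch of how a duplicate's ID could be derived from the original one. It assumes the "two additional characters" are a zero-padded occurrence counter (`01`, `02`, ...). Random characters would also fit the description, but they could collide between duplicates of the same string. `duplicate_id` is a hypothetical helper name.

```python
def duplicate_id(base_id, occurrence):
    """Return the ID for the `occurrence`-th repeat (1-based) of a string.

    Assumption: the two extra characters are a zero-padded counter,
    so up to 99 repeats per string stay unique.
    """
    if not 1 <= occurrence <= 99:
        raise ValueError("two characters only allow 99 repeats")
    return base_id + "{:02d}".format(occurrence)

print(duplicate_id("C10DD8", 1))  # C10DD801
print(duplicate_id("C10DD8", 2))  # C10DD802
```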
```python
from itertools import groupby
import uuid

# readlines() keeps the trailing '\n' on every line
with open('test.txt', 'r') as f:
    addresses = f.readlines()

list_of_addresses = ['Address']  # header row
list_of_ids = ['ID']             # header row
for x in addresses:
    list_of_addresses.append(x)

def find_duplicates():
    # groupby only merges adjacent equal items, hence the sort
    for x, y in groupby(sorted(list_of_addresses)):
        # one random 6-character hex ID per distinct string
        new_id = uuid.uuid4().hex.upper()[0:6]
        j = len(list(y))
        if j > 1:
            print(str(j) + " instances of " + x)
            list_of_ids.append(new_id)
        print(list_of_ids)

find_duplicates()
```
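One detail worth flagging: cutting a UUID down to 6 hex characters leaves only 16**6 (about 16.7 million) possible values, so two different strings can end up with the same ID. A sketch that retries until it finds an unused 6-character ID, assuming a `set` of already-issued IDs is kept around (`new_base_id` is a hypothetical name):

```python
import uuid

def new_base_id(used_ids, length=6):
    """Return a random uppercase hex ID of `length` characters not in `used_ids`.

    Truncating a UUID4 to 6 hex digits leaves 16**6 possible values,
    so collisions are possible; retry until an unused value is found.
    """
    while True:
        candidate = uuid.uuid4().hex.upper()[:length]
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate

used = set()
ids = [new_base_id(used) for _ in range(5)]
print(ids)  # five distinct 6-character IDs, different on every run
```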
How should I approach this?
Edit: here are the contents of test.txt:

```
123 Test
123 Test
123 Test
321 Test
567 Test
567 Test
```
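Note that `readlines()` keeps the trailing newline on each line, so `'567 Test\n'` and a final `'567 Test'` without a newline would be treated as different strings. A sketch that strips each line and skips blank ones before any grouping; `io.StringIO` stands in for the real file so the example is self-contained, and `read_addresses` is a hypothetical name:

```python
import io

def read_addresses(f):
    """Return the non-empty lines of a file object with surrounding whitespace removed."""
    return [line.strip() for line in f if line.strip()]

# Stand-in for open('test.txt'): same contents as above, no trailing newline
sample = io.StringIO("123 Test\n123 Test\n123 Test\n321 Test\n567 Test\n567 Test")
addresses = read_addresses(sample)
print(addresses)
# ['123 Test', '123 Test', '123 Test', '321 Test', '567 Test', '567 Test']
```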
And the output:

```
3 occurences of 123 Test
['ID', 'C10DD8']
['ID', 'C10DD8']
2 occurences of 567 Test
['ID', 'C10DD8', '595C5E']
['ID', 'C10DD8', '595C5E']
```
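This output follows from how `itertools.groupby` works: it only merges adjacent equal items, so after sorting it yields one group per distinct string, and the loop generates one ID per group rather than one per line. IDs are appended only for groups with more than one member, and the list being printed four times is consistent with one print per group (`'123 Test'`, `'321 Test'`, `'567 Test'` and the `'Address'` header). A quick check of the groups on the sample data, with newlines already stripped:

```python
from itertools import groupby

addresses = ['123 Test', '123 Test', '123 Test', '321 Test', '567 Test', '567 Test']

# groupby only merges adjacent equal items, hence the sort
groups = [(key, len(list(grp))) for key, grp in groupby(sorted(addresses))]
print(groups)
# [('123 Test', 3), ('321 Test', 1), ('567 Test', 2)]

# Without sorting, non-adjacent repeats end up in separate groups
unsorted = ['A', 'B', 'A']
print([key for key, _ in groupby(unsorted)])  # ['A', 'B', 'A']
```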
Also, your IDs are the same for your duplicates, whereas you mentioned adding two more characters.
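One way to get the desired IDs without sorting (which also loses the original line order) is to walk the lines once and remember, per string, its base ID and how many repeats have been seen so far. A minimal sketch written for Python 3, assuming the two extra characters are a zero-padded counter and that 6-character base IDs should be unique across different strings; `assign_ids` is a hypothetical name:

```python
import uuid

def assign_ids(strings):
    """Return (string, id) pairs in input order.

    First occurrence: random 6-character uppercase hex ID.
    Each repeat: the same 6 characters plus a 2-digit counter
    (assumption: at most 99 repeats per string).
    """
    base_ids = {}   # string -> its 6-character base ID
    repeats = {}    # string -> number of repeats seen so far
    used = set()    # base IDs already handed out, to avoid collisions
    result = []
    for s in strings:
        if s not in base_ids:
            base = uuid.uuid4().hex.upper()[:6]
            while base in used:  # retry on a truncated-UUID collision
                base = uuid.uuid4().hex.upper()[:6]
            used.add(base)
            base_ids[s] = base
            repeats[s] = 0
            result.append((s, base))
        else:
            repeats[s] += 1
            result.append((s, base_ids[s] + "{:02d}".format(repeats[s])))
    return result

addresses = ['123 Test', '123 Test', '123 Test', '321 Test', '567 Test', '567 Test']
for address, new_id in assign_ids(addresses):
    print(address, new_id)
```

Using a plain `dict` keeps this to a single pass over the file and preserves the file's line order, which `sorted` plus `groupby` does not.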