I have an Oracle database that I cannot add new tables to, so in Django I've created a sqlite database purely to sync items from the Oracle database into sqlite.
Currently there are about 0.5 million items in the Oracle database.
All of the primary keys in the Oracle database are incremental, but there's no guarantee that they are gap-free. On top of that, network hiccups sometimes break the connection between Django and the Oracle database, and I miss some values when synchronizing.
Hence, I came up with a model inside Django:
class sequential_missing(models.Model):
    # Django allows one primary_key per model; enforce the pair via Meta instead.
    database = models.CharField(max_length=200)
    row = models.IntegerField()
    class Meta:
        unique_together = ('database', 'row')
Each entry records a row number, for a given database, that is missing on the Oracle side. I compare the sequence numbers missing from the sqlite database against these entries to work out which of them are genuinely empty in the Oracle database. This speeds up the process, because I don't have to check ALL the missing sequential values against Oracle.
The whole function is as follows:
def checkMissing(maxValue, databaseObjects, databaseName):
    missingValues = []
    #############SECTION 1##########################
    print "Database:" + databaseName
    print "Checking for Missing Sequential Numbers"
    set_of_pk_values = set(databaseObjects.objects.all().values_list('pk', flat=True))
    set_one_to_max_value = set(xrange(1, maxValue+1))
    missingValues = set_one_to_max_value.difference(set_of_pk_values)
    #############SECTION 1##########################
    #Even though missingValues could be enough, but the problem is that not even Oracle can
    #guarantee the automatic incremented number is sequential, hence we would look up the values
    #we thought it was missing, and remove them from missingValues, which should be faster than
    #checking all of them in the oracle database
    #############SECTION 2##########################
    print "Checking for numbers that are empty, Current Size:" + str(len(missingValues))
    emptyRow = []
    for idx, val in enumerate(missingValues):
        found = False
        for items in sequential_missing.objects.all():
            if(items.row == val and items.database == databaseName):
                found = True
                #print "Database:" + str(items.row) + ", Same as Empty Row:" + str(val)
        if(found == True):
            emptyRow.append(val)
    #############SECTION 2##########################
    #############SECTION 3##########################
    print "Removing empty numbers, Current Size:" + str(len(missingValues)) + ", Empty Row:" + str(len(emptyRow))
    missingValuesCompared = []
    for idx, val in enumerate(missingValues):
        found = False
        for items in emptyRow:
            if(val == items):
                found = True
                #print "Empty Row:" + str(items) + ", same as Missing Values:" + str(val)
        if(found == False):
            missingValuesCompared.append(val)
    print "New Size:" + str(len(missingValuesCompared))
    return missingValuesCompared
    #############SECTION 3##########################
The code is split into 3 sections:
1. Figures out which sequential values are missing.
2. Checks the missing values against the model to find any that match a row already recorded as empty.
3. Creates a new array that excludes the rows found in section 2.
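To make the three sections concrete, here is a toy sketch on plain Python sets. The names and sample numbers are made up for illustration; the real code reads them from sqlite rather than from literals.

```python
# Toy data: every name and number here is an illustrative assumption.
max_value = 10
synced_pks = {1, 2, 4, 7, 8, 10}   # primary keys already copied to sqlite
known_empty = {5, 9}               # rows recorded in sequential_missing

# Section 1: sequence numbers that have no local row.
missing = set(range(1, max_value + 1)) - synced_pks

# Section 2: missing numbers already known to be empty in Oracle.
empty_rows = missing & known_empty

# Section 3: what still has to be checked against Oracle.
to_check = sorted(missing - empty_rows)

print(to_check)
```

With this data, 3, 5, 6 and 9 are missing locally, 5 and 9 are already known to be empty, so only 3 and 6 remain to be checked.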
The problem is that section 2 takes a long time, roughly O(n^2): for every missing value it iterates over the entire sequential_missing table to check whether that row was already recorded as empty.
Is there a faster way to do this, whilst consuming minimal memory?
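On the memory side, one possible direction is to avoid building the full set of 1..maxValue in section 1 and instead walk the primary keys in ascending order, yielding the gaps as they appear. The sketch below is only an illustration under assumptions: `iter_gaps` is a hypothetical helper name, and it expects its input to be sorted and free of duplicates. With Django, the input could presumably come from something like `databaseObjects.objects.order_by('pk').values_list('pk', flat=True).iterator()`.

```python
def iter_gaps(sorted_pks, max_value):
    """Yield every integer in 1..max_value that is absent from sorted_pks.

    Assumes sorted_pks is an ascending iterable of unique ints. It is
    consumed lazily, so no set of all expected values is ever built.
    """
    expected = 1
    for pk in sorted_pks:
        if pk > max_value:
            break
        # Everything between the last seen key and this one is a gap.
        while expected < pk:
            yield expected
            expected += 1
        expected = pk + 1
    # Keys missing after the last stored primary key.
    while expected <= max_value:
        yield expected
        expected += 1


gaps = list(iter_gaps([1, 2, 4, 7], 8))
print(gaps)
```

For the sample input the gaps are 3, 5, 6 and 8. Because the function is a generator, the gaps can also be processed in batches instead of being collected into one list.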
Edit:
Using row__in (SQL's IN) in chunks of 500 is much better:
setItem = []
for items in missingValues:
    setItem.append(items)
print "Items in setItem:" + str(len(setItem))
currentCounter = 0
currentEndCounter = 500
counterIncrement = 500
emptyRowAppend = []
end = False
firstPass = False
while(end == False):
    emptyRow = sequential_missing.objects.filter(database=databaseName, row__in = setItem[currentCounter:currentEndCounter])
    for items in emptyRow:
        emptyRowAppend.append(items.row)
    if(firstPass == True):
        end = True
    if ((currentEndCounter+counterIncrement)>maxValue):
        currentCounter += counterIncrement
        currentEndCounter = maxValue
        firstPass = True
    else:
        currentCounter += counterIncrement
        currentEndCounter += counterIncrement
print "Removing empty numbers," + "Empty Row Append Size:" + str(len(emptyRowAppend)) + ", Missing Value Size:" + str(len(missingValues)) + ", Set Item Size:" + str(len(setItem)) + ", Empty Row:" + str(len(emptyRowAppend))
missingValuesCompared = []
for idx, val in enumerate(missingValues):
    found = False
    for items in emptyRowAppend:
        if(val == items):
            found = True
            break
    if(found == False):
        missingValuesCompared.append(val)
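The same chunked lookup can probably be written more compactly. The sketch below is an assumption-laden illustration, not a drop-in replacement: `chunked`, `remove_known_empty` and `fetch_empty_rows` are hypothetical names, and the database query is replaced by a stand-in function so the example runs without Django. In the real code, `fetch_empty_rows` could wrap `sequential_missing.objects.filter(database=databaseName, row__in=chunk).values_list('row', flat=True)`. Collecting the results in a set also turns the final nested loop into a constant-time membership test per value.

```python
def chunked(values, size):
    """Yield consecutive slices of `values`, each at most `size` items long."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def remove_known_empty(missing_values, fetch_empty_rows, chunk_size=500):
    """Return the sorted missing values that fetch_empty_rows does not report.

    fetch_empty_rows is a hypothetical callable: given a list of row
    numbers, it returns those already recorded as empty.
    """
    candidates = sorted(missing_values)
    known_empty = set()
    for chunk in chunked(candidates, chunk_size):
        known_empty.update(fetch_empty_rows(chunk))
    # Set membership replaces the inner loop over emptyRowAppend.
    return [value for value in candidates if value not in known_empty]


# Stand-in for the sequential_missing query (illustrative data only).
recorded_empty = {5, 9, 1200}

def fake_fetch(chunk):
    return [row for row in chunk if row in recorded_empty]

result = remove_known_empty({3, 5, 6, 9}, fake_fetch, chunk_size=2)
print(result)
```

Iterating with `range(0, len(values), size)` stops at the end of the list, so no extra bookkeeping with counters or a first-pass flag is needed.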