I have a database table called 'do_not_call', which contains information about files that hold a range of 10 digit phone numbers in the increasing order. The column 'filename' holds the name of file that contain the range of numbers from 'first_phone' to 'last_phone'. There are about 2500 records in 'do_not_call' table.
And I have a list of sqlalchemy records. I need to find which file is holding the 'phone' field of these records. So I have created a function which takes in the sqlalchemy records and returns a dictionary where the key is the name of file and value is a list of phone numbers from the sqlalchemy records that falls in the range of first and last phone numbers, contained in the file.
def get_file_mappings(dbcursor, import_records):
start_time = datetime.now()
phone_list = [int(rec.phone) for rec in import_records]
dnc_sql = "SELECT * from do_not_call;"
dbcursor.execute(dnc_sql)
dnc_result = dbcursor.fetchall()
file_mappings = {}
for file_info in dnc_result:
first_phone = int(file_info.get('first_phone'))
last_phone = int(file_info.get('last_phone'))
phone_ranges = list(filter(lambda phone: phone in range(first_phone, last_phone), phone_list))
if phone_ranges:
file_mappings.update({file_info.get('filename'): phone_ranges})
phone_list = list(set(phone_list) - set(phone_ranges))
# print(file_mappings)
print("Time = ", datetime.now() - start_time)
return file_mappings
For example if the phone_list is
[2023143300, 2024393100, 2027981539, 2022760321, 2026416368, 2027585911], the file_mappings returned will be
{'1500000_2020-9-24_Global_45A62481-17A2-4E45-82D6-DDF8B58B1BF8.txt': [2023143300, 2022760321],
'1700000_2020-9-24_Global_45A62481-17A2-4E45-82D6-DDF8B58B1BF8.txt': [2024393100],
'1900000_2020-9-24_Global_45A62481-17A2-4E45-82D6-DDF8B58B1BF8.txt': [2027981539, 2026416368, 2027585911]}
The problem here is that it takes a lot of time to execute. On average it takes about 1.5 seconds for 1000 records. Is there a better approach/algorithm to solve this problem. Any help is appreciated.
list(filter(lambda phone: phone in range(first_phone, last_phone), phone_list))is more idiomatic (and likely faster) as[phone for phone in phone_list if phone in range(first_phone, last_phone)].filter), then forcing it into a list has always been slower for me. Also,lambdause seems to be slow.list(set(phone_list) - set(phone_ranges))is also probably going to be much faster asphone_range_set = set(phone_ranges); phone_list = [phone for phone in phone_list if phone not in phone_range_set]Creating two sets just to do subtraction seems needlessly expensive.phone_list? And it seems your result is somewhat backward from what I would expect... I would think you would want key=phone, value=file ?