merge sort python implementation methodology

Question

I'm learning the well known sorting algorithms and implementing them myself. I have recently done merge sort and the code I have is:

def merge(l, r, direction):
    # print('merging')
    # print(l, r)
    # adding infinity to end of list so we know when we've hit the bottom of one pile
    l.append(inf)
    r.append(inf)
    A = []
    i, j = 0, 0

    while (i < len(l)) and (j < len(r)):
        if l[i] <= r[j]:
            A.append(l[i])
            i += 1
        else:
            A.append(r[j])
            j += 1
    # removing infinity from end of list
    A.pop()

    return(A)


def merge_sort(num_lst, direction='increasing', level=0):
    if len(num_lst) > 1:
        mid = len(num_lst)//2
        l = num_lst[:mid]
        r = num_lst[mid:]
        l = merge_sort(l, level=level + 1)
        r = merge_sort(r, level=level + 1)
        num_lst = merge(l, r, direction)

    return num_lst

What I have seen in other implementations that differs from my own is in the merging of lists. Where I just create a blank list and append elements in numerical order other pass the preexisting into merge and overwrite each element to create a list in numerical order. Something like:

def merge(arr, l, m, r): 
    n1 = m - l + 1
    n2 = r- m 
  
    # create temp arrays 
    L = [0] * (n1) 
    R = [0] * (n2) 
  
    # Copy data to temp arrays L[] and R[] 
    for i in range(0 , n1): 
        L[i] = arr[l + i] 
  
    for j in range(0 , n2): 
        R[j] = arr[m + 1 + j] 
  
    # Merge the temp arrays back into arr[l..r] 
    i = 0     # Initial index of first subarray 
    j = 0     # Initial index of second subarray 
    k = l     # Initial index of merged subarray 
  
    while i < n1 and j < n2 : 
        if L[i] <= R[j]: 
            arr[k] = L[i] 
            i += 1
        else: 
            arr[k] = R[j] 
            j += 1
        k += 1
  
    # Copy the remaining elements of L[], if there 
    # are any 
    while i < n1: 
        arr[k] = L[i] 
        i += 1
        k += 1
  
    # Copy the remaining elements of R[], if there 
    # are any 
    while j < n2: 
        arr[k] = R[j] 
        j += 1
        k += 1

I'm curious about the following:

Problem

Is using a append() on a blank list a bad idea? By my understanding when python creates a list it grabs a certain size chunk of memory and if our list grows beyond that copies the list to a different and larger section of memory (which seems it would be pretty high cost if it happened even once for a large list let alone repeatedly). Is there just a higher cost to using append() compared to accessing a list by index? It would seemed to me append would be able to do things at a fairly low cost.

a_guest · Accepted Answer · 2019-07-19 15:16:45Z

3

When you instantiate a list Python will allocate the memory necessary for holding the items plus some extra memory for future appends / extends. When you put too much extra stuff in the list it eventually needs to be reallocated which potentially slows down the program (see this part of the source code). The reallocation happens here and the size is calculated as:

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

As the comment states the growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ....

So if the size of the final list is known in advance and allocated from the beginning this will save you some extra reallocations (and thus compute time).

answered Jul 19, 2019 at 15:16

a_guest

36.7k15 gold badges75 silver badges137 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Fabian · Accepted Answer · 2019-07-19 14:06:54Z

0

from time import time 
append_arr = []
index_arr = [None] * 10240*1024

t0 = time()
for x in range(10240*1024):
    append_arr.append(x)
t1 = time()

t2 = time()
for i in range(10240*1024):
    index_arr[i] = i
t3 = time()

print(str(t1-t0))
print(str(t3-t2))

Looks like appending is slower.

answered Jul 19, 2019 at 14:06

Fabian

1,15011 silver badges26 bronze badges

2 Comments

Dan Over a year ago

You should move append_arr = [] index_arr = [None] * 10240*1024 inside your timing just in case the time to initialise the large array has an impact.

RhythmInk Over a year ago

I've already done this testing. To clarify, my question is whether or not I understand the way lists are managed in python and why append() would end up being slower?

Collectives™ on Stack Overflow

merge sort python implementation methodology

Problem

2 Answers 2

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

Problem

2 Answers 2

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related