I recently signed up for a scientific Python course, and it was shown in the classroom how a numpy array can perform better than a list in some situations. For a statistical simulation, I tried the two approaches and, surprisingly, the numpy array takes a lot longer to finish the process. Could someone help me find my (possible) mistake?
My first thought was that probably something is wrong with the way the code is written, but I can't figure out how it can be wrong. The script calculates how many attempts on average someone needs to complete a collection of sorted stickers:
Python list
I used a function and no external modules.
import random as rd
import statistics as st

def collectStickers(experiments, collectible):
    obtained = []
    attempts = 0
    while(len(obtained) < collectible):
        new_sticker = rd.randint(1, collectible)
        if new_sticker not in obtained:
            obtained.append(new_sticker)
        attempts += 1
    experiments.append(attempts)

experiments = []
collectible = 20
rep_experiment = 100000
for i in range(1, rep_experiment):
    collectStickers(experiments, collectible)
print(st.mean(experiments))
Results
The processing time seems ok for a simple experiment like this one, but for more complex purposes, 13.8 seconds is too much.
72.06983069830699
[Finished in 13.8s]
Numpy
I could not use any function as the following errors showed up when I followed the same logic as above:
RuntimeWarning: Mean of empty slice.
RuntimeWarning: invalid value encountered in double_scalars
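A likely cause of these warnings, assuming the function version called np.append on its argument the same way the list version calls experiments.append (the failing code is not shown, so this is a guess): np.append does not modify the array in place. It returns a new array, so assigning the result inside a function only rebinds the local name and the caller's array stays empty. A minimal sketch, with hypothetical helper names:

```python
import numpy as np

def record_attempts(experiments, attempts):
    # Hypothetical helper: np.append returns a NEW array, and this
    # assignment only rebinds the local name. The caller's array is untouched.
    experiments = np.append(experiments, attempts)

def record_attempts_returning(experiments, attempts):
    # Returning the new array lets the caller keep the result.
    return np.append(experiments, attempts)

experiments = np.array([])
record_attempts(experiments, 72)
print(experiments.size)  # 0, so np.mean(experiments) would warn and return nan

experiments = record_attempts_returning(experiments, 72)
print(experiments.size)  # 1
```

A list behaves differently because list.append mutates the object that was passed in, which is why the same logic works in the list version.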
So I just went for the naive way:
import random as rd
import numpy as np

experiments = np.array([])
rep_experiment = 100000
for i in range(1, rep_experiment):
    obtained = np.array([])
    attempts = 0
    while(len(obtained) < 20):
        new_sticker = rd.randint(1, 20)
        if new_sticker not in obtained:
            obtained = np.append(obtained, new_sticker)
        attempts += 1
    experiments = np.append(experiments, attempts)
print(np.mean(experiments))
Results
Almost 4x slower!
Is the difference in the use of a function?
72.03112031120311
[Finished in 54.2s]
Iterative uses of numpy are usually slower. numpy is best when using its whole-array methods. Iterating through an array one by one is slow, as is building an array iteratively (np.append is not a list append clone). np.random is fast, if you ask for 1000s of random numbers at once. Effective use of numpy requires looking at the problem in a different way, from the top down.
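As a rough sketch of what that top-down view could look like for this simulation: instead of finishing one experiment before starting the next, advance all experiments together, one draw per step, and keep the bookkeeping in whole arrays. The function name, the seed parameter, and the 0-based sticker numbering are my own choices here, not something from the question, and I have not timed it against the versions above.

```python
import numpy as np

def simulate_attempts(n_experiments, collectible, seed=None):
    """Return the number of draws each experiment needed to complete the set."""
    rng = np.random.default_rng(seed)
    attempts = np.zeros(n_experiments, dtype=np.int64)
    # seen[i, k] is True once experiment i has obtained sticker k (0-based).
    seen = np.zeros((n_experiments, collectible), dtype=bool)
    # Indices of the experiments that are still missing stickers.
    active = np.arange(n_experiments)
    while active.size:
        # One draw for every unfinished experiment, requested in a single call.
        stickers = rng.integers(0, collectible, size=active.size)
        seen[active, stickers] = True
        attempts[active] += 1
        # Keep only the experiments whose collection is still incomplete.
        active = active[~seen[active].all(axis=1)]
    return attempts

attempts = simulate_attempts(100_000, 20)
print(attempts.mean())  # should land near 72, like the results in the question
```

There is still a Python loop, but it runs once per draw round (a few hundred iterations at most in practice) rather than once per sticker per experiment, and nothing is grown with np.append.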