I am profiling some genetic algorithm code that has several nested loops. Most of the time is spent in two of my functions, both of which slice and add up numpy arrays. I have tried my best to optimize them further, but I would like to see whether others can come up with ideas.

Function 1:

The first function is called 2954684 times, for a total of 19 seconds spent inside the function.

It basically just creates views into the numpy arrays contained in data[0], according to the coordinates contained in data[1]:

def get_signal(data, options):
    #data[0] contains bed, data[1] contains position
    #forward = 0, reverse = 1
    start = data[1][0] - options.halfwinwidth
    end = data[1][0] + options.halfwinwidth
    if data[1][1] == 0:
        normals_forward = data[0]['normals_forward'][start:end]
        normals_reverse = data[0]['normals_reverse'][start:end]
    else:
        normals_forward = data[0]['normals_reverse'][end - 1:start - 1: -1]
        normals_reverse = data[0]['normals_forward'][end - 1:start - 1: -1]

    row = {'normals_forward': normals_forward,
           'normals_reverse': normals_reverse,
           }
    return row

Function 2:

Called 857 times, for a total of 13.674 seconds spent inside the function.

signal is a list of rows, each holding numpy arrays of equal length with dtype float under the keys normals_forward and normals_reverse; options just holds miscellaneous settings.

The goal of the function is to sum those arrays element-wise into a single forward profile and a single reverse profile, calculate the intersection of the two curves they form, and return the result:

def calculate_signal(signal, options):

    profile_normals_forward = np.zeros(options.halfwinwidth * 2, dtype='f')
    profile_normals_reverse = np.zeros(options.halfwinwidth * 2, dtype='f')

    # I tried np.sum over axis=0 here; it is significantly slower than the for loop approach
    for b in signal:
        profile_normals_forward += b['normals_forward']
        profile_normals_reverse += b['normals_reverse']

    count = len(signal)

    if options.normalize == 1:
        #print "Normalizing to max counts"
        profile_normals_forward /= max(profile_normals_forward)
        profile_normals_reverse /= max(profile_normals_reverse)
    elif options.normalize == 2:
        #print "Normalizing to number of elements"
        profile_normals_forward /= count
        profile_normals_reverse /= count

    intersection_signal = np.fmin(profile_normals_forward, profile_normals_reverse)
    intersection = np.sum(intersection_signal)

    results = {"intersection": intersection,
               "profile_normals_forward": profile_normals_forward,
               "profile_normals_reverse": profile_normals_reverse,
               }
    return results

As you can see, the two functions are very simple, but they account for more than 60% of the execution time of a script that can run for hours or days (genetic algorithm optimization), so even minor improvements are welcome :)

  • Have you tried running kernprof to see where these functions are spending their time? Commented Nov 18, 2013 at 18:41
  • Good idea, I'll do that, but then again they are both pretty simple Commented Nov 18, 2013 at 19:36
  • If they aren't too big, could you put all the signal arrays into one numpy array and work with them in 2 or 3 dimensions, as well as combining profile_normals_forward and profile_normals_reverse? Commented Nov 18, 2013 at 22:37
  • I am trying to follow the overall flow. You have these two functions which, for the number of times they are called, seem pretty fast. Are these two functions called inside of a bigger loop that tries to optimize the genetic function? I guess my question is: they account for >60% of runtime, but it looks like they only take 1-2 minutes total, while the script takes hours to run. Commented Nov 19, 2013 at 15:38
  • What is data? A Python list or numpy array? A list of arrays? What's the dtype of those arrays? I'm guessing recarray. Commented Dec 17, 2013 at 2:04
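
The suggestion in the comments to keep all signal arrays in one numpy array can be sketched roughly as follows. This is only an illustration under assumptions: the names stacked, halfwinwidth and n_signals are hypothetical, the data is random, and it assumes the forward and reverse windows can be stored in a single preallocated 3-D array of shape (signal, strand, position) rather than in a list of dicts. Whether it is actually faster than the loop depends on whether that array can be built once instead of being re-created on every call.

```python
import numpy as np

halfwinwidth = 4   # hypothetical half window width
n_signals = 5      # hypothetical number of signals

rng = np.random.default_rng(0)
# Hypothetical stacked layout: axis 0 = signal, axis 1 = strand
# (0 = forward, 1 = reverse), axis 2 = position inside the window.
stacked = rng.random((n_signals, 2, 2 * halfwinwidth)).astype('f')

# A single reduction over the signal axis replaces the Python-level loop.
profiles = stacked.sum(axis=0)
profile_forward, profile_reverse = profiles

# Same intersection measure as in calculate_signal.
intersection = np.fmin(profile_forward, profile_reverse).sum()

# Reference: the loop-based accumulation from the question, for comparison.
loop_forward = np.zeros(2 * halfwinwidth, dtype='f')
loop_reverse = np.zeros(2 * halfwinwidth, dtype='f')
for s in stacked:
    loop_forward += s[0]
    loop_reverse += s[1]
```

Both approaches give the same profiles up to floating-point rounding, so the choice between them is purely a matter of measured speed on the real data.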

1 Answer

One simple thing I would do to speed up the first function is to use a different notation for accessing the indices: a single multidimensional index instead of chained indexing.

For example:

foo = numpyArray[1][0] 
bar = numpyArray[1,0]

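The second line executes faster because NumPy does not have to build the intermediate object for numpyArray[1] and then look up the first element of that. Note that this notation only works when the object being indexed is itself a multidimensional numpy array, not a Python list or tuple of arrays. Try it out.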
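
A minimal way to check this on your own machine is to time both notations with timeit. The array below is a hypothetical stand-in for numpyArray, and the timings will vary with the NumPy version and hardware, so treat this as a sketch rather than a benchmark.

```python
import timeit
import numpy as np

# Hypothetical 2-D array standing in for numpyArray in the example above.
numpy_array = np.arange(12, dtype=float).reshape(3, 4)

# Chained indexing: creates an intermediate row view, then indexes into it.
chained = numpy_array[1][0]
# Tuple indexing: a single lookup that goes straight to the element.
direct = numpy_array[1, 0]

# Time each notation over many repetitions.
t_chained = timeit.timeit(lambda: numpy_array[1][0], number=200_000)
t_direct = timeit.timeit(lambda: numpy_array[1, 0], number=200_000)
print(f"chained: {t_chained:.3f}s  direct: {t_direct:.3f}s")
```

Both notations return the same element; only the number of intermediate objects differs. If data in the question is a plain list or tuple, data[1, 0] raises a TypeError, so data would have to be a numpy array for this to apply.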
