1

So I'm struggling to make this code more useful for larger data-sets. Here is the code, I will explain it thoroughly afterwards:

import numpy as np
np.set_printoptions(threshold='nan')

tri_nums = [3, 2, 1]


paths = [1, 3, 4, 5]

vol_list = [10, 10, 10, 15, 15, 25]

n = 0

array_list = []

while n <= len(tri_nums):
    for num in tri_nums:
        print "assigning volume", vol_list[sum(tri_nums)-len(vol_list)+1]
        volume_array = np.zeros(shape = (6, 6))
        volume_array[paths[num-1],paths[num]] = vol_list[sum(tri_nums)-len(vol_list)+1]
        array_list.append(volume_array)
        print paths[num-1], paths[num]


    tri_nums.pop(0)
    paths.pop(0)
    n+=1
    print paths
    print tri_nums


final_array = sum(array_list)
print array_list
print final_array

Starting with tri_nums: The values of tri_nums will always be a list of the triangular numbers of the length of paths. So a paths list of say, [1, 3, 4, 5, 6, 8], will give a tri_nums of [5, 4, 3, 2, 1].

tri_nums are also correlated to the number of values in vol_list. As you can see, there are three 10's in vol_list. The number of 10's is equal to the first value of tri_nums. There are also two 15's and a 2 for the second value of tri_nums. This pattern will never change! Another example of this is:

paths = [1, 3, 4, 5, 6, 8]

tri_nums = [5, 4, 3, 2, 1]

vol_list = [15, 15, 15, 15, 15, 250, 250, 250, 250, 25, 25, 25, 10, 10, 15]

The list paths (in the original case) is made up of four 'nodes', nodes 1,3,4 and 5. Between each adjacent node there is a path, i.e. path 1-3, 3-4, 4-5.

As one can see, volume_array is a 6x6 array and is made up of zeros. The row values in volume_array that are to be changed correspond to the first value of each path i.e. 1,3, 4. The column values correspond to the second number of each path i.e. 3, 4,5.

Here comes the tricky bit!

The values in vol_list are allocated to the aforementioned array items as follows:

  1. For each value of tri_nums a value in vol_list is added to volume_array. The row value within this array is defined by the first value of a path i.e. [4]and the column value is defined by the second value of a path (for the value [4] this will mean [5]).
  2. For tri_nums[0], the value 10 is added three times, once to volume_array[4][5], once to volume_array[3][4] and once to volume_array[1][3].
  3. For tri_nums[1] the value 15 is added twice, once to volume_array[4][5] and once to volume_array[3][4].
  4. For tri_nums[2] the value 25 is added once to volume_array[4][5].
  5. Finally, all of the values in of the arrays generated in the previous three steps are added together to get final_array.

Another thing worth mentioning is that the sum of tri_nums is equal to len(vol_list). Furthermore tri_nums[n] is always > tri_nums[n+1].

Ideally I would like to implement this code for path's, tri_num's and vol_list's with hundreds of items in them. The method I am using now would mean I need to make hundreds of while loops by hand. How can I make the while loops work simultaneously so I can avoid the "hundreds of while loops" scenario?

Everything is working just fine, but the final output is:

[[  0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.  10.   0.   0.]
 [  0.   0.   0.   0.   0.   0.]
 [  0.   0.   0.   0.  25.   0.]
 [  0.   0.   0.   0.   0.  25.]
 [  0.   0.   0.   0.   0.   0.]]

Meaning that the final value of vol_list which is (25) has not been assigned to array_list[4][5] and thus was not in final_array. It just needs to do one more loop and it'll work, I'm not sure how to get it to do the last loop though.

Please let me know if anything is unclear!

Thanks

12
  • 1
    list is a datatype - it shouldn't be used as a variable name (rename it). while count < 1: is redundant since you'd get the same result if you removed it. It only makes sense to leave it in if you'll later want to iterate more than once. Commented Apr 5, 2016 at 21:20
  • anything you can do by hand you can have a computer do. Think about the steps that you would take to do it by hand. Then make the computer do that. Sometimes it's harder to make the computer to that, but YMMV. Commented Apr 5, 2016 at 21:53
  • Since you asked, the code sample is a bit on the long side, and the description is also moderately complex. I think it would make a clearer question if you interspersed the parts of the description with the corresponding pieces of the code. Adding sample output in a couple places would help further. That being said, I kind of get what you're doing; let me see if I can come up with an answer. Commented Apr 6, 2016 at 7:23
  • Also, considering these criteria it's possible this is better placed on Code Review (but if that's the case, it can be migrated, no need to manually repost it there). Commented Apr 6, 2016 at 7:24
  • Also also, now that I'm looking at the code, some things don't make sense. E.g. each of your while loops is subtly different. For example, the first one references two entries of paths at each iteration, namely paths[n] and paths[n+1], but the other loops reference one entry of paths each time through. Each loop computes its indices into volume_array differently. The third loop increments b but doesn't actually use its value anywhere in the loop. Are these intentional inconsistencies? Why? What do the three indices n, a, and b mean? Commented Apr 6, 2016 at 7:43

1 Answer 1

1

The reason you miss the last element of your array is that you're incrementing n at the same time you're popping elements off of tri_nums. Look at the values of n and tri_nums at the beginning of each iteration of your while loop:

iteration    n    tri_nums    n <= len(tri_nums)
0            0    [3, 2, 1]   True
1            1    [2, 1]      True
2            2    [1]         False

You should either keep n at 0, and make your condition while tri_nums (which is equivalent to while len(tri_nums) > 0), or probably better, you should avoid modifying tri_nums and just use a for loop. You would then need to modify the inner loop to only iterate over part of tri_nums each time, like so:

for n in xrange(len(tri_nums)):
    for num in tri_nums[n:]:

That being said, the whole approach of iterating over lists, creating multiple arrays, and adding them all up is quite inefficient. Because this isn't Code Review, I won't get into all the inefficiencies, but there are a few key ones I want to mention:

  • You have a lot of structure in your input data that you could take advantage of
  • You should try to use Numpy vectorized operations instead of native Python operations as much as possible
  • You keep putting numbers at the same indices, so you can add up the numbers first and only create the array at the end

With all that in mind, I'd recommend changing your code so that vol_list only contains each number once.

vol_list = [10, 15, 25]

You can then construct the array you need by adding your numbers up first, and then sticking the resulting sums in the array. Numpy conveniently includes the cumsum function to compute partial sums of an array:

>>> np.cumsum([10, 15, 25])
array([10, 25, 50])

and it allows you to specify many values at once in its indexing operations. So your entire algorithm can be reduced to this:

final_array = np.zeros((6, 6))
final_array[paths[:-1], paths[1:]] = np.cumsum(vol_list)

If your memory requirements become problematic for long lists, you might want to use Scipy's sparse matrices for storage, instead of ordinary Numpy arrays.

Sign up to request clarification or add additional context in comments.

2 Comments

This is incredible. I'm just doublechecking that it works for bigger datasets, but so far so good. You sir, are a wizard and a scholar.
Glad to help, but I do hope you'll keep in mind how difficult it was to understand the question in the first place, and use that to improve your future questions. I think you could stand to be a lot more careful when describing how your code works. It would also be helpful to clean up your question one more time for future readers. (And if my answer solved your problem, it'd be great if you accept it by clicking the green checkmark.)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.