
I've got a NumPy array. What is the fastest way to compute all the pairwise orderings?

What I mean is, given the first element in my array, I want a list of all the elements that sequentially follow it. Then given the second element, a list of all the elements that follow it.

So given my list: b, c, and d follow a; c and d follow b; and d follows c.

x = np.array(["a", "b", "c", "d"])

So a potential output looks like:

[
    ["a","b"],
    ["a","c"],
    ["a","d"],

    ["b","c"],
    ["b","d"],

    ["c","d"],
]

I will need to do this several million times so I am looking for an efficient solution.

I tried something like:

im = np.vstack([x]*len(x))
a = np.vstack(([im], [im.T])).T
results = a[np.triu_indices(len(x),1)]

but it's actually slower than looping...

  • Your method could be rewritten as np.take(x, np.triu_indices(len(x), 1)).T. This is faster, but the method suggested by Ashwini Choudhary still edges it. Commented Dec 6, 2014 at 19:15
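The take-based variant mentioned in the comment can be sketched as follows (a minimal, self-contained illustration, not part of the original post):

```python
import numpy as np

x = np.array(["a", "b", "c", "d"])

# np.triu_indices(n, 1) returns the (row, col) index arrays of the strict
# upper triangle, i.e. every pair with i < j.  np.take fancy-indexes x with
# the stacked index arrays, giving shape (2, n*(n-1)/2); .T then yields one
# (i, j) pair per row.
pairs = np.take(x, np.triu_indices(len(x), 1)).T
print(pairs.tolist())
# [['a', 'b'], ['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd'], ['c', 'd']]
```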

2 Answers


You can use the itertools functions chain.from_iterable and combinations together with np.fromiter for this. It involves no Python-level loop, though it is still not a pure NumPy solution:

>>> from itertools import combinations, chain
>>> arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
>>> arr.reshape(arr.size // 2, 2)
array([['a', 'b'],
       ['a', 'c'],
       ['a', 'd'],
       ['b', 'c'],
       ['b', 'd'],
       ['c', 'd']],
      dtype='|S1')

Timing comparisons:

>>> x = np.array(["a", "b", "c", "d"]*100)
>>> %%timeit
    im = np.vstack([x]*len(x))
    a = np.vstack(([im], [im.T])).T
    results = a[np.triu_indices(len(x),1)]
... 
10 loops, best of 3: 29.2 ms per loop
>>> %%timeit
    arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    arr.reshape(arr.size // 2, 2)
... 
100 loops, best of 3: 6.63 ms per loop

3 Comments

I get completely different timings to you (3ms vs 9ms). Are you sure your timings are correct (eg. did you paste the right setup)? I'm on Numpy 1.9.1.
@Veedrac Yes. Mine is 1.8.2.
I tested it here as well and got 44 ms vs 10 ms. They are using 1.8.1.

I've been browsing the source and it seems the tri functions have had some very substantial improvements relatively recently. The file is all Python so you can just copy it into your directory if that helps.

I seem to have completely different timings to Ashwini Chaudhary, even after taking this into account.

It is very important to know the size of the arrays you want to do this on; if it is small you should cache intermediates like triu_indices.

The fastest code for me was:

import numpy

def triangalize_1(x):
    xs, ys = numpy.triu_indices(len(x), 1)
    return numpy.array([x[xs], x[ys]]).T

unless the x array is small.

If x is small, caching was best:

import numpy

triu_cache = {}

def triangalize_1(x):
    if len(x) in triu_cache:
        xs, ys = triu_cache[len(x)]
    else:
        xs, ys = numpy.triu_indices(len(x), 1)
        triu_cache[len(x)] = xs, ys
    return numpy.array([x[xs], x[ys]]).T

I wouldn't do this for large x because of memory requirements.
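As a quick sanity check, the uncached version reproduces the pairs from the question (a self-contained sketch run against the question's example input):

```python
import numpy

def triangalize_1(x):
    # Index arrays for the strict upper triangle: all (i, j) with i < j.
    xs, ys = numpy.triu_indices(len(x), 1)
    return numpy.array([x[xs], x[ys]]).T

x = numpy.array(["a", "b", "c", "d"])
print(triangalize_1(x).tolist())
# [['a', 'b'], ['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd'], ['c', 'd']]
```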

