
I've got a NumPy array. What is the fastest way to compute all the pairwise orderings?

What I mean is, given the first element in my array, I want a list of all the elements that sequentially follow it. Then given the second element, a list of all the elements that follow it.

So given my list: b, c, and d follow a; c and d follow b; and d follows c.

x = np.array(["a", "b", "c", "d"])

So a potential output looks like:

[
    ["a","b"],
    ["a","c"],
    ["a","d"],

    ["b","c"],
    ["b","d"],

    ["c","d"],
]

I will need to do this several million times so I am looking for an efficient solution.

I tried something like:

im = np.vstack([x]*len(x))
a = np.vstack(([im], [im.T])).T
results = a[np.triu_indices(len(x),1)]

but it's actually slower than looping...

  • Your method could be rewritten as np.take(x, np.triu_indices(len(x), 1)).T. This is faster, but the method suggested by Ashwini Choudhary still edges it. Commented Dec 6, 2014 at 19:15
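The take-based variant mentioned in the comment can be sketched as follows (a minimal, self-contained illustration, not part of the original post):

```python
import numpy as np

x = np.array(["a", "b", "c", "d"])

# np.triu_indices(n, 1) returns the (row, col) index arrays of the strict
# upper triangle, i.e. every pair with i < j.  np.take fancy-indexes x with
# the stacked index arrays, giving shape (2, n*(n-1)/2); .T then yields one
# (i, j) pair per row.
pairs = np.take(x, np.triu_indices(len(x), 1)).T
print(pairs.tolist())
# [['a', 'b'], ['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd'], ['c', 'd']]
```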

2 Answers


You can use the itertools functions chain.from_iterable and combinations together with np.fromiter for this. It involves no Python-level loop, though it is still not a pure NumPy solution:

>>> from itertools import combinations, chain
>>> arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
>>> arr.reshape(arr.size // 2, 2)
array([['a', 'b'],
       ['a', 'c'],
       ['a', 'd'],
       ['b', 'c'],
       ['b', 'd'],
       ['c', 'd']],
      dtype='|S1')

Timing comparisons:

>>> x = np.array(["a", "b", "c", "d"]*100)
>>> %%timeit
    im = np.vstack([x]*len(x))
    a = np.vstack(([im], [im.T])).T
    results = a[np.triu_indices(len(x),1)]
... 
10 loops, best of 3: 29.2 ms per loop
>>> %%timeit
    arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    arr.reshape(arr.size // 2, 2)
... 
100 loops, best of 3: 6.63 ms per loop

3 Comments

I get completely different timings to you (3ms vs 9ms). Are you sure your timings are correct (eg. did you paste the right setup)? I'm on Numpy 1.9.1.
@Veedrac Yes. Mine is 1.8.2.
I tested it here as well and got 44 ms vs 10 ms. They are using 1.8.1.

I've been browsing the source and it seems the tri functions have had some very substantial improvements relatively recently. The file is all Python so you can just copy it into your directory if that helps.

I seem to have completely different timings to Ashwini Chaudhary, even after taking this into account.

It is very important to know the size of the arrays you want to do this on; if it is small you should cache intermediates like triu_indices.

The fastest code for me was:

import numpy

def triangalize_1(x):
    xs, ys = numpy.triu_indices(len(x), 1)
    return numpy.array([x[xs], x[ys]]).T

unless the x array is small.

If x is small, caching was best:

import numpy

triu_cache = {}

def triangalize_1(x):
    if len(x) in triu_cache:
        xs, ys = triu_cache[len(x)]
    else:
        xs, ys = numpy.triu_indices(len(x), 1)
        triu_cache[len(x)] = xs, ys
    return numpy.array([x[xs], x[ys]]).T

I wouldn't do this for large x because of memory requirements.
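As a quick sanity check, the uncached version reproduces the pairs from the question (a self-contained sketch run against the question's example input):

```python
import numpy

def triangalize_1(x):
    # Index arrays for the strict upper triangle: all (i, j) with i < j.
    xs, ys = numpy.triu_indices(len(x), 1)
    return numpy.array([x[xs], x[ys]]).T

x = numpy.array(["a", "b", "c", "d"])
print(triangalize_1(x).tolist())
# [['a', 'b'], ['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd'], ['c', 'd']]
```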

