Permutations part two

Question

This is a follow up from this question. The goal is to make some function that can return the number of possible permutations with repetitions. As Gareth pointed out there is a nice formula for this problem:

$$ \frac{m!}{n_1!\,n_2! \ldots n_k!} $$

Where \$m\$ is the length of our permutations and \$n_i\$ is the number of times the \$i\$th letter appears in \$m\$. A simple example can be shown with the word Mississippi.

$$ P(\text{Mississippi}) = \frac{11!}{4!\,4!\,2!} $$

A simple implementation of this can be done in Python as follows:

from numpy import prod    
def count_unique_perms(obj):
    obj = str(obj)
    f = [factorial(i) for i in Counter(obj).values()]
    return factorial(len(obj))/prod(f)

I include this code as a basetest for the "improved" version below. Questions were raised whether this approach works for large factorials, and it was said that the running time was at worst \$O(n^3(log n)^2)\$

I used the ideas from my previous question. To try to improve the answer. First I find the number of factors in \$m!\$ then I subtract the factors from \$n_1!\,n_2! \ldots n_k!\$. A small optimization I did was avoiding calculating duplicates.

My question if this is a decent improvement. It seems much slower until really big string. For example, it is slightly faster for str(Mississippi)*500. Any other tips or algorithms are also welcome =)

from collections import Counter
from math import factorial, log
from numpy import prod
from primesieve import generate_primes


def count_unique_perms(obj):
    obj = str(obj)
    f = [factorial(i) for i in Counter(obj).values()]
    return factorial(len(obj))/prod(f)


def fast_unique_perms(obj):
    obj = str(obj)
    factors = factors_factorial(len(obj))

    obj_count = Counter([i for i in Counter(obj).values() if i > 1])
    for p in obj_count:
        count_p = factors_factorial(p, obj_count[p])
        for prime in count_p:
            factors[prime] -= count_p[prime]

    num_perms = 1
    for p in factors:
        num_perms *= p**factors[p]
    return num_perms


def factors_factorial(n, multiplicity=1):
    "Calculates all the facors of n!"
    count = Counter()
    for p in generate_primes(n+1):
        mul = 0
        for k in range(1, int(1+log(n)/log(p))):
            mul += int(n/p**k)
        count[p] = multiplicity*mul
    return count


if __name__ == '__main__':
    import timeit

    lst = 1123
    string = 'mississippi'
    print count_unique_perms(string)
    print fast_unique_perms(string)
    times = 100
    t1 = timeit.timeit("count_unique_perms('mississippi'*1000)",
                       setup="from __main__ import count_unique_perms", number=times)/float(times)
    t2 = timeit.timeit("fast_unique_perms('mississippi'*1000)",
                       setup="from __main__ import fast_unique_perms", number=times)/float(times)

    print "fast_unique_perms: ", 1000*t2, "ms"
    print "count_unique_perms:  ", 1000*t1, "ms"
    print "The count_unique_perms solution was:   ", t2/float(t1), "times faster than fast_unique_perms."

Gareth Rees · Accepted Answer · 2016-06-23 11:01:29Z

1. Review

There are a bunch of small problems here — I think that if you had given yourself some more time (or looked more carefully at the answers you were given to your previous question) then you would have been able to fix many of them yourself.

Missing docstrings — you've written one for factors_factorial but what do the other functions do?
Vague variable names — what kind of object is obj? What kinds of objects are counted by obj_count? The name i is normally used for indexes but this code uses it for characters, so c would be better. I would expect the name p to refer to a prime, but here it refers to a count or multiplicity, so n or m would be better.
For portability to Python 3 it's a good plan to stick to // for integer division and / for floating-point division. In Python 2 you can use from __future__ import division to enforce the Python 3 semantics. (Note that PEP 238 is from 2001 — fifteen years later you shouldn't be writing new code using the old semantics.)
In the expression:
```
Counter([i for i in Counter(obj).values() if i > 1])
```
there's no need to create a list which you are going to throw away again immediately — you can pass a generator expression directly to the Counter constructor.
Instead of looping over the keys of a mapping and then having a separate lookup to find the value, use the items method to loop over the keys and values simultaneously.

Instead of adjusting the values of factors one at a time:

count_p = factors_factorial(n, m)
for prime in count_p:
    factors[prime] -= count_p[prime]

make use of the subtraction operation on Counter objects:

factors -= factors_factorial(n, m)

Each time factors_factorial is called, it has to generate the primes again. It would be better to generate them just once.
Instead of log(n)/log(p), write log(n, p) and save a function call. See the math.log documentation.
This loop requires two calls to log, and computes a power on every iteration:
```
for k in range(1, int(1+log(n)/log(p))):
    mul += n // p**k
```
It's better to write the loop like this:
```
k = n
while k:
    k //= p
    mul += k
```
Writing it like this is not only simpler and avoids the power, but k gets smaller on each iteration, and so the divisions get cheaper as you go along.

2. Use NumPy

The code in the post imports NumPy in order to use numpy.prod. But if you're going to use NumPy at all, then there's no reason not to use it throughout the code:

import numpy as np

def factorial_factors_powers(primes, n):
    """Return an array containing the power of each prime in the
    factorization of n!.

    """
    p = primes
    n = np.full_like(p, n)
    result = r = np.zeros_like(p)
    while True:
        n //= p
        if n[-1] == 0:       # some primes not contributing any more?
            l = n.argmin()   # number of primes still contributing
            if l == 0:
                break
            n, r, p = n[:l], r[:l], p[:l]
        r += n
    return result

def fast_unique_perms(s):
    """Return the number of unique permutations of the string s."""
    s = str(s)
    primes = generate_primes(len(s) + 1)
    powers = factorial_factors_powers(primes, len(s))
    for n, m in Counter(Counter(s).values()).items():
        if n > 1:
            powers -= factorial_factors_powers(primes, n) * m
    return np.prod(np.array(primes, dtype=int) ** powers)

Note the use of dtype=int to force NumPy to do the final set of multiplications using Python's bignums instead of NumPy's fixed-width numbers.

This is around five times as fast as the code in the post.

Keep running into overflow issues when the input is over 66 ('Mississippi'*6) charachters. A suggestion was to use dtype='int64, however this did not solve the problem. do you have any solutions? — N3buchadnezzar
– N3buchadnezzar, Commented Jun 23, 2016 at 10:55

Stack Exchange Network

Permutations part two

1 Answer 1

1. Review

2. Use NumPy

You must log in to answer this question.

Linked

Hot Network Questions

Permutations part two

1 Answer 1

1. Review

2. Use NumPy

You must log in to answer this question.

Linked

Related

Hot Network Questions