0

I'm interested in reordering the bits within a number, and since I want to do it several trillion times, I want to do it fast.

Here are the details: given a number num and an order matrix order. order contains up to ~6000 lines of permutations of the numbers 0..31. These are the positions to which the bits change.

Simplified example: binary(num) = 1001, order[1]=[0,1,3,2], reordered number for order[1] would be 1010 (binary).

Now I want to know, if my input number num is the smallest of these (~6000) reordered numbers. I'm searching for all 32-Bit numbers which fullfill this criterion. My current approach is to slow, so I'm looking for a speedup.

minimal-reproducible-example:

num = 1753251840
order = [[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3,  2,  1,  0,  7,  6,  5,  4, 11, 10,  9,  8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10,  9,  8, 7,  6,  5,  4,  3,  2,  1,  0],
[ 0,  1,  2,  3,  4,  5,  6,  7, 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26,  5,  4,  7,  6, 13, 12, 15, 14, 1,  0,  3,  2,  9,  8, 11, 10]]
  
patterns=set()
bits = format(num, '032b')
for perm in order:
    bitsn = [bits[perm[i]] for i in range(32)]
    patterns.add(int(''.join(bitsn),2))
print( min(patterns)==num)

Where can I start to improve this?

6
  • 5
    If you are looking to do anything "several trillion times", then your first action should be to use a compiled language. No matter how good an algorithm is, paying the price for dynamic dispatch several trillion times will wreck performance. Commented Sep 17, 2020 at 8:44
  • Check this to iterate over permutations one by one rather than generating it. But as @MisterMiyagi mentioned, several trillion times won't be fast in Python. Commented Sep 17, 2020 at 8:49
  • thanks, but I'm not interested in all permutations, just special ones, which I calculated by using itertools ;) Commented Sep 17, 2020 at 8:55
  • Is it expected that the min(patterns) give 65814 and not num (resulting in a wrong condition)? Commented Sep 17, 2020 at 9:00
  • 2
    Note that in order for this to be answerable, you should precisely define what improvement you are looking for. Just "how would you improve this?" is very broad and seems more appropriate for CodeReview – be sure to check their question guide first, though. If you are looking for performance improvements, please clearly define what you consider sufficient – likely not a few ms here or there, but some (how many?) orders of magnitude. Commented Sep 17, 2020 at 9:25

2 Answers 2

2

Extracting bits using string is generally very inefficient (whatever the language). The same thing also apply for parsing. Moreover, for such a fast low-level operation, you need to use a JIT or a compiled language as comments already pointed out.

Here is a prototype using the Numba's JIT (assume all numbers are unsigned):

npOrder = np.array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
    [ 3,  2,  1,  0,  7,  6,  5,  4, 11, 10,  9,  8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
    [15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
    [31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10,  9,  8, 7,  6,  5,  4,  3,  2,  1,  0],
    [ 0,  1,  2,  3,  4,  5,  6,  7, 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
    [21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26,  5,  4,  7,  6, 13, 12, 15, 14, 1,  0,  3,  2,  9,  8, 11, 10]], dtype=np.uint32)

@njit
def extractBits(num):
    bits = np.empty(32, dtype=np.int32)
    for i in range(32):
        bits[i] = (num >> i) & 0x01
    return bits

@njit
def permuteAndMerge(bits, perm):
    bitsnFinal = 0
    for i in range(32):
        bitsnFinal |= bits[31-perm[i]] << i
    return bitsnFinal

@njit
def computeOptimized(num):
    bits = extractBits(num)
    permCount = npOrder.shape[0]
    patterns = np.empty(permCount, dtype=np.uint32)
    for i in range(permCount):
        patterns[i] = permuteAndMerge(bits, npOrder[i])
    # The array can be converted to a set if needed here with: set(patterns)
    return min(patterns) == num

This code is about 25 time faster than the original one on my machine (ran 5 000 000 times). You can also use Numba to accelerate and parallelize the loop that run the function computeOptimized resulting in a significant additional speed-up.

Note that this code can be again much faster in C or C++ using low-level processor instructions (available for example on many x86_64 processors). With that and parallelism, the order of magnitude of the execution speed should be close to a billion of permutation per second.

Sign up to request clarification or add additional context in comments.

2 Comments

thx. I have to look up the new input from this answer before responding.
nice. I have a speedup of about a factor 20 too. Thanks for pointing out those features, I will read more into it since I'm not familiar with python libraries/functionalities. Including C would be my last resort but I guess this improvement is sufficient.
1

Couple of possible speed-ups, staying with Python and the current algorithm:

  • Bail out as soon as you find a pattern less than num; once one like that is found, the condition cannot possibly be true. (You also don't need to store patterns; at most a flag whether an equal one was found, if that's not guaranteed by the problem.)

  • bitsn could be a generator expression, and doesn't need to be in a variable; you'll have to measure whether that's faster.

More fundamental improvements:

  • If you want to find all the numbers (rather than just test a particular one), it feels like there ought to be a faster algorithm by considering what the bits mean. A couple of hours thinking could potentially let you process just the 6000 lists, rather than all 2³² integers.

  • As others have written, if you're after pure speed, python is not the ideal language. That depends on the balance of how much time you want to spend on programming vs on running the program.

Side note:

  • Are the 32-bit integers signed or unsigned?

2 Comments

thx. I will look up generator expressions. I'm looking for unsigned integers.
Ah, generator expression just means omitting the [] from the [... for ... in ...] expression, like this: ''.join(bits[perm[i]] for i in range(32)). The main restriction is that you can only use the result of a generator expression once, while a list can be used multiple times.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.