MemoryError on my Python code, how to optimize?

Question

I got a challenge on CodeEval, which is:

Sequence 011212201220200112 ... constructed as follows: first is 0, then repeated the following action: already written part is attributed to the right with replacement 0 to 1, 1 to 2, 2 to 0. E.g.

0 -> 01 -> 0112 -> 01121220 -> ...

Create an algorithm which determines what number is on the N-th position in the sequence.

INPUT SAMPLE:

OUTPUT SAMPLE:

CONSTRAINTS:

0 <= N <= 3000000000

I already wrote the code, but from the constraint, N can be a very large number, and my code got MemoryError on numstring += temp

Here's my code:

from string import maketrans

def predict(n):
    n = int(n)
    instring = '012'
    outstring = '120'
    tab = maketrans(instring, outstring)
    numstring = '0'
    temp = numstring
    while len(numstring) <= n:
        temp = temp.translate(tab)
        numstring += temp
        temp = numstring
    print numstring[n]

Is there any way to optimize this?

Note: The Input and Output is in the type of string, not integer

Yeah, I'm so stupid I even store the entire string in memory xD — possibility0
– possibility0, Commented Jun 15, 2014 at 10:51

Tom Zych · Accepted Answer · 2014-06-15 09:59:26Z

1

Think about it. Is it actually necessary to store everything?

You start with one character; this generates two, which generates four, etc. Each character in a given substring is a function of one, and only one, character in the preceding substring. You don't need to keep track of any of the other characters; only the ones that affect your answer.

answered Jun 15, 2014 at 9:59

Tom Zych

13.7k9 gold badges38 silver badges55 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Bakuriu · Accepted Answer · 2014-06-15 11:45:06Z

This kind of problems is easily solved thinking recursively. First of all you have to get rid of the thought that you are dealing with strings and string operations. The translations that you have to do are 0 -> 1, 1 -> 2 and 2 -> 0. This means that you are simply adding 1 to all digits modulo 3 (i.e. add 1, divide by 3 and take the remainder).

Recursively how would you solve this problem? Well, if the N requested is 0 you are in the base case of the beginning of the string, so the answer is 0. If the request is for a bigger index what should we do? As Tom pointed out if you split the sequence in two halves every character in the second half is a function of one character in the first half. If we can compute the index of such a character we can solve the problem recursively.

The index of such a character is easily computed. The sequence itself is of infinite length, but, given the input N you can always consider the prefix of length 2^(ceil(log2(N))) and you'll always have a situation like:

a b c ... z | A B C ... Z
         middle   ^
                  N

i.e. where N is on the second half.

If from N we substract the length of the first half we obtain the index of the wanted character. The length of half the string would be 2^(floor(log2(N))):

   a b c ... z A B C ... Z      ~~>      A B C ... Z
              ^    ^                         ^
             2^x   N                        N - 2^x

So we have to recursively solve the problem for N - 2^x (with x = floor(log2(N))) and then we have to apply the transformation to get the result relative to the fact that we were in the second half of the string.

In other words, the solution is:

def find_digit(n):
    if n == 0:
        return 0
    # bit_length() gives floor(log2(N))
    length = n.bit_length()
    return (1 + find_digit(n - 2 ** (length-1))) % 3

And in fact:

In [22]: find_digit(0)
Out[22]: 0

In [23]: find_digit(5)
Out[23]: 2

In [24]: find_digit(101)
Out[24]: 1

In [25]: find_digit(25684)
Out[25]: 0

In [26]: find_digit(3000000000)
Out[26]: 0

In [27]: find_digit(3000000001)
Out[27]: 1

In [28]: find_digit(3000000002)
Out[28]: 1

In [29]: find_digit(30000000021265672555626541643155343185826435812641)
Out[29]: 2


In [30]: find_digit(30000000021265672555626541643155343185826435812641**10)
Out[30]: 1

Note that this solution is both memory and computationally efficient, doing only log(n) operations. However it is still limited to 1000 recursive calls.

From the recursive solution it's pretty easy to obtain a non-recursive solution which will work for huge inputs:

def find_digit(n):
    start = 0
    while n:
        start += 1
        length = n.bit_length()
        n -= 2 ** (length - 1)
    return start % 3

If you want to output a string just change the return line to return str(start % 3). If you want to receive a string input just add n = int(n) at the top of the function.

Nicolas78 · Accepted Answer · 2014-06-15 09:53:59Z

0

If you consider how large the string can get, I would assume the point of this exercise is exactly to not store the whole string in memory.

Also, you might want to try replacing

numstring += temp
temp = numstring

by

temp = "%s%s" % (numstring, temp)

This may be slightly more memory-efficient in the short run, even though it won't really solve the underlying algorithmic issue.

answered Jun 15, 2014 at 9:53

Nicolas78

5,1441 gold badge26 silver badges43 bronze badges

Collectives™ on Stack Overflow

MemoryError on my Python code, how to optimize?

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related