2

Using an existing C example algorithm, I want to generate the correct CRC32 hash for a string in python. However, I am receiving incorrect results. I mask the result of every operation and attempt to copy the original algorithm's logic. The C code was provided by the same website which has a webpage string hash checking tool, so it is likely to be correct.

Below is a complete Python file including C code in its comments which it attempts to mimic. All pertinent information is in the file.

P_32 = 0xEDB88320
init = 0xffffffff
_ran = True
tab32 = []

def mask32(n):
    return n & 0xffffffff

def mask8(n):
    return n & 0x000000ff

def mask1(n):
    return n & 0x00000001

def init32():
    for i in range(256):
        crc = mask32(i)
        for j in range(8):
            if (mask1(crc) == 1):
                crc = mask32(mask32(crc >> 1) ^ P_32)
            else:
                crc = mask32(crc >> 1)
        tab32.append(crc)
    global _ran
    _ran = False

def update32(crc, char):
    char = mask8(char)
    t = crc ^ char
    crc = mask32(mask32(crc >> 8) ^ tab32[mask8(t)])
    return crc

def run(string):
    if _ran:
        init32()
    crc = init
    for c in string:
        crc = update32(crc, ord(c))
    print(hex(crc)[2:].upper())

check0 = "The CRC32 of this string is 4A1C449B"
check1 = "123456789" # CBF43926
run(check0) # Produces B5E3BB64
run(check1) # Produces 340BC6D9

# Check CRC-32 on http://www.lammertbies.nl/comm/info/crc-calculation.html#intr

"""
/* http://www.lammertbies.nl/download/lib_crc.zip */

#define                 P_32        0xEDB88320L
static int              crc_tab32_init          = FALSE;
static unsigned long    crc_tab32[256];

    /*******************************************************************\
    *                                                                   *
    *   unsigned long update_crc_32( unsigned long crc, char c );       *
    *                                                                   *
    *   The function update_crc_32 calculates a  new  CRC-32  value     *
    *   based  on  the  previous value of the CRC and the next byte     *
    *   of the data to be checked.                                      *
    *                                                                   *
    \*******************************************************************/

unsigned long update_crc_32( unsigned long crc, char c ) {

    unsigned long tmp, long_c;

    long_c = 0x000000ffL & (unsigned long) c;

    if ( ! crc_tab32_init ) init_crc32_tab();

    tmp = crc ^ long_c;
    crc = (crc >> 8) ^ crc_tab32[ tmp & 0xff ];

    return crc;

}  /* update_crc_32 */

    /*******************************************************************\
    *                                                                   *
    *   static void init_crc32_tab( void );                             *
    *                                                                   *
    *   The function init_crc32_tab() is used  to  fill  the  array     *
    *   for calculation of the CRC-32 with values.                      *
    *                                                                   *
    \*******************************************************************/

static void init_crc32_tab( void ) {

    int i, j;
    unsigned long crc;

    for (i=0; i<256; i++) {

        crc = (unsigned long) i;

        for (j=0; j<8; j++) {

            if ( crc & 0x00000001L ) crc = ( crc >> 1 ) ^ P_32;
            else                     crc =   crc >> 1;
        }

        crc_tab32[i] = crc;
    }

    crc_tab32_init = TRUE;

}  /* init_crc32_tab */
"""
5
  • 2
    from zipfile import crc32 is not good for you? Commented Jan 14, 2016 at 21:20
  • Well, congratulations; you replicated that C code accurately, because the C code produces exactly the same output. Commented Jan 14, 2016 at 21:32
  • @YoavGlazner I cannot seem to find good documentation on that. Commented Jan 14, 2016 at 21:49
  • @TomZych I would comment on the source website, but I can't seem to find the 'create new post' or 'reply' functions anywhere, even though it says that I have access to do so. I wouldn't be able to put a link to here because their website rules say no links are allowed in posts or comments. Commented Jan 14, 2016 at 21:49
  • Do you mean you want to post that the C code gives answers that don’t match? Just compile it yourself, then you can post that you found it yourself. I suspect it might be an endianness issue. Commented Jan 14, 2016 at 21:51

1 Answer 1

2

There's just one thing that's wrong with the current implementation and the fix is actually just one line of code to the end of your run function which is:

crc = crc ^ init

Which if added to your run function look like this:

def run(string):
    if _ran:
        init32()
    crc = init
    for c in string:
        crc = update32(crc, ord(c))
    crc = crc ^ init    
    print(hex(crc)[2:].upper())

This will give you the correct results you are expecting.The reason that this is necessary is after you are done updating the CRC32, the finalization of it is XORing it with the 0xFFFFFFFF. Since you only had the init table and update functions and not the finalize, you were one step off from the actual crc.

Another C implimentation that is a little more straightforward is this one it's a little bit easier to see the whole process. The only thing slightly obsure is the init poly ~0x0 is the same (0xFFFFFFFF).

Sign up to request clarification or add additional context in comments.

1 Comment

Curious. How much faster is the C implementation? It seemed like my python approach would take weeks, yet I think someone said they found a self-hash using C in a couple minutes.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.