I want to reimplement the following C++ logic in Python.

struct hash_string ///
{
    hash_string() {}

    uint32_t operator ()(const std::string &text) const
    {
        //std::cout << text << std::endl;
        static const uint32_t primes[16] =
        {
            0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
            0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
            0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
            0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B,
        };

        //std::cout << text.size() << std::endl;
        uint32_t sum = 0;
        for (size_t i = 0; i != text.size(); i ++) {
            sum += primes[i & 15] * (unsigned char)text[i];
            //std::cout << text[i] <<std::endl;
            // std::cout << (unsigned char)text[i] << std::endl;
        }
        return sum;
    }
};

My Python version so far is below. It is incomplete because I haven't found a way to convert the text to unsigned char values. Please help!

# -*- coding: utf-8 -*-

text = u'连衣裙女韩范'

primes = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
                0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
                0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
                0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]

# text[i] does not work (of course), but how to mimic the logic above?
rand = [primes[i & 15] * text[i] for i in range(len(text))]

print rand

sum_agg = sum(rand)

print sum_agg

Take text = u'连衣裙女韩范' as an example: the C++ version returns 18 for text.size() and the sum is 2422173716, whereas in Python I don't know how to get a length of 18.

Getting the same text size is essential, at least as a starting point.

  • I think Python already has built-in good-enough hashing of strings. Have you checked that? Commented Nov 5, 2015 at 9:14
  • You're probably looking for ord. Commented Nov 5, 2015 at 9:18
  • That's true, but this logic is used throughout our application, so I need to duplicate it to generate the same hash codes. My code is in Python all the way, so I want to port it to Python for simplicity. Commented Nov 5, 2015 at 9:19
  • You can't encrypt text, only bytes. Encode the text first. Commented Nov 5, 2015 at 9:22
  • @SanderDeDycker I updated the post; ord does not solve it, since text.size() in C++ returns 18, while with ord I cannot get 18. Commented Nov 5, 2015 at 9:22

1 Answer

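Because you are using unicode, for an exact reproduction you will need to turn the text into a series of bytes (chars in C++). The 18 that text.size() reports in C++ matches the UTF-8 encoding of the six characters (3 bytes each).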

bytes_ = text.encode("utf8") 
# when iterated over this will yield ints (in python 3)
# or single character strings in python 2
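To make the size difference concrete, here is a small sketch comparing the character count with the UTF-8 byte count. It assumes the C++ std::string holds UTF-8, which is consistent with the 18 reported in the question. bytearray is used only so that iteration yields ints in both Python 2 and Python 3; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
text = u'连衣裙女韩范'

encoded = text.encode("utf8")          # UTF-8 bytes, like the chars in std::string
num_chars = len(text)                  # 6 code points
num_bytes = len(encoded)               # 18 bytes, matching text.size() in C++
byte_values = list(bytearray(encoded)) # ints in 0..255, like (unsigned char)text[i]

print(num_chars, num_bytes)
print(byte_values[:3])                 # bytes of the first character: 232, 191, 158
```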

You should use a more Pythonic idiom for iterating over a pair of sequences:

pairs = zip(bytes_, primes)

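What if bytes_ is longer than primes? zip stops at the shorter sequence, so use itertools.cycle to repeat primes indefinitely. This mirrors primes[i & 15] in the C++ code: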

from itertools import cycle
pairs = zip(bytes_, cycle(primes))
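As a quick check that cycling through the table behaves like the primes[i & 15] indexing in the C++ loop, the sketch below compares the two approaches on 40 arbitrary byte values (more than the 16 table entries). The test data and the names by_index and by_cycle are made up for illustration.

```python
from itertools import cycle

primes = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
          0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
          0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
          0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]

# 40 arbitrary byte values, so the 16-entry table has to wrap around
data = bytearray(range(40))

# Explicit indexing, as in the C++ loop
by_index = [primes[i & 15] * b for i, b in enumerate(data)]
# zip + cycle: the table restarts after every 16 bytes
by_cycle = [b * p for b, p in zip(data, cycle(primes))]

print(by_index == by_cycle)
```

Keeping the finite sequence (the bytes) as the first argument to zip means iteration stops as soon as the bytes run out, even though cycle(primes) never ends.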

All together:

from itertools import cycle

text = u'连衣裙女韩范'

primes = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
                0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
                0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
                0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]

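# Python 3: iterating over bytes yields ints
rand = [byte * prime for byte, prime in zip(text.encode("utf8"), cycle(primes))]
# Python 2: iterating over a str yields single-character strings, so use this line instead:
# rand = [ord(byte) * prime for byte, prime in zip(text.encode("utf8"), cycle(primes))]

# uint32_t arithmetic in C++ wraps modulo 2**32, so mask the sum to 32 bits
hash_ = sum(rand) & 0xFFFFFFFF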
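For reuse, the pieces can be wrapped in a single function. This is a minimal sketch rather than a definitive port: the name hash_string is borrowed from the C++ struct, the encoding parameter is an assumption (the C++ side is taken to store UTF-8), and bytearray is used so the same code runs on Python 2 and Python 3. Python ints never overflow, so the final mask reproduces the uint32_t wrap-around.

```python
# -*- coding: utf-8 -*-
from itertools import cycle

PRIMES = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
          0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
          0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
          0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]


def hash_string(text, encoding="utf8"):
    """Mimic the C++ hash_string functor for a unicode string.

    `encoding` is an assumption about how the C++ side stores its text.
    """
    # bytearray yields ints (0..255) when iterated, like unsigned char
    data = bytearray(text.encode(encoding))
    total = sum(byte * prime for byte, prime in zip(data, cycle(PRIMES)))
    # Emulate uint32_t overflow: keep only the low 32 bits
    return total & 0xFFFFFFFF


print(hash_string(u'连衣裙女韩范'))  # 2422173716, the value reported for C++
```

Masking once at the end gives the same result as wrapping after every addition, because addition modulo 2**32 can be reduced at any point.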

2 Comments

It throws an error, and when I print bytes_ it prints endlessly! Could you help debug it?
Sorry, the answer I posted was intended for Python 3. I've made an edit to show what to do in Python 2.
