
It’s well known that pysha3 isn’t compatible with PyPy, and because it has been unmaintained for 3 years, I have to modify it myself.

Of course, the proper way would be a complete rewrite in pure Python (which would also result in a faster implementation than the current one), but I lack the required knowledge of both cryptography and the underlying math to do this. The program using it is also very list-intensive, which requires either a Python 3 without a GIL for multithreading or a Python 3 with a JIT.

The single point of failure boils down to this function which has to be called by C code:

static PyObject*
_Py_strhex(const char* argbuf, const Py_ssize_t arglen)
{
    static const char *hexdigits = "0123456789abcdef";

    PyObject *retval;
#if PY_MAJOR_VERSION >= 3
    Py_UCS1 *retbuf;
#else
    char *retbuf;
#endif
    Py_ssize_t i, j;

    assert(arglen >= 0);
    if (arglen > PY_SSIZE_T_MAX / 2)
        return PyErr_NoMemory();

#if PY_MAJOR_VERSION >= 3
    retval = PyUnicode_New(arglen * 2, 127);
    if (!retval)
            return NULL;
    retbuf = PyUnicode_1BYTE_DATA(retval);
#else
    retval = PyString_FromStringAndSize(NULL, arglen * 2);
    if (!retval)
            return NULL;
    retbuf = PyString_AsString(retval);
    if (!retbuf) {
            Py_DECREF(retval);
            return NULL;
    }
#endif
    /* make hex version of string, taken from shamodule.c */
    for (i=j=0; i < arglen; i++) {
        unsigned char c;
        c = (argbuf[i] >> 4) & 0xf;
        retbuf[j++] = hexdigits[c];
        c = argbuf[i] & 0xf;
        retbuf[j++] = hexdigits[c];
    }

    return retval;
}
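As a side note on the loop above: `argbuf` is a plain `const char*`, and `char` may be signed, so the `& 0xf` mask is what keeps the digit index inside the 16-entry table. A minimal sketch of the per-byte step, using a hypothetical helper `byte_to_hex` (not part of pysha3) that converts to `unsigned char` first so the shift never sees a negative value:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of pysha3): writes the two lowercase
 * hex digits of one byte into out[0] and out[1]. */
static void byte_to_hex(char byte, char out[2])
{
    static const char hexdigits[] = "0123456789abcdef";
    /* Convert to unsigned char first so the shift below never
     * operates on a negative value. */
    unsigned char c = (unsigned char)byte;

    assert(out != NULL);
    out[0] = hexdigits[c >> 4];   /* high nibble */
    out[1] = hexdigits[c & 0xf];  /* low nibble */
}
```

The output is not NUL-terminated; the caller is assumed to own a buffer of at least two bytes.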

PyPy’s CPython C API compatibility level is at 3.2, whereas PyUnicode_New was introduced in Python 3.3.

I tried the brute-force fix of replacing the whole file with the following Cython code:

cdef Py_strhex(const char* argbuf, const Py_ssize_t arglen):
    return (argbuf[:arglen]).hex()

but it triggers a segmentation fault, even when compiled for and run with the official Python implementation. With the official PyPy binary, I don’t have debugging symbols for gdb, so I don’t know why.

(gdb) bt
#0  0x00007ffff564cd00 in pypy_g_text_w__pypy_interpreter_baseobjspace_W_Root () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#1  0x00007ffff5d721a8 in pypy_g_getattr () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#2  0x00007ffff543a8bd in pypy_g_dispatcher_15 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#3  0x00007ffff5ab909b in pypy_g_wrapper_second_level.star_2_14 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#4  0x00007fffd7212372 in _Py_strhex.2738 () from /usr/lib64/pypy3.6-v7.2.0-linux64/site-packages/pysha3-1.0.3.dev1-py3.6-linux-x86_64.egg/_pysha3.pypy3-72-x86_64-linux-gnu.so
#5  0x00007fffd7217990 in _sha3_sha3_224_hexdigest_impl.2958 () from /usr/lib64/pypy3.6-v7.2.0-linux64/site-packages/pysha3-1.0.3.dev1-py3.6-linux-x86_64.egg/_pysha3.pypy3-72-x86_64-linux-gnu.so
#6  0x00007ffff5be2170 in pypy_g_generic_cpy_call__StdObjSpaceConst_funcPtr_SomeI_5 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#7  0x00007ffff54b25cd in pypy_g.call_1 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#8  0x00007ffff56715b9 in pypy_g_BuiltinCodePassThroughArguments1_funcrun_obj () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#9  0x00007ffff56ffc06 in pypy_g_call_valuestack__AccessDirect_None () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so
#10 0x00007ffff5edb29b in pypy_g_CALL_METHOD__AccessDirect_star_1 () from /usr/lib64/pypy3.6-v7.2.0-linux64/bin/libpypy3-c.so

Increasing the default Linux stack size to 65 MB doesn’t change the recursion depth at which the segfault happens, so even though the stack depth is larger than 200, this doesn’t seem to be related to a stack overflow.

  • Any chance of updating your codebase to Python 3.6 or higher? Commented Oct 28, 2019 at 0:58
  • @Selcuk PyPy supports Python 3.6, but only at the Python level. At the C level, it’s still at compatibility level 3.2. Even the latest version lacks the functions needed to run pysha3. Commented Oct 28, 2019 at 1:01
  • Sorry, I was talking about using the built-in hashlib.sha3 that comes with Python 3.6. Commented Oct 28, 2019 at 1:02
  • @Selcuk It’s for computing the Keccak variant of SHA-3 in order to be compatible with Ethereum, so unfortunately hashlib is not compatible. The project itself doesn’t use pysha3 directly: it’s used by many pip dependencies, so fixing this issue would be simpler. Commented Oct 28, 2019 at 1:05
  • I don't know why you say at C level it's still at level 3.2. Maybe it misses a specific API function, but this is a bug that we'd fix if you report it. Commented Oct 28, 2019 at 6:04

2 Answers


In terms of the Cython, it's simpler than you think:

cdef Py_strhex(const char* argbuf, const Py_ssize_t arglen):
    return (argbuf[:arglen]).hex()

Essentially you don't need to malloc (which was introducing a memory leak anyway because it was missing a free) and you don't need the memcpy. argbuf[:arglen] creates a bytes object with the appropriate length (making a copy of the data).

This definitely works on CPython. On PyPy2 it produces AttributeError: 'str' object has no attribute 'hex', which is correct for Python 2. I'd imagine if it were to produce a segmentation fault it would happen before the AttributeError so that's promising. I don't have PyPy3 readily available...


Edit:

I've now managed to test my code on PyPy3 as follows:

# extra Cython code just to call the function
def test():
    cdef const char* a = "0123456789"
    return Py_strhex(a,10)

Then from Python:

import modulename
modulename.test()

This works fine without a segmentation fault; therefore I'm pretty convinced this code is fine.

I do not know how you're calling the Cython code since you do not say; however, Cython does not generate C code with the intention that you copy out an individual function. It generates a module, and the module expects to be imported (some state is set up during the module import). Specifically, Cython sets up a table of strings during module initialization, including the string "hex" used to look up the attribute. To use this code correctly, you'd need to ensure the module that contains it is imported first, rather than just dumping a copy of the generated Cython code into a C file. Doing this is a bit complicated in Python 3 and probably doesn't suit your purposes.

I'll leave this answer in its current state since I believe it's correct and the issues are occurring in the parts you don't specify. It's quite likely it isn't useful to you, and you're free to ignore it.


12 Comments

Wouldn’t this fail if argbuf is allocated on the stack (because the garbage collector would later attempt to free it)? Also, of course, I’m using the latest version of PyPy3. But as you can see in the backtrace, the segfault happens during the attempt to call into the interpreter to find the function to be executed.
Strings own their own memory. This creates a temporary string with a copy of argbuf (which can safely outlive argbuf, but in this case doesn't need to). The hex string derived from it is also separate and owns its own memory. From a memory management point of view it's fine as long as argbuf is valid at the moment the function is called. I'm afraid this is only an attempt to answer the "how to write a Cython function to convert const char* to hex" part of the question. I'm really not capable of diagnosing PyPy internals issues!
As shown by the backtrace, it sounds like the real issue is about doing this in pure C without calling Python’s getattr (which also means without using Cython). I mean doing it in pure C compatible with CPython 3.2 (something I have no idea how to achieve).
If the Cython code still produces a seg-fault then I don't think I can help you; I'll delete my answer fairly shortly
No, it might still be possible to edit it. What I can do is convert to hex using C strings and then convert the result into a Python string. This boils down to the question: how do I convert a C string into a PyObject string? Besides, are you sure (argbuf[:arglen]) returns a bytearray and not a str object?

OK, I found what I was looking for with this variant. It won’t work on all compilers and is compatible only with Python 3, but it brings partial PyPy compatibility to pysha3 and the programs that depend on it (some tests that are supposed to fail succeed because an incorrect hash is returned):

static PyObject * _Py_strhex(const char* argbuf, const Py_ssize_t arglen) {
    static const char *hexdigits = "0123456789abcdef";

    assert(arglen >= 0);

    if (arglen > PY_SSIZE_T_MAX / 2)
        return PyErr_NoMemory();

    const Py_ssize_t len=arglen*2;
    char retbuf[len+1];
    retbuf[len]=0; /* terminator: the last valid index of retbuf is len, not len+1 */

    /* make hex version of string, taken from shamodule.c */
    for (Py_ssize_t i=0,j=0; i < arglen; i++) {
        retbuf[j++] = hexdigits[(argbuf[i] >> 4) & 0xf];
        retbuf[j++] = hexdigits[argbuf[i] & 0xf];
    }

    return PyUnicode_FromStringAndSize(retbuf,len);
}
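For compilers without variable-length arrays, or for large inputs where a stack buffer is risky, the same conversion can be sketched with a heap buffer. This is only an illustration under stated assumptions: `hex_encode` is a hypothetical helper name, it is written without Python.h so it can be compiled and checked on its own, and the caller is responsible for freeing the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated lowercase hex
 * string for buf[0..len), or NULL on overflow or allocation failure.
 * The caller must free() the result. */
static char *hex_encode(const char *buf, size_t len)
{
    static const char hexdigits[] = "0123456789abcdef";
    char *out;
    size_t i, j = 0;

    if (len > (SIZE_MAX - 1) / 2)
        return NULL;               /* 2 * len + 1 would overflow */
    out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        /* Work on an unsigned value so the shift is well defined. */
        unsigned char c = (unsigned char)buf[i];
        out[j++] = hexdigits[c >> 4];
        out[j++] = hexdigits[c & 0xf];
    }
    out[j] = '\0';                 /* j == 2 * len, the last valid slot */
    return out;
}
```

Inside the extension, the resulting buffer could presumably be handed to PyUnicode_FromStringAndSize(out, 2 * arglen) and freed afterwards, in the same way retbuf is used above.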
