Porting Python 2.7 code calling a C function to Python 3.4

Question

I am trying to call a C function from an .so file from Python 3.4. I have made some necessary changes to make the Python 2.7 code work with Python 3.4 but I am still running into a Fatal Python error: Segmentation fault.

The code is from this Bitbucket hosted project. I have installed it via pip3 (pip3 install Lemmagen), which also created the .so file I am trying to use from Python3.

Here is the original Python2.7 code (the function where the call to C code happens) which runs fine with python from the command line.

def lemmatize(self, word):
    if (self._output_buffer_len < 2 * len(word)):
        self._output_buffer_len = 2 * len(word)
        self._output_buffer = create_string_buffer(self._output_buffer_len)

    is_unicode = isinstance(word, unicode)
    if is_unicode:
        word = word.encode('utf-8')

    self._lib.lem_lemmatize_word(word, self._output_buffer)
    return self._output_buffer.value.decode('utf-8') if is_unicode else self._output_buffer.value

And this is how I am trying to adapt it to Python3.4:

def lemmatize(self, word):
    if (self._output_buffer_len < 2 * len(word)):
        self._output_buffer_len = 2 * len(word)
        self._output_buffer = create_string_buffer(self._output_buffer_len)

    word = word.encode('utf-8')


    self._lib.lem_lemmatize_word(word, self._output_buffer) #SEGFAULT HERE!
    #return "HERE"
    return self._output_buffer.value.decode('utf-8')

I have removed the lines that check whether word is unicode, since str is Unicode by default in Python 3.x. I am still 80% sure that this is a character encoding issue. What encoding do I have to use to pass a string variable to the call self._lib.lem_lemmatize_word(word, self._output_buffer)? That is the exact line where the segmentation fault occurs:

Fatal Python error: Segmentation fault

Current thread 0xb754b700 (most recent call first):
  File "/usr/local/lib/python3.4/dist-packages/lemmagen/lemmatizer.py", line 66 in lemmatize
  File "<stdin>", line 1 in <module>
Segmentation fault (core dumped)

I have been trying to read up on my exact question (encoding type), but nothing I have found so far seems to solve this. I would appreciate some thoughtful information on this. Thank you.

Thanks to whoever downvoted the question without giving a reason or leaving a comment.

It might be worth adding the fact that you are using ctypes, so you are calling a C function rather than a C++ function.
– Pim, Commented Oct 7, 2015 at 9:46
@Pim @DrunkenMaster lem_lemmatize_word is actually defined as extern "C" in the sources, so calling it via ctypes shouldn't be a problem.
– u354356007, Commented Oct 7, 2015 at 9:50
I still don't see why the segmentation fault happens. I tried passing hard-coded b'string' and u'string' too; neither makes a difference.
– imrek, Commented Oct 7, 2015 at 10:00
@Vovanrock2002 Yeah, but the question states that it is a C++ function, which it isn't.
– Pim, Commented Oct 7, 2015 at 10:07

Pim · Accepted Answer · 2015-10-07 12:03:33Z

2

You need to use the create_string_buffer function to create a char array before passing it to the function.

Something like this should work:

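import ctypes

class Lib:
    def __init__(self):
        self.lib = ctypes.cdll.LoadLibrary('/home/pim/slovene_lemmatizer/bin/libLemmatizer.so')

    def lemmatize(self, word):
        # Output buffer initialised from a placeholder string. It holds only
        # len("text") + 1 bytes, so allocate more room for real input.
        text = "text"
        output_buffer = ctypes.create_string_buffer(text.encode())

        # Copy the encoded word into a mutable, NUL-terminated char array.
        word_buffer = ctypes.create_string_buffer(word.encode())

        # Pass the char array, not the original str.
        self.lib.lem_lemmatize_word(word_buffer, output_buffer)

        print("test")

def main():
    lib = Lib()
    lib.lemmatize("test")


if __name__ == '__main__':
    main()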

This outputs:

pim@pim-desktop:~/slovene_lemmatizer/bin$ python3 main.py
[ERROR] Language file for lemmatizer has to be loaded first!
test
pim@pim-desktop:~/slovene_lemmatizer/bin$

Edit: I'm not 100% sure whether the usage of the 'raw' property here is correct, but it works!

Edit 2: It also works without the raw property, so I have updated the answer.
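One detail that is easy to miss when porting: in Python 3, len(word) on a str counts characters, while the C side sees UTF-8 bytes. The sketch below shows how create_string_buffer behaves when the buffers are sized from the encoded length. The helper name make_buffers, the min_output_len parameter, and the "twice the byte length plus one" rule are assumptions made for illustration; they are not part of the Lemmagen sources.

```python
import ctypes

def make_buffers(word, min_output_len=64):
    """Hypothetical helper: encode `word` and allocate buffers sized in bytes."""
    # Accept both str and bytes, mirroring the Python 2 is_unicode check.
    encoded = word.encode("utf-8") if isinstance(word, str) else word

    # Initialising from bytes copies the data and appends a NUL terminator.
    word_buffer = ctypes.create_string_buffer(encoded)

    # Assumed sizing rule: twice the encoded length plus the terminator,
    # with a floor so that very short words still get some headroom.
    output_len = max(min_output_len, 2 * len(encoded) + 1)
    output_buffer = ctypes.create_string_buffer(output_len)
    return word_buffer, output_buffer

word = "čaša"  # 4 characters, 6 bytes in UTF-8
word_buffer, output_buffer = make_buffers(word)

print(len(word))            # number of characters
print(len(word_buffer))     # encoded bytes plus the NUL terminator
print(len(output_buffer))   # size of the zero-filled output buffer
```

Both buffers can then be passed to the C function in place of the str. Whether the factor of two is enough depends on what the library writes, so treat it as a starting point rather than a guarantee.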

edited Oct 7, 2015 at 12:03

answered Oct 7, 2015 at 10:00

Pim


6 Comments

imrek Over a year ago

I am not sure about this. The modification gives me an AttributeError: 'bytes' object has no attribute 'raw' for the last code line in your snippet.

imrek Over a year ago

Thanks, that was just a typo, now I have spotted it. But the function call still triggers a segmentation fault. :\

imrek Over a year ago

Do I have to set the argtypes explicitly, like in this question? stackoverflow.com/questions/27127413/…

imrek Over a year ago

Thanks for the updates. I'll try to integrate this with lemmatizer.py and see if it works for me.

imrek Over a year ago

I was able to get through to the C function with your new code, but of course as the output ([ERROR] Language file for lemmatizer has to be loaded first!) shows, I have to deal with a C code bug.
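On the argtypes question raised in the comments: declaring argtypes and restype is optional, but it makes this kind of mistake much easier to find. Without a declared prototype, ctypes passes a Python 3 str as a wchar_t* and bytes as a char*. With argtypes set to c_char_p, a str is rejected with an exception before the C code runs. The sketch below uses strlen from the C library as a stand-in, because the exact prototype of lem_lemmatize_word is not shown here. It assumes a POSIX system where the C library can be loaded this way.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) still exposes
# the C library symbols of the running process on such systems.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the prototype: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# bytes are accepted for a char* parameter.
n = libc.strlen("čaša".encode("utf-8"))
print(n)

# A str no longer slips through as wchar_t*; ctypes raises instead.
error = None
try:
    libc.strlen("čaša")
except ctypes.ArgumentError as exc:
    error = exc
print(type(error).__name__)
```

The same pattern applies to the lemmatizer library once the real signature has been checked against its header. It turns a wrong argument type into a Python exception, but it does not protect against an output buffer that is too small.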
