I am trying to call a C function from an .so file from Python 3.4. I have made some necessary changes to make the Python 2.7 code work with Python 3.4 but I am still running into a Fatal Python error: Segmentation fault.

The code is from this Bitbucket hosted project. I have installed it via pip3 (pip3 install Lemmagen), which also created the .so file I am trying to use from Python3.

Here is the original Python2.7 code (the function where the call to C code happens) which runs fine with python from the command line.

def lemmatize(self, word):
    if (self._output_buffer_len < 2 * len(word)):
        self._output_buffer_len = 2 * len(word)
        self._output_buffer = create_string_buffer(self._output_buffer_len)

    is_unicode = isinstance(word, unicode)
    if is_unicode:
        word = word.encode('utf-8')

    self._lib.lem_lemmatize_word(word, self._output_buffer)
    return self._output_buffer.value.decode('utf-8') if is_unicode else self._output_buffer.value

And this is how I am trying to adapt it to Python3.4:

def lemmatize(self, word):
    if (self._output_buffer_len < 2 * len(word)):
        self._output_buffer_len = 2 * len(word)
        self._output_buffer = create_string_buffer(self._output_buffer_len)

    word = word.encode('utf-8')


    self._lib.lem_lemmatize_word(word, self._output_buffer) #SEGFAULT HERE!
    #return "HERE"
    return self._output_buffer.value.decode('utf-8')

I have removed the lines that check whether word is unicode, since str is Unicode by default in Python 3.x. I am still 80% sure that it is a character encoding issue. What encoding do I have to use to pass a string variable to the function call self._lib.lem_lemmatize_word(word, self._output_buffer)? That is the exact line where the segmentation fault occurs:

Fatal Python error: Segmentation fault

Current thread 0xb754b700 (most recent call first):
  File "/usr/local/lib/python3.4/dist-packages/lemmagen/lemmatizer.py", line 66 in lemmatize
  File "<stdin>", line 1 in <module>
Segmentation fault (core dumped)

I have been trying to read up on my exact question (encoding type), but nothing I have found so far seems to solve this. I would appreciate some thoughtful information on this. Thank you.
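For reference, when no argtypes are declared, ctypes in Python 3 passes a bytes object as a char * and a str object as a wchar_t *, so a function that takes char * needs bytes. The original Python 2 code encodes to UTF-8, so that is presumably what the library expects. One thing that does change in Python 3 is that len(word) counts characters rather than bytes, so a buffer sized from the unencoded word can be smaller than intended. Below is a minimal sketch of sizing the buffer from the encoded bytes instead. The helper name prepare_call_args and the min_capacity floor are hypothetical, not part of Lemmagen, and whether this has anything to do with the segfault is not established.

```python
import ctypes


def prepare_call_args(word, min_capacity=64):
    """Hypothetical helper: encode a str and allocate an output buffer.

    Returns (encoded_bytes, output_buffer), both suitable for passing to a
    C function that takes (const char *, char *).
    """
    encoded = word.encode("utf-8")
    # Size the buffer from the encoded byte length, plus one byte for the
    # terminating NUL. min_capacity is an assumed safety floor.
    capacity = max(min_capacity, 2 * len(encoded) + 1)
    return encoded, ctypes.create_string_buffer(capacity)


word = "\u010da\u0161a"  # "čaša": 4 characters, 6 bytes in UTF-8
encoded, out = prepare_call_args(word)
print(len(word), len(encoded), ctypes.sizeof(out))  # prints: 4 6 64
```

The encoded bytes and the buffer can then be passed as the two arguments of lem_lemmatize_word.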

Thanks for whoever downvoted the question without a reason or any comment.

  • It might be worth adding the fact that you are using ctypes, so you are calling a C function rather than a C++ function. Commented Oct 7, 2015 at 9:46
  • OK, I'll correct this. Commented Oct 7, 2015 at 9:49
  • @Pim @DrunkenMaster lem_lemmatize_word is actually defined as extern "C" in the sources, so calling it via ctypes shouldn't be a problem. Commented Oct 7, 2015 at 9:50
  • I still don't see why the segmentation fault happens, I tried passing hard-coded b'string' and u'string', too. They don't make a difference. Commented Oct 7, 2015 at 10:00
  • @Vovanrock2002 Yes, but the question states that it is a C++ function, which it isn't. Commented Oct 7, 2015 at 10:07

1 Answer


You need to use the create_string_buffer function to create a char array before passing it to the function.

Something like this should work:

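import ctypes

class Lib:
    def __init__(self):
        # Load the shared library that exports lem_lemmatize_word.
        self.lib = ctypes.cdll.LoadLibrary('/home/pim/slovene_lemmatizer/bin/libLemmatizer.so')

    def lemmatize(self, word):
        # Placeholder output buffer (5 bytes here); in real use, size it
        # for the longest lemma the library can return.
        text = "text"
        output_buffer = ctypes.create_string_buffer(text.encode())

        # Copy the encoded word into a NUL-terminated char array.
        word_buffer = ctypes.create_string_buffer(word.encode())

        # Pass the char buffer, not the str: without argtypes, ctypes
        # would pass a str as wchar_t *.
        self.lib.lem_lemmatize_word(word_buffer, output_buffer)

        print("test")

def main():
    lib = Lib()
    lib.lemmatize("test")


if __name__ == '__main__':
    main()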

This outputs:

pim@pim-desktop:~/slovene_lemmatizer/bin$ python3 main.py
[ERROR] Language file for lemmatizer has to be loaded first!
test
pim@pim-desktop:~/slovene_lemmatizer/bin$

Edit: I'm not 100% sure whether the usage of the 'raw' property here is correct, but it works!

Edit 2: It does work without the raw property; I have updated the answer.
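A further safeguard, not part of the code above, is to declare the argument types so that ctypes rejects a str instead of silently passing it as wchar_t *. The sketch below cannot load libLemmatizer.so, so it uses a Python callback wrapped in a CFUNCTYPE prototype as a stand-in for lem_lemmatize_word. The assumed signature (const char *, char *) and the names LEMMATIZE_PROTO, _fake_lemmatize and fake_lemmatize_word are illustrative only; the stand-in just lowercases its input.

```python
import ctypes

# Assumed C signature: void lem_lemmatize_word(const char *word, char *out)
LEMMATIZE_PROTO = ctypes.CFUNCTYPE(
    None, ctypes.c_char_p, ctypes.POINTER(ctypes.c_char)
)


def _fake_lemmatize(word, out):
    # Stand-in for the C function: word arrives as bytes, out as a pointer
    # into the caller's buffer. Write the lowercased word plus a NUL.
    data = word.lower() + b"\0"
    ctypes.memmove(out, data, len(data))


# Calling this object goes through the same argument checks as a foreign
# function with argtypes set.
fake_lemmatize_word = LEMMATIZE_PROTO(_fake_lemmatize)


def lemmatize(word):
    encoded = word.encode("utf-8")
    out = ctypes.create_string_buffer(2 * len(encoded) + 1)
    fake_lemmatize_word(encoded, out)
    return out.value.decode("utf-8")


if __name__ == "__main__":
    print(lemmatize("TEST"))
    try:
        # A str is not accepted where c_char_p is declared.
        fake_lemmatize_word("TEST", ctypes.create_string_buffer(16))
    except ctypes.ArgumentError as exc:
        print("rejected:", exc)
```

With the real library, the equivalent step would be assigning argtypes and restype on the function object obtained from the loaded library before calling it, assuming the signature above matches the C sources.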

6 Comments

I am not sure about this, this modification gives me an AttributeError: 'bytes' object has no attribute 'raw' for the last code line in your snippet.
Thanks, that was just a typo, now I have spotted it. But the function call still triggers a segmentation fault. :\
Do I have to set the argtypes explicitly, like in this question? stackoverflow.com/questions/27127413/…
Thanks for the updates. I'll try to integrate this with lemmatizer.py and see if it works for me.
I was able to get through to the C function with your new code, but of course as the output ([ERROR] Language file for lemmatizer has to be loaded first!) shows, I have to deal with a C code bug.