I am trying to call a C function from an .so file from Python 3.4. I have made some necessary changes to make the Python 2.7 code work with Python 3.4 but I am still running into a Fatal Python error: Segmentation fault.
The code is from this Bitbucket hosted project. I have installed it via pip3 (pip3 install Lemmagen), which also created the .so file I am trying to use from Python3.
Here is the original Python2.7 code (the function where the call to C code happens) which runs fine with python from the command line.
def lemmatize(self, word):
if (self._output_buffer_len < 2 * len(word)):
self._output_buffer_len = 2 * len(word)
self._output_buffer = create_string_buffer(self._output_buffer_len)
is_unicode = isinstance(word, unicode)
if is_unicode:
word = word.encode('utf-8')
self._lib.lem_lemmatize_word(word, self._output_buffer)
return self._output_buffer.value.decode('utf-8') if is_unicode else self._output_buffer.value
And this is how I am trying to adapt it to Python3.4:
def lemmatize(self, word):
if (self._output_buffer_len < 2 * len(word)):
self._output_buffer_len = 2 * len(word)
self._output_buffer = create_string_buffer(self._output_buffer_len)
word = word.encode('utf-8')
self._lib.lem_lemmatize_word(word, self._output_buffer) #SEGFAULT HERE!
#return "HERE"
return self._output_buffer.value.decode('utf-8')
I have removed the lines that check whether word is unicode or not, since Unicode is default in Python3.x. I am still 80% sure that is a character encoding issue. What encoding do I have to use to pass on a string variable to the function call self._lib.lem_lemmatize_word(word, self._output_buffer)? That is the exact line where the segmentation fault occurs:
Fatal Python error: Segmentation fault
Current thread 0xb754b700 (most recent call first):
File "/usr/local/lib/python3.4/dist-packages/lemmagen/lemmatizer.py", line 66 in lemmatize
File "<stdin>", line 1 in <module>
Segmentation fault (core dumped)
I have been trying to read up on my exact question (encoding type), but nothing I have found so far seems to solve this. I would appreciate some thoughtful information on this. Thank you.
Thanks for whoever downvoted the question without a reason or any comment.
lem_lemmatize_wordis actually defined asextern "C"in the sources, so calling it via ctypes shouldn't be a problem.b'string'andu'string', too. They don't make a difference.