I have ported the Stempel stemmer from Java (Apache Lucene) to Python. I come from the Java world, so I'm afraid my translation might not be "pythonic" enough.
I would like to hear your feedback on a fairly representative part of the code: my translation of the Diff class, which applies a stemming command (diff) to a string (dest).
    @classmethod
    def apply(cls, dest, diff):
        """
        Apply the given patch string diff to the given string dest

        :param dest: Destination string
        :param diff: Patch string
        :return: None; dest is modified in place
        """
        if diff is None:
            return
        if not isinstance(dest, MutableString):
            raise ValueError
        if not dest:
            return
        # Commands are applied starting from the last character of dest.
        pos = len(dest) - 1
        try:
            # diff is read as consecutive (command, parameter) character pairs.
            for i in range(len(diff) // 2):
                cmd = diff[2 * i]
                param = diff[2 * i + 1]
                par_num = ord(param) - ord('a') + 1
                if cmd == '-':
                    # skip backwards
                    pos -= (par_num - 1)
                elif cmd == 'R':
                    # replace the character at pos
                    cls.__check_index(dest, pos)
                    dest[pos] = param
                elif cmd == 'D':
                    # delete par_num characters ending at pos
                    o = pos
                    pos -= (par_num - 1)
                    cls.__check_index(dest, pos)
                    dest[pos:o + 1] = ''
                elif cmd == 'I':
                    # insert param after pos
                    pos += 1
                    cls.__check_offset(dest, pos)
                    dest.insert(pos, param)
                # every command moves the cursor one step to the left
                pos -= 1
        except IndexError:
            # swallow, same thing happens in original Java version
            pass

    @classmethod
    def __check_index(cls, s, index):
        if index < 0 or index >= len(s):
            raise IndexError

    @classmethod
    def __check_offset(cls, s, offset):
        if offset < 0 or offset > len(s):
            raise IndexError
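To make the patch format easier to follow, here is a minimal standalone sketch of how the loop above reads a patch string as (command, parameter) pairs. The helper name decode_patch is hypothetical and is not part of the port or of Lucene; it only mirrors the pairing and the par_num arithmetic from apply.

```python
def decode_patch(diff):
    """Split a patch string into (command, parameter, count) triples.

    Hypothetical helper for illustration only. It mirrors the loop in
    apply: characters are read in pairs, and a trailing unpaired
    character is ignored, just like range(len(diff) // 2) ignores it.
    """
    triples = []
    for cmd, param in zip(diff[::2], diff[1::2]):
        # 'a' means 1, 'b' means 2, and so on.
        count = ord(param) - ord('a') + 1
        triples.append((cmd, param, count))
    return triples


example = decode_patch("-bRx")
print(example)  # [('-', 'b', 2), ('R', 'x', 24)]
```

For the '-' and 'D' commands only the count matters, while 'R' and 'I' use the parameter as the literal character to write.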
Some justification for the decisions I made:
The original implementation uses StringBuffer to manipulate characters in a string. Since the Python str type is immutable, I used my own class, MutableString, that behaves like a Python list.
Also, the original logic was based on catching IndexOutOfBoundsException. Unlike Java, Python allows negative indexes in list indexing and slicing. Therefore, I've introduced guards like __check_X.
The Java implementation uses a switch/case/default clause. I translated that to an if/elif/else clause in Python.
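The first two points can be illustrated with a short sketch. The MutableString below is a hypothetical stand-in (a plain list subclass), not the actual class from the port, and check_index is an unmangled copy of the guard logic. It shows that Python silently accepts negative indexes and negative slice starts, which is why an explicit bounds check is needed to reproduce the Java behaviour.

```python
class MutableString(list):
    """Hypothetical stand-in: a list of characters that prints as a string."""

    def __str__(self):
        return "".join(self)


def check_index(s, index):
    """Mimic Java-style bounds checking: reject negative and past-the-end indexes."""
    if not 0 <= index < len(s):
        raise IndexError(index)


word = MutableString("cat")
word[-1] = "r"   # accepted by Python; Java's setCharAt(-1, ...) would throw
after_negative_write = str(word)

word = MutableString("cat")
word[-2:3] = ""  # a negative slice start is accepted too and removes "at"
after_negative_slice = str(word)

try:
    check_index(MutableString("cat"), -1)
    guard_raised = False
except IndexError:
    guard_raised = True

print(after_negative_write, after_negative_slice, guard_raised)  # car c True
```

Without the guard, a patch that moves the cursor below zero would modify characters counted from the end of the string instead of stopping, so the result would differ from the Java version.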