How can I sort UTF-8 strings on Google App Engine in Python? I tried using locale, but I don't think it will work, and the number of supported languages is too small.

I also tried pyuca, but it is too heavy to use: it reloads a 1 MB file each time, and only about 1% of it is needed for sorting.

Is there a lightweight pure-Python library for this, or is it supported in Google App Engine in some way?

If you think you have a good algorithm, it should pass this test (you can shuffle the string for testing):

alphabet = u'AĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻaąbcćdeęfghijklłmnńoóprsśtuwyzźż'

Any suggestions are welcome; I will test them. This alphabet is 'pl_pl'/'polish'.

1 Answer

Here's a pure-Python approach:

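alphabet = u'AĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻaąbcćdeęfghijklłmnńoóprsśtuwyzźż'
# Map each letter to its position in the alphabet.
dsort = dict((let, i) for i, let in enumerate(alphabet))

def key_utf8(utf8_string):
    # Decode the UTF-8 bytes, then use the list of letter positions as the sort key.
    s = utf8_string.decode('utf8')
    # map() returns a list on Python 2; on Python 3, wrap it in list().
    return map(dsort.get, s)

# Sort in place, comparing strings by alphabet position rather than by byte value.
some_list_of_utf8_strings.sort(key=key_utf8)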

You'd probably be best advised to keep a list of unicode strings internally -- decoding UTF-8 input once and encoding back into UTF-8 on output if needed -- but as long as you're happy to potentially pay the decoding cost repeatedly, this pure-Python approach should work fine, in App Engine or anywhere else.
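As a rough sketch of the decode-once idea, the sort can be wrapped so that decoding and encoding each happen a single time per string. The names `key_unicode` and `sort_utf8_items` are hypothetical, the key is built as a list so it also compares correctly on Python 3, and the code assumes every character appears in `alphabet` (anything else raises `KeyError`):

```python
# -*- coding: utf-8 -*-
alphabet = u'AĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻaąbcćdeęfghijklłmnńoóprsśtuwyzźż'
dsort = dict((let, i) for i, let in enumerate(alphabet))

def key_unicode(s):
    # s is already a unicode string; a list of positions is directly comparable.
    # Assumption: every character of s is in the alphabet (KeyError otherwise).
    return [dsort[ch] for ch in s]

def sort_utf8_items(raw_items):
    # Hypothetical helper: decode each UTF-8 byte string once on input ...
    decoded = [item.decode('utf8') for item in raw_items]
    decoded.sort(key=key_unicode)
    # ... and encode back to UTF-8 once on output.
    return [item.encode('utf8') for item in decoded]
```

If the application already holds unicode strings, only `key_unicode` is needed and the helper can be dropped.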

If you do follow the best practice of only ever keeping unicode strings internally (decoding on input, encoding if needed on output), then the sort could also use key=lambda s: map(dsort.get, s) -- but I'd personally prefer using a named function (for clarity) instead of the somewhat-goofy lambda. Just a matter of style, really.
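The shuffle test proposed in the question can be sketched with the named-function style on unicode strings. This is only an illustration: `key_unicode` and the sample `words` list are made up for the example, and the key again assumes every character is present in `alphabet`:

```python
# -*- coding: utf-8 -*-
import random

alphabet = u'AĄBCĆDEĘFGHIJKLŁMNŃOÓPRSŚTUWYZŹŻaąbcćdeęfghijklłmnńoóprsśtuwyzźż'
dsort = dict((let, i) for i, let in enumerate(alphabet))

def key_unicode(s):
    # Named key function for unicode strings; returns a list of alphabet positions.
    return [dsort[ch] for ch in s]

# Shuffle the letters, sort them with the key, and compare against the alphabet.
letters = list(alphabet)
random.shuffle(letters)
letters.sort(key=key_unicode)
roundtrip_ok = (u''.join(letters) == alphabet)

# The same key orders whole words (hypothetical sample data).
words = [u'żaba', u'zebra', u'ćma', u'cena']
sorted_words = sorted(words, key=key_unicode)
```

Because the alphabet lists all uppercase letters before all lowercase ones, this key places every capitalized word ahead of every lowercase word.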
