Is it possible to convert an Erlang UTF-8 binary string (like <<"HELLO">>) to lowercase without converting it to a list and back?
This library: github.com/erlang-unicode/ux. – Nycholas Oliveira Oliveira, Sep 8, 2012 at 19:30
2 Answers
If you already know how to lowercase a single Unicode character, and the key words here are "without converting it to list and back", then the answer could be a binary comprehension:
<< <<(unicode_to_lower(C))/utf8>> || <<C/utf8>> <= <<"HELLO">> >>.
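The comprehension above assumes that unicode_to_lower/1 exists. A minimal sketch of what it could look like follows; the module name lower_sketch and the wrapper to_lower/1 are hypothetical, and the mapping is an assumption that covers only ASCII A-Z and the basic Cyrillic block А-Я, not full Unicode case mapping:

```erlang
-module(lower_sketch).
-export([unicode_to_lower/1, to_lower/1]).

%% Hypothetical helper: lowercase one code point.
%% Assumption: only ASCII A-Z and Cyrillic U+0410..U+042F are mapped;
%% in both ranges the lowercase letter is 32 code points higher.
%% Every other code point is returned unchanged.
unicode_to_lower(C) when C >= $A, C =< $Z -> C + 32;
unicode_to_lower(C) when C >= 16#0410, C =< 16#042F -> C + 32;
unicode_to_lower(C) -> C.

%% The binary comprehension from the answer, wrapped in a function.
%% It decodes one UTF-8 code point at a time and re-encodes the result,
%% so no intermediate list is built.
to_lower(Bin) when is_binary(Bin) ->
    << <<(unicode_to_lower(C))/utf8>> || <<C/utf8>> <= Bin >>.
```

With this module compiled, lower_sketch:to_lower(<<"HELLO">>) returns <<"hello">>. A real implementation would need a complete case table, including characters whose lowercase form is more than one code point.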
4 Comments
Adam Lindberg
@Kay: Having a working implementation of unicode_to_lower/1 is implied by the answer.
Ivan Dubrov
I knew I was missing something really simple! Thanks!
Boris Mühmer
Note: This will only work on a very small subset (the ASCII range)! For some values you have to peek at the following bytes (I think up to 6 bytes). en.wikipedia.org/wiki/UTF-8
Victor Moroz
@bsmr It will work not just for ASCII
1> [ C || <<C/utf8>> <= unicode:characters_to_binary("ПРИВЕТ") ].
[1055,1056,1048,1042,1045,1058]
string:lowercase in Erlang 20 works with binaries:
1> string:lowercase(<<"HELLO">>).
<<"hello">>
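As a hedged illustration that this is not limited to ASCII, the same call can be tried in the shell on a UTF-8 encoded binary. This assumes Erlang/OTP 20 or later and a shell that accepts Unicode input; the comparison avoids depending on how the shell prints the resulting bytes:

```erlang
1> string:lowercase(<<"ПРИВЕТ"/utf8>>) =:= <<"привет"/utf8>>.
true
```

Since the input is a binary, the result is also a binary, so no explicit round trip through a list is needed in user code.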