Given an Elixir bitstring encoded in UTF-16LE:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>

how can I get this converted into a readable Elixir String (it spells out "Devastator")? The closest I've gotten is transforming the above into a list of the Unicode codepoints (["0044", "0065", ...]) and trying to prepend the \u escape sequence to them, but Elixir throws an error since it's an invalid sequence. I'm out of ideas.

  • You've already answered this question, haven't you? Commented Sep 29, 2016 at 14:59
  • That was a temporary hack, and for more complex situations e.g. parsing a string of an unknown length that's terminated by a null byte, it was insufficient. Commented Sep 29, 2016 at 15:13

2 Answers

The simplest way is using functions from the :unicode module:

:unicode.characters_to_binary(utf16binary, {:utf16, :little})

For example:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> :unicode.characters_to_binary({:utf16, :little})
|> IO.puts
#=> Devastator

(The input ends with a null code unit, so the converted result ends with a null byte. Because of that, the shell will display the result as a raw binary rather than as a string, and depending on the OS, IO.puts may print some extra representation for the null byte.)
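
If the trailing null is unwanted, one option is to cut the converted binary at the first null byte and to handle the non-binary return values of :unicode.characters_to_binary/2 explicitly. The following is only a minimal sketch: the module name Utf16Example and the function decode/1 are made up for illustration and are not part of the answer above.

```elixir
defmodule Utf16Example do
  # Hypothetical helper, not from the original answer.
  # Converts a UTF-16LE binary to UTF-8 and drops everything from the
  # first null byte onwards.
  def decode(utf16le) when is_binary(utf16le) do
    case :unicode.characters_to_binary(utf16le, {:utf16, :little}) do
      utf8 when is_binary(utf8) ->
        # :binary.split/2 splits at the first occurrence only; if there is
        # no null byte, the whole binary comes back as the single element.
        [head | _] = :binary.split(utf8, <<0>>)
        {:ok, head}

      # Invalid input: return what was converted before the bad data.
      {:error, converted, _rest} ->
        {:error, converted}

      # Input ended in the middle of a character.
      {:incomplete, converted, _rest} ->
        {:incomplete, converted}
    end
  end
end

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> Utf16Example.decode()
|> IO.inspect()
```

Note that this converts the whole input before looking for the terminator, so it assumes that any bytes after the null are still valid UTF-16LE.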

2 Comments

Ah, wow...I had actually looked around in the Erlang libraries, specifically binary to see if any of those methods would help me, but completely neglected to scroll down the page and see the Unicode one...thanks!
This is nice! I didn't know :unicode.characters_* functions also accepted binaries. @user701847 you should probably accept this answer instead of mine.

You can make use of Elixir's pattern matching, specifically <<codepoint::utf16-little>>:

defmodule Convert do
  def utf16le_to_utf8(binary), do: utf16le_to_utf8(binary, "")

  defp utf16le_to_utf8(<<codepoint::utf16-little, rest::binary>>, acc) do
    utf16le_to_utf8(rest, <<acc::binary, codepoint::utf8>>)
  end
  defp utf16le_to_utf8("", acc), do: acc
end

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> Convert.utf16le_to_utf8
|> IO.puts

<<192, 3, 114, 0, 178, 0>>
|> Convert.utf16le_to_utf8
|> IO.puts

Output:

Devastator
πr²
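
For the null-terminated case mentioned in the question's comments, the same pattern-matching approach can stop at the first zero code unit instead of converting it. This is a sketch of one possible variation, not the answerer's code: the module name NullTerminated, the helper take/2, and the choice to return a {string, rest} tuple are all assumptions made for illustration.

```elixir
defmodule NullTerminated do
  # Hypothetical variation on Convert above.
  # Returns {utf8_string, remaining_bytes_after_the_terminator}.
  def utf16le_to_utf8(binary), do: take(binary, "")

  # A zero code unit (two zero bytes) is the terminator: stop here and
  # hand back whatever follows it untouched.
  defp take(<<0::16, rest::binary>>, acc), do: {acc, rest}

  # Otherwise decode one UTF-16LE codepoint and append it as UTF-8.
  defp take(<<codepoint::utf16-little, rest::binary>>, acc) do
    take(rest, <<acc::binary, codepoint::utf8>>)
  end

  # The input ended without a terminator.
  defp take("", acc), do: {acc, ""}
end

{string, _rest} =
  NullTerminated.utf16le_to_utf8(
    <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
  )

IO.puts(string)
```

As with Convert, input that is not valid UTF-16LE before the terminator matches no clause and raises a FunctionClauseError. Bytes after the terminator are never decoded, so they do not need to be valid.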

1 Comment

Ah, that's what I was missing, thank you! I had never taken codepoint and then matched it like codepoint::utf8; I basically didn't know what to do with the 2 bytes. To make yours even simpler, we can just do: for << codepoint::utf16-little <- binary >>, into: "", do: <<codepoint::utf8>>
