Converting a UTF-16LE Elixir bitstring into an Elixir String

Question

Given an Elixir bitstring encoded in UTF-16LE:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>

how can I get this converted into a readable Elixir String (it spells out "Devastator")? The closest I've gotten is transforming the above into a list of the Unicode codepoints (["0044", "0065", ...]) and trying to prepend the \u escape sequence to them, but Elixir throws an error since it's an invalid sequence. I'm out of ideas.

That was a temporary hack, and for more complex situations, e.g. parsing a string of unknown length that's terminated by a null byte, it was insufficient.
– user701847, Commented Sep 29, 2016 at 15:13

michalmuskala · Accepted Answer · 2016-09-29 15:01:31Z

10

The simplest way is using functions from the :unicode module:

:unicode.characters_to_binary(utf16binary, {:utf16, :little})

For example:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> :unicode.characters_to_binary({:utf16, :little})
|> IO.puts
#=> Devastator

(The input ends with a null terminator, so the converted string ends with a null byte. Because of that, the shell will display the result as a binary rather than as a string, and depending on the OS, IO.puts may print some extra representation for the null byte.)
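
For the null-terminated case raised in the comment on the question, one option is to convert first and then cut the result at the first null byte. The following is a minimal sketch, not part of the original answer: the module name Utf16Sketch and the function decode_cstring/1 are hypothetical names chosen for illustration. It assumes the whole input is valid UTF-16LE, including anything after the terminator, and it maps the error tuples that :unicode.characters_to_binary/2 can return onto made-up error atoms.

```elixir
defmodule Utf16Sketch do
  # Hypothetical helper: converts a UTF-16LE binary to UTF-8 and keeps
  # only the part before the first null terminator, if there is one.
  def decode_cstring(utf16le) when is_binary(utf16le) do
    case :unicode.characters_to_binary(utf16le, {:utf16, :little}) do
      utf8 when is_binary(utf8) ->
        # :binary.split/2 splits at the first match only; without a
        # null byte it returns the whole binary as the single element.
        [string | _rest] = :binary.split(utf8, <<0>>)
        {:ok, string}

      {:error, _converted, _rest} ->
        # Illustrative error atom, not something the library returns.
        {:error, :invalid_utf16}

      {:incomplete, _converted, _rest} ->
        # Illustrative error atom, not something the library returns.
        {:error, :incomplete_utf16}
    end
  end
end

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> Utf16Sketch.decode_cstring()
|> IO.inspect()
#=> {:ok, "Devastator"}
```

Since the null byte is stripped, the result is a printable string and the shell shows it as one.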


2 Comments

user701847 Over a year ago

Ah, wow...I had actually looked around in the Erlang libraries, specifically binary to see if any of those methods would help me, but completely neglected to scroll down the page and see the Unicode one...thanks!

Dogbert Over a year ago

This is nice! I didn't know :unicode.characters_* functions also accepted binaries. @user701847 you should probably accept this answer instead of mine.

Dogbert · Accepted Answer · 2016-09-29 14:50:22Z

1

You can make use of Elixir's pattern matching, specifically <<codepoint::utf16-little>>:

defmodule Convert do
  def utf16le_to_utf8(binary), do: utf16le_to_utf8(binary, "")

  defp utf16le_to_utf8(<<codepoint::utf16-little, rest::binary>>, acc) do
    utf16le_to_utf8(rest, <<acc::binary, codepoint::utf8>>)
  end
  defp utf16le_to_utf8("", acc), do: acc
end

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0, 0, 0>>
|> Convert.utf16le_to_utf8
|> IO.puts

<<192, 3, 114, 0, 178, 0>>
|> Convert.utf16le_to_utf8
|> IO.puts

Output:

Devastator
πr²


1 Comment

user701847 Over a year ago

Ah, that's what I was missing, thank you! I had never taken codepoint and then matched it like codepoint::utf8; I basically didn't know what to do with the 2 bytes. To make yours even simpler, we can just do: for << codepoint::utf16-little <- binary >>, into: "", do: <<codepoint::utf8>>
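
Written out in full, the comprehension from this comment might look like the sketch below. It is not part of the original answer, and the module name ConvertFor is a hypothetical name used only for illustration. One difference worth keeping in mind: the recursive version raises a FunctionClauseError on malformed input, whereas a comprehension does not raise when the generator pattern fails to match, so malformed input can go unnoticed.

```elixir
defmodule ConvertFor do
  # Hypothetical module name; same conversion written as a comprehension.
  def utf16le_to_utf8(binary) when is_binary(binary) do
    # Each iteration reads one UTF-16LE code point (surrogate pairs
    # included) and appends it to the result encoded as UTF-8.
    for <<codepoint::utf16-little <- binary>>, into: "", do: <<codepoint::utf8>>
  end
end

<<192, 3, 114, 0, 178, 0>>
|> ConvertFor.utf16le_to_utf8()
|> IO.puts()
#=> πr²
```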
