I have a table like this:

table = {57,55,0,15,-25,139,130,-23,173,148,-24,136,158}

It is a UTF-8 encoded byte array produced by PHP's unpack function:

unpack('C*',$str);

How can I convert it to a UTF-8 string that I can read in Lua?

  • What do those numbers represent? Are they supposed to be UTF-8 encoded code points, actual numeric literals you want converted, or what? Commented Sep 9, 2013 at 8:25
  • It is a UTF-8 encoded byte array produced by PHP's unpack function. Commented Sep 9, 2013 at 8:26

2 Answers

Lua doesn't provide a direct function for turning a table of UTF-8 bytes in numeric form into a UTF-8 string. But it's easy enough to write one with the help of string.char:

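function utf8_from(t)
  local bytearr = {}
  for _, v in ipairs(t) do
    -- Negative values are signed bytes: map -128..-1 onto 128..255.
    local utf8byte = v < 0 and (0xff + v + 1) or v
    table.insert(bytearr, string.char(utf8byte))
  end
  -- Join the one-byte strings into a single string.
  return table.concat(bytearr)
end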

Note that Lua strings are plain byte sequences, and the standard string functions are not UTF-8 aware (Lua 5.3 and later add only a small utf8 library). If you print the UTF-8 encoded string returned from the above function on a terminal that doesn't expect UTF-8, you'll just see some funky symbols. If you need more extensive UTF-8 support, you'll want to check out some of the libraries mentioned on the Lua wiki.
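
As a usage sketch, the same idea can be applied to the table from the question. The helper name `bytes_to_string` and the use of `% 256` to fold signed values into 0..255 are assumptions of this example, not part of the answer above. The variable is called `bytes` rather than `table` so that Lua's standard `table` library is not shadowed.

```lua
-- Hypothetical helper: turn a table of byte values (signed or unsigned)
-- into a Lua string.
local function bytes_to_string(t)
  local parts = {}
  for i, v in ipairs(t) do
    -- Lua's % is a floored modulo, so -25 % 256 == 231.
    parts[i] = string.char(v % 256)
  end
  return table.concat(parts)
end

local bytes = {57,55,0,15,-25,139,130,-23,173,148,-24,136,158}
local s = bytes_to_string(bytes)

print(#s)        --> 13
print(s:sub(5))  -- last nine bytes: three 3-byte UTF-8 sequences
```

On a terminal that expects UTF-8, the second print should show the three CJK characters encoded by those nine bytes. The first four bytes (57, 55, 0, 15) are the ASCII digits "97" followed by two control bytes.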

1 Comment

-1: does not handle 3- and 4-byte UTF8 characters like U+20AC -> €

Here's a comprehensive solution that works for the UTF-8 character set restricted by RFC 3629:

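do
  -- {largest code point, lead-byte marker} for 2-, 3- and 4-byte sequences
  local bytemarkers = { {0x7FF,192}, {0xFFFF,224}, {0x1FFFFF,240} }
  -- Note: on Lua 5.3+, this global replaces the built-in utf8 library.
  function utf8(decimal)
    if decimal<128 then return string.char(decimal) end  -- ASCII: one byte
    local charbytes = {}
    for bytes,vals in ipairs(bytemarkers) do
      if decimal<=vals[1] then
        -- Fill the continuation bytes from last to first, 6 bits at a time.
        for b=bytes+1,2,-1 do
          local mod = decimal%64
          decimal = (decimal-mod)/64
          charbytes[b] = string.char(128+mod)
        end
        -- The remaining high bits go into the lead byte.
        charbytes[1] = string.char(vals[2]+decimal)
        break
      end
    end
    return table.concat(charbytes)
  end
end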

function utf8frompoints(...)
  local chars,arg={},{...}
  for i,n in ipairs(arg) do chars[i]=utf8(n) end
  return table.concat(chars)
end

print(utf8frompoints(72, 233, 108, 108, 246, 32, 8364, 8212))
--> Héllö €—
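
To make the arithmetic in `utf8` concrete, here is a hand-unrolled sketch for a single three-byte code point, U+20AC (€). The function name `encode3` is hypothetical, and it only covers the range 0x800..0xFFFF. It illustrates the divide-by-64 steps rather than replacing the function above.

```lua
-- Hypothetical, three-byte case only: code points 0x800..0xFFFF.
local function encode3(cp)
  local b3 = 128 + cp % 64   -- low 6 bits  -> continuation byte 10xxxxxx
  cp = math.floor(cp / 64)
  local b2 = 128 + cp % 64   -- next 6 bits -> continuation byte 10xxxxxx
  cp = math.floor(cp / 64)
  local b1 = 224 + cp        -- top 4 bits  -> lead byte 1110xxxx
  return string.char(b1, b2, b3)
end

local euro = encode3(0x20AC)
-- 0x20AC = 8364: 8364 % 64 = 44, then 130 % 64 = 2, leaving 2 for the lead byte.
print(euro:byte(1, -1))  -- the three byte values 226, 130, 172 (0xE2 0x82 0xAC)
```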

3 Comments

I've just replaced the old implementation with one that is far more elegant (uses no strings for the binary math), shorter, and consequently about 5 times faster, too.
Additional optimizations (edited into the above) provide another 2x or more perf gains.
How can I use this function with a string such as s="\xD0\x9C\xD0\xBE\xD1\x81\xD0\xBA\xD0\xB2\xD0\xB0"?
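
A string like the one in the last comment is already UTF-8 encoded, so it needs no conversion before being stored or printed. The inverse operation, recovering code points from such a string, can be sketched as below. The name `codepoints` is hypothetical, and the sketch assumes well-formed input and does no validation. The bytes are written with decimal escapes because `\x` escapes are only accepted from Lua 5.2 onward.

```lua
-- Hypothetical helper: decode a well-formed UTF-8 string into code points.
local function codepoints(s)
  local out, i = {}, 1
  while i <= #s do
    local b = s:byte(i)
    local cp, extra
    if b < 0x80 then cp, extra = b, 0             -- 0xxxxxxx: ASCII
    elseif b < 0xE0 then cp, extra = b - 0xC0, 1  -- 110xxxxx: 2-byte sequence
    elseif b < 0xF0 then cp, extra = b - 0xE0, 2  -- 1110xxxx: 3-byte sequence
    else cp, extra = b - 0xF0, 3 end              -- 11110xxx: 4-byte sequence
    for j = 1, extra do
      cp = cp * 64 + (s:byte(i + j) - 0x80)       -- append 6 payload bits
    end
    out[#out + 1] = cp
    i = i + extra + 1
  end
  return out
end

-- Same bytes as "\xD0\x9C\xD0\xBE\xD1\x81\xD0\xBA\xD0\xB2\xD0\xB0".
local s = "\208\156\208\190\209\129\208\186\208\178\208\176"
print(table.concat(codepoints(s), " "))  --> 1052 1086 1089 1082 1074 1072
```

The resulting code points could then be passed to `utf8frompoints` to rebuild the original string.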
