I can't find which encoding Lua uses for its strings.

I'm using

string.byte (s [, i [, j]])

which has the doc

Returns the internal numerical codes of the characters s[i], s[i+1], ···, s[j]. The default value for i is 1; the default value for j is i. Note that numerical codes are not necessarily portable across platforms.

From reading around, people suggest it uses ASCII, which is fine for me, but I don't understand the part about codes changing across platforms. I thought the whole point of using a single encoding (like ASCII) is that this wouldn't happen. Or does the note just mean that ASCII defines nothing above 127, so different countries / OEMs / OSes etc. may be using custom ASCII extensions from decades ago for the upper range?

It's important for me to know that [a-zA-Z] will have the same character codes on all the platforms I'm running on.

The Lua doc could be a bit more specific here!

Any light anyone can shed on this would be great, thanks.

  • "The Lua doc could be a bit more specific here!" No, it can't; it's portable by design. Each builder should provide such documentation. Commented Jul 26, 2013 at 14:04

1 Answer

I'm fairly sure you can safely assume an ASCII-derived encoding. So the minuscule set of characters you're interested in stays the same.
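If you would rather verify that assumption than rely on it, a startup check along these lines is one option. This is only a sketch: `letters_are_ascii` is a hypothetical helper name, not part of Lua, and it assumes that "same char values" means the ASCII codes 97-122 for a-z and 65-90 for A-Z.

```lua
-- Hypothetical helper (not part of Lua): returns true when a-z and A-Z
-- have their ASCII codes on this particular build/platform.
local function letters_are_ascii()
  local lower = "abcdefghijklmnopqrstuvwxyz"
  local upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
  for i = 1, #lower do
    -- In ASCII, 'a' is 97 and 'A' is 65, and each alphabet is contiguous.
    if string.byte(lower, i) ~= 96 + i then return false end
    if string.byte(upper, i) ~= 64 + i then return false end
  end
  return true
end

-- Fail early instead of silently computing with unexpected codes.
assert(letters_are_ascii(), "this Lua build does not use ASCII codes for [a-zA-Z]")
```

On an EBCDIC build the check should fail, since the letters have different codes there.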

The note about the codes changing between platforms likely means that Lua doesn't know anything about the character encoding at all and thus just uses whatever bytes the OS hands out. On Linux this is likely UTF-8, which means you'd have to deal with individual code units (bytes) when stepping outside ASCII. On Windows I could imagine it being the system's legacy code page, which means roughly Latin-1 (CP 1252) in much of the Western world.
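To illustrate the "just bytes" point, here is a small sketch. It assumes a UTF-8 encoded string, written with decimal escapes so the result does not depend on how the source file itself is saved; the sample string is made up for the example.

```lua
-- "\195\169" is the two-byte UTF-8 encoding of U+00E9 (e with acute accent).
local s = "caf\195\169"

-- The length operator counts bytes, not characters.
print(#s)                        -- 5

-- string.byte returns one number per byte; Lua does no decoding.
local bytes = { string.byte(s, 1, -1) }
print(table.concat(bytes, " "))  -- 99 97 102 195 169 on an ASCII-derived platform
```

The first three values are the plain ASCII codes for "c", "a", and "f"; the last two are the individual UTF-8 code units of the accented character.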

3 Comments

+1, thanks. Do you know if there are any encodings that could realistically be the default system encoding and that have different character codes for [a-zA-Z]?
There is EBCDIC, but that's mostly a legacy encoding on ordinary systems (though very much alive in the financial world). You're unlikely to encounter anything that's not ASCII-derived nowadays.
+1 "Lua doesn't know anything about the character encoding at all". It fundamentally depends on the libraries Lua is built on top of, which mostly defer to an OS mechanism as a default. So, it can be influenced or determined by rebuilding Lua with different toolsets or configuration, using a different OS, changing the OS settings, changing user settings in the OS, or changing a thread setting (outside of Lua).
