I need a pattern to use with string.gmatch that matches runs of alphanumeric characters and individual non-alphanumeric characters (quotes, brackets, colons and the like) as separate matches. For example:

str = [[
    function test(arg1, arg2) {
        dosomething(0x12f, "String");
    }
]]

for token in str:gmatch(regex) do
    print(token)
end

Should print:

function
test
(
arg1
,
arg2
)
{
dosomething
(
0x12f
,
"
String
"
)
;
}

How can I achieve this? In standard regex I've found that ([a-zA-Z0-9]+)|([\{\}\(\)\";,]) works for me, but I'm not sure how to translate it to Lua's pattern syntax.

2 Answers

local str = [[
    function test(arg1, arg2) {
        dosomething(0x12f, "String");
    }
]]

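-- (%p?) captures at most one punctuation character;
-- (%w*) captures the run of alphanumeric characters that follows it.
for p, w in str:gmatch("(%p?)(%w*)") do
   if p ~= "" then print(p) end  -- skip empty captures
   if w ~= "" then print(w) end
end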
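
If the tokens are needed for later processing rather than printing, the same loop can fill a table. This is only a minimal sketch; the helper name `tokenize` is made up for illustration and is not part of the answer above.

```lua
-- Hypothetical helper: same pattern as above, but tokens go into a table.
local function tokenize(s)
  local tokens = {}
  for p, w in s:gmatch("(%p?)(%w*)") do
    if p ~= "" then tokens[#tokens + 1] = p end  -- single punctuation character
    if w ~= "" then tokens[#tokens + 1] = w end  -- run of alphanumerics
  end
  return tokens
end

local tokens = tokenize('dosomething(0x12f, "String");')
print(table.concat(tokens, " "))  --> dosomething ( 0x12f , " String " ) ;
```

Whitespace is dropped because it matches neither %p nor %w, so both captures come back empty at those positions.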

6 Comments

What does %p match?
%w matches alphanumeric characters; %p matches punctuation (every 7-bit ASCII character that is not alphanumeric, a space, or a control character).
This is really useful - I could for example add all the tokens to a table and then process them - but how could I keep a string delimited by " characters as a single token?
@user6245072 - That would need a more complex program. It would have to correctly handle comments, escaped quotes inside string literals, etc.
@user6245072, if handling of quoted fragments is needed, then something like this may be used: stackoverflow.com/questions/28664139/…
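
For the simple case only, a position-based scanner can keep a double-quoted fragment as one token. The sketch below is an assumption about what is wanted, not the approach from the linked question: the function name `tokenize` is hypothetical, and it deliberately ignores escaped quotes and comments, which the comments above point out a real tokenizer must handle.

```lua
-- Hypothetical sketch: quoted strings stay whole, everything else as before.
-- Limitations: no escaped quotes (\"), no comments.
local function tokenize(s)
  local tokens, pos = {}, 1
  while pos <= #s do
    local _, ws_end = s:find("^%s+", pos)    -- skip whitespace at pos
    if ws_end then pos = ws_end + 1 end
    if pos > #s then break end
    local tok = s:match('^"[^"]*"', pos)     -- a complete "..." fragment
             or s:match("^%w+", pos)         -- a run of alphanumerics
             or s:sub(pos, pos)              -- any other single character
    tokens[#tokens + 1] = tok
    pos = pos + #tok
  end
  return tokens
end

local tokens = tokenize('dosomething(0x12f, "String");')
print(table.concat(tokens, " "))  --> dosomething ( 0x12f , "String" ) ;
```

The `^` anchor ties each pattern to the position passed as the second argument, so the scanner never skips over characters. An unterminated quote falls through to the single-character case.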

You need a workaround involving a temporary character that is not used in your code. E.g., insert a § after each chunk of alphanumeric characters and after each non-alphanumeric character, then split on it:

str = str:gsub("%s*(%w+)%s*", "%1§") -- Trim chunks of 1+ alphanumeric characters and add a temp char after them
str = str:gsub("(%W)%s*", "%1§")     -- Right trim the non-alphanumeric char one by one and add the temp char after each
for token in str:gmatch("[^§]+") do  -- Match chunks of chars other than the temp char
    print(token)
end

See this Lua demo

Note that %w in Lua is equivalent to [a-zA-Z0-9] in JS, as it does not match an underscore, _.
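
If identifiers containing underscores should stay in one piece, the underscore can be added to the set explicitly. A small sketch to show the difference; the `collect` helper and the sample string are made up for illustration.

```lua
-- Hypothetical helper: gather all matches of a pattern into a table.
local function collect(s, pattern)
  local out = {}
  for m in s:gmatch(pattern) do out[#out + 1] = m end
  return out
end

local s = "my_var = other_1"
local plain = collect(s, "%w+")     -- %w stops at "_"
local ident = collect(s, "[%w_]+")  -- "_" added to the character set

print(table.concat(plain, " "))  --> my var other 1
print(table.concat(ident, " "))  --> my_var other_1
```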
