I have a lot of badly formatted HTML which I am trying to fix using Lua. For example, I have

<p class='heading'>my useful information</p>
<p class='body'>lots more text</p>

which I want to replace with

<h2>my useful information</h2>
<p class='body'>lots more text</p>

What I am trying to use is the following Lua function, which is passed the whole HTML page. However, I have two problems. First, I want gsub to pass the replace function the whole match, including the top and tail, so that I can replace the top and tail and return the string. Second, my inner replace function can't see the top and tail fields.

Sorry if this is an obvious one, but I am still learning Lua.

function topandtailreplace(str,top,tail,newtop,newtail)
    local strsearch = top..'(.*)'..tail
    function replace(str)
        str = string.gsub(str,top,newtop)
        str = string.gsub(str,tail,newtail)
        return str
    end
    local newstr = str:gsub(strsearch,replace())
    return newstr
end

2 Answers

This seems to work:

s=[[
<p class='heading'>my useful information</p>
<p class='body'>lots more text</p>
]]

s=s:gsub("<p class='heading'>(.-)</p>","<h2>%1</h2>")
print(s)
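
If you still want the general function from the question, here is a minimal sketch of one way it could look. It assumes top and tail are literal strings rather than patterns, so it escapes them first; escape_pattern is a hypothetical helper, not a standard library function. The replacement function is passed to gsub without parentheses (writing replace() calls it immediately with no argument), and because it is a closure it can see newtop and newtail.

```lua
-- Hypothetical helper: prefix every punctuation character with % so that
-- a literal string can be used safely inside a Lua pattern.
local function escape_pattern(s)
  return (s:gsub("%p", "%%%0"))
end

local function topandtailreplace(str, top, tail, newtop, newtail)
  -- (.-) is a non-greedy capture, so each match stops at the nearest tail.
  local pattern = escape_pattern(top) .. "(.-)" .. escape_pattern(tail)
  -- Pass the function itself; gsub calls it with the captured middle text.
  -- As a closure it can read newtop and newtail from the enclosing function.
  local result = str:gsub(pattern, function(middle)
    return newtop .. middle .. newtail
  end)
  return result
end

local html = [[
<p class='heading'>my useful information</p>
<p class='body'>lots more text</p>
]]

print(topandtailreplace(html, "<p class='heading'>", "</p>", "<h2>", "</h2>"))
```

Because the pattern has a capture, gsub hands the function only the middle text, so there is no need to strip the top and tail inside it.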
1 Comment

Thanks very much. I had gone off trying to make it harder than it was; I completely missed the %1 option to retain the middle bit.
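
For reference, here is a small sketch of the capture references in a gsub replacement string, and of why the answer's (.-) behaves differently from the (.*) in the question. The sample string is made up for illustration.

```lua
local s = "<b>one</b> and <b>two</b>"

-- %1 is the first capture, i.e. the text between the tags.
local with_capture = s:gsub("<b>(.-)</b>", "<i>%1</i>")
print(with_capture)  --> <i>one</i> and <i>two</i>

-- %0 is the whole match, tags included.
local whole_match = s:gsub("<b>(.-)</b>", "[%0]")
print(whole_match)   --> [<b>one</b>] and [<b>two</b>]

-- Greedy (.*) runs to the last </b>, so both elements become one match.
local greedy = s:gsub("<b>(.*)</b>", "<i>%1</i>")
print(greedy)        --> <i>one</b> and <b>two</i>
```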

You could use an HTML parsing library with a DOM tree, for example lua-gumbo:

luarocks install gumbo

The following example would do what you want:

local gumbo = require "gumbo"

local input = [[
    <p class='heading'>my useful information</p>
    <p class='body'>lots more text</p>
]]

local document = assert(gumbo.parse(input))
local headings = assert(document:getElementsByClassName("heading"))
local heading1 = assert(headings[1])
local textnode = assert(heading1.childNodes[1])
local new_h2 = assert(document:createElement("h2"))

heading1.parentNode:insertBefore(new_h2, heading1)
new_h2:appendChild(textnode)
heading1:remove()

io.write(document:serialize(), "\n")
