I am working on a plugin that parses HTML files. The files use an include convention like this:

<!--$include="a.html" -->

or

<!--$include="a.html"-->

Both forms should be treated the same.

Following this pattern (similar to server-side includes), I want to search an HTML file. The question is:

How do I find each occurrence of the pattern and get its value (a.html in my example; the name varies)?

The flow should look something like this:

// Pseudocode: repeat until the whole file has been scanned
while (!finishedWholeFile) {
    fileName = findPatternFunc(htmlFile);
    replaceFunc(fileName, something);
}

PS: I don't know whether it is better to use a regex in Java or to implement this differently (for example with .indexOf()). If regex performs well enough in this situation, I would like to use it.

Any ideas?

  • Regular expressions don't perform replacement. They define search patterns. You have to do the replacing yourself. And of course, once you've found what you want to replace, you don't need another RE to define it. Not a real question. Commented Dec 30, 2012 at 19:28
  • @EJP I have added pseudocode to my question. Commented Dec 30, 2012 at 19:38
  • You haven't added anything that changes the truth of my comment. You don't need two REs. Commented Dec 30, 2012 at 19:44
  • @EJP I have removed the replacing part and improved the question. Commented Dec 30, 2012 at 19:50

3 Answers

0

You mean like this?

<!--\$include=\"(?<htmlName>[a-z-_]*).html\"\s?-->
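
As a rough illustration of how a pattern like this might be used from Java, here is a minimal sketch. The class and method names (`IncludeFinder`, `findIncludes`) are made up for the example, and the pattern differs slightly from the one above: the dot is escaped, the captured group includes the `.html` suffix, and the character class is only an assumption about which characters a file name may contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncludeFinder {
    // Assumed file-name alphabet: letters, digits, '_' and '-'.
    // "\\s*" accepts both <!--$include="a.html"--> and <!--$include="a.html" -->.
    private static final Pattern INCLUDE = Pattern.compile(
            "<!--\\$include=\"(?<htmlName>[A-Za-z0-9_-]+\\.html)\"\\s*-->");

    /** Returns every included file name, in document order. */
    static List<String> findIncludes(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = INCLUDE.matcher(html);
        while (m.find()) {
            // Named groups need Java 7 or later.
            names.add(m.group("htmlName"));
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<p>x</p><!--$include=\"a.html\" --><!--$include=\"b-c.html\"-->";
        System.out.println(findIncludes(html));
    }
}
```

The `while (m.find())` loop plays the role of the "until the whole file is finished" loop from the question.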

3 Comments

Do you mean: String pattern = "<!--\\$include=\"([a-z-_]*).html\"\\s?-->";
@kamaci Yes, as the pattern. I haven't tested it in Java, though, only in RegexBuddy against the two examples you gave me.
It doesn't find what I am looking for.
0

Read the file into a string, then

str = str.replaceAll("(?<=<!--\\$include=\")[^\"]+(?=\" ?-->)", something);

will replace the filenames with the string something; the resulting string can then be written back to the file.
(Note: this replaces any text inside the double quotes, not just valid filenames.)

If you only want to replace filenames with the .html extension, swap the [^\"]+ for [^\".]+\\.html.
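
One detail worth keeping in mind with `replaceAll`: the replacement string treats `$` and `\` specially, so a replacement that may contain them should go through `Matcher.quoteReplacement`. The following is a small sketch of the replacement above with that precaution; the class and method names (`IncludeRenamer`, `replaceTargets`) are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncludeRenamer {
    // The lookaround pattern from this answer: it matches only the text
    // between the double quotes of an include comment.
    private static final Pattern FILE_NAME = Pattern.compile(
            "(?<=<!--\\$include=\")[^\"]+(?=\" ?-->)");

    /** Replaces every include target with the given literal text. */
    static String replaceTargets(String html, String replacement) {
        // quoteReplacement stops '$' and '\' in the replacement from being
        // read as group references or escape characters.
        return FILE_NAME.matcher(html)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<!--$include=\"a.html\" -->";
        System.out.println(replaceTargets(html, "price$1.html"));
    }
}
```

Without the quoting step, a replacement such as `price$1.html` would be read as a reference to capture group 1, which this pattern does not have.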

Using a regex for this task is fine performance-wise, but see e.g. "How to use regular expressions to parse HTML in Java?" and "Java Regex performance".
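
For comparison, the `.indexOf()` route mentioned in the question could look roughly like the sketch below. It makes no claim about which approach is faster; measuring on real files would be the way to find out. The names (`IndexOfIncludeScanner`, `findIncludes`) are hypothetical, and the scanner assumes the exact `<!--$include="` prefix from the question.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexOfIncludeScanner {
    private static final String OPEN = "<!--$include=\"";

    /** Returns every included file name, in document order, without regex. */
    static List<String> findIncludes(String html) {
        List<String> names = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(OPEN, from);
            if (start < 0) break;                 // no more include comments
            int nameStart = start + OPEN.length();
            int nameEnd = html.indexOf('"', nameStart);
            if (nameEnd < 0) break;               // unterminated quote
            int close = html.indexOf("-->", nameEnd);
            if (close < 0) break;                 // unterminated comment
            // Accept only whitespace between the closing quote and "-->".
            if (html.substring(nameEnd + 1, close).trim().isEmpty()) {
                names.add(html.substring(nameStart, nameEnd));
            }
            from = nameEnd + 1;                   // continue after this name
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<!--$include=\"a.html\" --><p>x</p><!--$include=\"b.html\"-->";
        System.out.println(findIncludes(html));
    }
}
```

The hand-written version needs explicit handling for every malformed case, which is the main trade-off against the regex.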

3 Comments

A quote from your links: "Using regular expressions to pull values from HTML is always a mistake." and: "Hint: Don't use regexes for link extraction or other HTML "parsing" tasks!" :)
@linski Yes, I included the links because I wanted kamaci to consider such opinions before making up his own mind.
I thought it might be more visible; now that I have read it again, it seems more obvious.
0

I ended up using this pattern:

"<!--\\$include=\"(.+)(.)(html|htm)\"-->"
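
Two things to watch with this pattern: the greedy `(.+)` can run across several include comments that sit on the same line, and the form with a space before `-->` is not matched. The sketch below compares it with a tightened variant; `TIGHTER` and the helper `countMatches` are hypothetical names introduced only for this illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyIncludeDemo {
    // The pattern from this answer, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
            "<!--\\$include=\"(.+)(.)(html|htm)\"-->");

    // Assumed alternative: the name cannot contain a quote, the dot is
    // escaped, and optional whitespace is allowed before "-->".
    static final Pattern TIGHTER = Pattern.compile(
            "<!--\\$include=\"([^\"]+\\.html?)\"\\s*-->");

    /** Counts how many separate matches the pattern finds in the text. */
    static int countMatches(Pattern p, String html) {
        Matcher m = p.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String html = "<!--$include=\"a.html\"--><!--$include=\"b.html\"-->";
        System.out.println(countMatches(ORIGINAL, html)); // 1: greedy (.+) spans both comments
        System.out.println(countMatches(TIGHTER, html));  // 2: one match per comment
    }
}
```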
