The following HTML is missing the closing </dt> tag for every opening <dt> tag.

For example, <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68924" ADD_DATE="1389093133">MSN Entertainment</A> has no closing </dt>.

So I decided to add the closing tags using a regex. I was able to write a pattern that finds the unclosed <dt> tags.

Regex pattern for finding the unclosed <dt> tags:

<DT><A HREF=".*</A>

Replacement string I used to add the closing </dt> tag to whatever the pattern finds:

<DT><A HREF=".*</A></DT>

But the result contained the literal string <DT><A HREF=".*</A></DT> everywhere the pattern matched, instead of the matched text with a closing </dt> tag appended.

I want to add the </dt> tag at the end of each match. Doing it either in an IDE or via JavaScript is fine for me.

HTML file:

<DL>
    <DT><H3 ADD_DATE="1389093133" LAST_MODIFIED="1423897474">Links for United States</H3>
    <DL>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=129792" ADD_DATE="1389093133">GobiernoUSA.gov</A>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=129791" ADD_DATE="1389093133">USA.gov</A>
    </DL>
    <DT><H3 ADD_DATE="1389093133" LAST_MODIFIED="1423897474">MSN Websites</H3>
    <DL><p>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=55143" ADD_DATE="1389093133">MSN Autos</A>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68924" ADD_DATE="1389093133">MSN Entertainment</A>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68923" ADD_DATE="1389093133">MSN Money</A>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68921" ADD_DATE="1389093133">MSN Sports</A>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=54729" ADD_DATE="1389093133">MSN</A>
        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68922" ADD_DATE="1389093133">MSNBC News</A>
    </DL>

</DL>

2 Answers

One feature of regular expression replacement strings is backreferences, which let you refer to a part of the text that the search pattern matched.

To use a backreference, wrap part of the search pattern in parentheses to form a capturing group. You can then refer to it with \n, where n is the number of the group. (JavaScript replacement strings use $n instead.)

In the following example, the entire search pattern is wrapped in parentheses so that it forms a group we can backreference.

(<DT>.*</A>$)

Replacement string:

\1</DT>
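
For the JavaScript route, the same idea can be sketched as below. This is a minimal illustration, not a drop-in fix: the two input lines and the example.com URLs are made up, the `m` flag is added so that `$` matches at the end of every line, and the replacement uses `$1` because JavaScript replacement strings do not understand `\1`.

```javascript
// Hypothetical sample input: two bookmark lines with unclosed <DT> tags.
const html = [
  '<DT><A HREF="http://example.com/a">First</A>',
  '<DT><A HREF="http://example.com/b">Second</A>',
].join('\n');

// g replaces every match; m lets $ match at the end of each line.
const pattern = /(<DT>.*<\/A>$)/gm;

// In a JavaScript replacement string, $1 stands for the first capturing group.
const fixed = html.replace(pattern, '$1</DT>');

console.log(fixed);
```

Because `.` does not match a newline, each match stays on its own line, so every line gets exactly one `</DT>` appended.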

5 Comments

Thanks for the help. Could you do the backreference in this jsfiddle: jsfiddle.net/shmdhussain/dq3cs9d4/3? I am not able to get the backreference working. Could you help me?
@MohamedHussain your search pattern was greedy. I removed the greediness, and now it works. In greedy mode, the engine tries to find the largest possible match, hence matching from the very first asterisk to the very last asterisk.
I am struggling to use the backreference. Could you use a backreference in my fiddle? I tried removing the /g flag, but I still got the entire maximal string.
Take a look now, @MohamedHussain. Just like this: var str = a.replace(/\1/,"birdeee"); The \1 part is the backreference.
Check stribizhev's answer: instead of "\1" for the backreference, he used "$1".
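
The greedy versus non-greedy difference mentioned in the comments above can be seen in a small sketch. The input string here is made up for illustration and is not taken from the fiddle.

```javascript
// Hypothetical input: two links on the same line.
const text = '<DT><A HREF="a">One</A> <DT><A HREF="b">Two</A>';

// Greedy .* runs to the LAST </A> it can reach on the line.
const greedy = text.match(/<A HREF=".*<\/A>/)[0];

// Lazy .*? stops at the FIRST </A> that completes the match.
const lazy = text.match(/<A HREF=".*?<\/A>/)[0];

console.log(greedy); // <A HREF="a">One</A> <DT><A HREF="b">Two</A>
console.log(lazy);   // <A HREF="a">One</A>
```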

Actually, to find all non-closed <DT> tags, you need to check that no closing </DT> tag follows, which can be done with a negative look-ahead:

(<DT[^<]*><A HREF="[\s\S]*?<\/A>)(?!<\/DT>)

Replace with $1</DT>. The i flag makes sure we also match lowercase <dt> tags.

[\s\S]*? also matches newlines. The [^<]* in <DT[^<]*> makes sure we also match <DT> tags with attributes.

See demo.

Sample code:

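// Group 1 captures a <DT><A HREF="...">...</A> entry; the negative look-ahead rejects a match that is already followed by </DT>.
var re = /(<DT[^<]*><A HREF="[\s\S]*?<\/A>)(?!<\/DT>)/gi;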
var str = '<DL>\n    <DT><H3 ADD_DATE="1389093133" LAST_MODIFIED="1423897474">Links for United States</H3>\n    <DL>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=129792" ADD_DATE="1389093133">GobiernoUSA.gov</A>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=129791" ADD_DATE="1389093133">USA.gov</A>\n    </DL>\n    <DT><H3 ADD_DATE="1389093133" LAST_MODIFIED="1423897474">MSN Websites</H3>\n    <DL><p>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=55143" ADD_DATE="1389093133">MSN Autos</A>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68924" ADD_DATE="1389093133">MSN Entertainment</A>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68923" ADD_DATE="1389093133">MSN Money</A>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68921" ADD_DATE="1389093133">MSN Sports</A>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=54729" ADD_DATE="1389093133">MSN</A>\n        <DT><A HREF="http://go.microsoft.com/fwlink/?LinkId=68922" ADD_DATE="1389093133">MSNBC News</A>\n    </DL>\n\n</DL>';
// $1 re-inserts the captured entry, followed by the missing closing tag.
var subst = '$1</DT>';
var result = str.replace(re, subst);
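
One practical consequence of the negative look-ahead is that the replacement can be run more than once without doubling the closing tags. The following is a small check of that, using the same pattern; the helper name `closeDtTags` and the two-line sample input are assumptions made for illustration.

```javascript
// closeDtTags is a hypothetical helper name; the pattern is the one shown above.
function closeDtTags(html) {
  const re = /(<DT[^<]*><A HREF="[\s\S]*?<\/A>)(?!<\/DT>)/gi;
  return html.replace(re, '$1</DT>');
}

// Hypothetical input: one unclosed entry and one lowercase entry that is already closed.
const sample = [
  '<DT><A HREF="http://example.com/a">Open</A>',
  '<dt><a href="http://example.com/b">Already closed</a></dt>',
].join('\n');

const once = closeDtTags(sample);
const twice = closeDtTags(once);

console.log(once);
console.log(once === twice); // true: a second pass changes nothing
```

Note that the pattern only targets `<DT>` entries that start with `<A HREF="`, so the `<DT><H3 ...>` folder entries in the original file are left as they are.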

8 Comments

How can I give the HTML file as input to check against this expression? I am converting the entire file into a single string and checking the expression against it. Is there a better way to do it?
If it is malformed as in your case, I am not sure there is a better way.
Thanks for the demo. Could you help me with this fiddle? When I try to do this myself I get stuck and am not able to use the backreference: jsfiddle.net/shmdhussain/dq3cs9d4/3
Please check my update (jsfiddle.net/dq3cs9d4/4), and if it is not what you expect, please let me know what results you expect. I feel it is a bit of a different question.
Looks like this is what you need: jsfiddle.net/dq3cs9d4/8. I added a capturing group (.*?) and set the replacement string to $1.