1

I read content from an RTE (rich text editor), and before submitting it to a server I need to replace < and > with their HTML entities inside some title attributes. I do not want to use DOM operations here because the text representation is all I have. What I am looking for is a regex that transforms this

<div>ABCD<img style="max-height: 25px; max-width: 25px;" class="inlinetag" 
  src="http://my_images/icon.gif" 
  title="<ir_inline itemname=bild_1 type=0><cbd>"> EFG</div>

into this

<div>ABCD<img style="max-height: 25px; max-width: 25px;" class="inlinetag" 
  src="http://my_images/icon.gif" 
  title="&lt;ir_inline itemname=bild_1 type=0&gt;&lt;cbd&gt;"> EFG</div>

How can this be done?

5
  • Why do you need to replace it? It's perfectly valid. Commented Jan 28, 2013 at 9:07
  • For script injection reasons, I would guess. Commented Jan 28, 2013 at 9:16
  • It does not matter why I want to replace those characters; suffice it to say that even though this is valid HTML, it causes problems later on when the attribute is evaluated in another piece of software. I have a given CMS structure, and some of the code changed after a new Firefox version was released. Commented Jan 28, 2013 at 9:56
  • You are asking for a server-side solution, but what do you have on the server side? Commented Jan 28, 2013 at 11:09
  • No, I am asking for a client-side solution using a regex. Commented Jan 28, 2013 at 11:22

4 Answers

1

Pure regex solution:

var input = "title=\"<ir_inline itemname=bild_1 type=0><cbd>\""; // use the entire input here
var titleRegexp = /title="(.*?)"/g; // match every double-quoted title attribute
var output = input.replace(titleRegexp, function (match) {
    // escape angle brackets only inside the matched attribute
    return match.replace(/</g, "&lt;").replace(/>/g, "&gt;");
});

I've tested it with your sample input and output. It should work.
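
As a minimal sketch of running the same idea over the entire input, the snippet below wraps the replacement in a helper and applies it to the full sample from the question. The function name `escapeTitleAttributes` is hypothetical, and `[^"]*` is swapped in for `.*?` so that a title value containing a line break would still match. It assumes title values are always double-quoted and never contain an escaped double quote.

```javascript
// Hypothetical helper name; assumes double-quoted title attributes only.
function escapeTitleAttributes(html) {
    // [^"]* matches everything up to the closing quote, including line breaks
    return html.replace(/title="[^"]*"/g, function (match) {
        // only the matched attribute is touched, the rest of the markup stays as-is
        return match.replace(/</g, "&lt;").replace(/>/g, "&gt;");
    });
}

var html = '<div>ABCD<img style="max-height: 25px; max-width: 25px;" class="inlinetag" \n' +
    '  src="http://my_images/icon.gif" \n' +
    '  title="<ir_inline itemname=bild_1 type=0><cbd>"> EFG</div>';

console.log(escapeTitleAttributes(html));
```

Note that the pattern would also match an attribute whose name merely ends in `title=` (for example a hypothetical `data-title`), so it may need tightening for other markup.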


3 Comments

Well, this replaces the content of the match, but not the whole text.
@Thariama Try assigning the whole text to input and see.
+1 Thanks, this seems to work. I guess it is not possible to avoid using the additional function?
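
On avoiding the additional function: one possible approach, sketched here under the assumption that the JavaScript engine supports ES2018 lookbehind assertions (browsers from 2013 did not), is to match each bracket only when it is preceded by `title="` and no closing quote. The function name `escapeTitlesWithLookbehind` is hypothetical, and two `replace` calls are still needed because `<` and `>` map to different entities.

```javascript
// Hypothetical helper name; requires lookbehind support (ES2018).
function escapeTitlesWithLookbehind(html) {
    return html
        // a "<" counts only if `title="` plus non-quote characters sit directly before it
        .replace(/(?<=title="[^"]*)</g, "&lt;")
        // same condition for ">"; the entities inserted above contain no quotes
        .replace(/(?<=title="[^"]*)>/g, "&gt;");
}

console.log(escapeTitlesWithLookbehind('title="<ir_inline itemname=bild_1 type=0><cbd>"'));
```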
1

Let me assume a couple of things: you have plain text containing HTML tags and attributes, and you want to treat it purely as text, probably because you are receiving it on the server side.

Apart from regex, if you prefer string manipulation through loops, the simple loop below achieves what you want.

I've assumed you need to do this on the server side, so I've used C#, but you can write the same loop in any language, including JavaScript.

string sourceText = "<div id=\"target\" ><div>ABCD<img style=\"max-height: 25px; max-width: 25px;\" class=\"inlinetag\" " +
                    "src=\"http://my_images/icon.gif\" " +
                    "title=\"<ir_inline itemname=bild_1 type=0><cbd>\"> EFG</div>" +
                    "</div>";
string targetText = sourceText;
bool traceOn = false; // true while we are inside a quoted attribute value
for (int i = 0; i < targetText.Length; i++)
{
    if (traceOn)
    {
        if (targetText[i] == '"')
        {
            traceOn = false; // closing quote: leave the attribute value
        }
        else if (targetText[i] == '<')
        {
            targetText = targetText.Remove(i, 1).Insert(i, "&lt;");
        }
        else if (targetText[i] == '>')
        {
            targetText = targetText.Remove(i, 1).Insert(i, "&gt;");
        }
    }
    else if (targetText[i] == '"' && i > 0 && targetText[i - 1] == '=')
    {
        traceOn = true; // opening quote directly after '='
    }
}

Basically, I am narrowing down where the replacements happen: only those < and > that occur inside double quotes, where the opening quote is directly preceded by an '=', are replaced. It works for the sample input.

It is not a perfect solution, but it should give you an idea of how you can process your string. Somebody here can probably write more powerful and flexible logic. Try it out and improve on it.
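
Since the question asks for a client-side solution, here is a rough JavaScript sketch of the same scanning idea. It is not a port of the code above: the function name `escapeTitlesByScanning` is hypothetical, it collects the output in an array instead of editing the string in place, and, as an added assumption, it only reacts to quotes that directly follow `title=` rather than any `=`.

```javascript
// Hypothetical function name; escapes < and > only inside double-quoted
// values that directly follow `title=`.
function escapeTitlesByScanning(text) {
    var opener = 'title="';
    var out = [];
    var inTitle = false; // true while the scan is inside a title value
    for (var i = 0; i < text.length; i++) {
        var ch = text[i];
        if (inTitle) {
            if (ch === '"') {
                inTitle = false; // closing quote ends the title value
                out.push(ch);
            } else if (ch === "<") {
                out.push("&lt;");
            } else if (ch === ">") {
                out.push("&gt;");
            } else {
                out.push(ch);
            }
        } else {
            out.push(ch);
            // the quote just copied opens a title value if `title=` precedes it
            if (ch === '"' && text.slice(Math.max(0, i - 6), i + 1) === opener) {
                inTitle = true;
            }
        }
    }
    return out.join("");
}

console.log(escapeTitlesByScanning('<img alt="<x>" title="<ir_inline itemname=bild_1 type=0><cbd>">'));
```

Building the result in a separate array keeps the loop index aligned with the input, which avoids the index bookkeeping that in-place insertion needs.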

Another solution is to treat your entire string as XML; almost all server-side languages provide tools to process a string as XML. Find the one that suits your needs.

For example, I could have done something like

XmlDocument doc = new XmlDocument();
doc.LoadXml(targetText);

and then I could easily retrieve any tag and its attributes, provided the markup is well-formed XML.

As for regex, I am rather frightened of them. This should give you an idea.

1 Comment

+1 thx for your effort, but i am looking for a client-side solution using a regex
1

Try this JS function:

function title_replace() {
    var str = '<div>ABCD<img style="max-height: 25px; max-width: 25px;" class="inlinetag" \
  src="http://my_images/icon.gif" \
  title="<ir_inline itemname=bild_1 type=0><cbd>"> EFG</div>';
    var re = /title="[\s\S]+?"/g; // match each title attribute, across line breaks too
    var title_matches;
    while ((title_matches = re.exec(str)) !== null) {
        var title = title_matches[0];
        // escape the angle brackets inside this one attribute
        var new_title = title.replace(/</g, "&lt;").replace(/>/g, "&gt;");
        str = str.replace(title, new_title);
    }
    return str; // hand the escaped markup back to the caller
}

Edit:

I've removed all the DOM work; it's all plain JS now. See if this works for you.

3 Comments

I do not want to write the HTML text into a new DOM element (I want to avoid browser functionality as much as possible).
@Thariama I've changed the function so that it works with plain JS only and doesn't access the DOM. Give it a try.
+1 Thanks for another solution. I accepted the other one because it is more compact.
0

Try AntiSamy on the server side. It's powerful and safe.

1 Comment

I am looking for a client-side solution using a regex.
