How to replace an inner string using Regex

Question

I have the following code snippet. The search criterion is: find every img tag whose id attribute has the value "someImage".

<img id="someImage" src="C:\logo.png" height="64" width="104" alt="myImage" />

I want to replace

src="C:\logo.png" to src="someothervalue"

so the final output would be

<img id="someImage" src="someothervalue" height="64" width="104" alt="myImage" />

How can I achieve this using regex?

Thank you.

Elian Ebbing · Accepted Answer · 2011-06-17 08:05:49Z

1

You can work with groups in a regex. You create groups by putting parentheses in your regular expression. The Match object you get back exposes them through its Groups collection:

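using System.Text.RegularExpressions;

string input = "<html><img id=\"someImage\" src=\"C:\\logo.png\" height=\"64\" width=\"104\" alt=\"myImage\" /></html>";
// Group 1 captures everything from "<img" up to (not including) the src attribute.
var regex = new Regex("(<img(.+?)id=\"someImage\"(.+?))src=\"([^\"]+)\"");

// Rebuild each match from group 1 plus the new src attribute.
string output = regex.Replace(
    input,
    match => match.Groups[1].Value + "src=\"someothervalue\""
);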

In the example above there will be 5 groups:

Groups[0] is the whole match: <img id="someImage" src="C:\logo.png"
Groups[1] is everything before the src attribute, including the trailing space: <img id="someImage" 
Groups[2] and Groups[3] are the (.+?) parts; for this input each one is a single space.
Groups[4] is the original value of the src attribute: C:\logo.png

In the example I replace the whole match with the value of Groups[1] followed by a new src attribute. Note that the pattern only matches when the id attribute comes before the src attribute.
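To check what each group actually captures, a small helper along these lines can dump them. This is only a sketch: the class name GroupDump and the method GetGroups are made up for illustration; the pattern is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDump
{
    // Hypothetical helper: returns the value of every group of the first match.
    public static string[] GetGroups(string input)
    {
        // Same pattern as in the answer above.
        var regex = new Regex("(<img(.+?)id=\"someImage\"(.+?))src=\"([^\"]+)\"");
        Match match = regex.Match(input);

        // Groups[0] is the whole match; Groups[1..4] are the parenthesized parts.
        var values = new string[match.Groups.Count];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = match.Groups[i].Value;
        }
        return values;
    }
}
```

Calling GroupDump.GetGroups with the input string from the answer and printing each element shows the five values described above.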

Footnote: While regular expressions are sometimes adequate for manipulating an HTML document, they are often not the best tool. If you know in advance that you are working with XHTML, you can use XmlDocument + XPath. If it is plain HTML, you can use HtmlAgilityPack.
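For the XHTML case, the XmlDocument + XPath route might look roughly like this. It is a sketch under some assumptions: the input is well-formed XML, the elements are not in a namespace (real XHTML with an xmlns declaration would need an XmlNamespaceManager), and the id contains no single quote. The names XhtmlSrcReplacer and ReplaceSrc are made up for illustration.

```csharp
using System;
using System.Xml;

public static class XhtmlSrcReplacer
{
    // Hypothetical helper: sets src on every img element with the given id.
    public static string ReplaceSrc(string xhtml, string id, string newSrc)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the input is not well-formed

        // XPath: all img elements, anywhere in the tree, whose id attribute equals id.
        // Assumes id contains no single quote.
        foreach (XmlElement img in doc.SelectNodes("//img[@id='" + id + "']"))
        {
            img.SetAttribute("src", newSrc);
        }
        return doc.OuterXml;
    }
}
```

Unlike the regex, this does not depend on the order of the attributes inside the tag.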

edited Jun 17, 2011 at 8:05

answered Jun 17, 2011 at 7:59

Elian Ebbing


1 Comment

41K Over a year ago

Thanks for mentioning HtmlAgilityPack and for the answer, that worked for me.

Petar Ivanov · Accepted Answer · 2011-06-17 06:57:13Z

1

It is not a good idea to use regex for XML. Depending on the language, you should use an XML reader, extract the <img> node and then check its id. One useful language for querying XML data, which is supported by many XML libraries, is XPath.

In C# you can look at the XmlDocument class (and related classes).

Another one is XmlReader.

The latter offers only sequential access, while the first one loads the whole tree in memory, so the first one is easier to use (especially if your XML content is not too big).
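To illustrate the sequential style, here is a minimal XmlReader sketch that only reads the src of the first matching img. It assumes well-formed XML input without namespaces, and the names ImgSrcFinder and FindSrc are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ImgSrcFinder
{
    // Hypothetical helper: returns the src of the first img with the given id, or null.
    public static string FindSrc(string xml, string id)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            // Forward-only pass: each Read() advances to the next node.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Name == "img"
                    && reader.GetAttribute("id") == id)
                {
                    return reader.GetAttribute("src");
                }
            }
        }
        return null;
    }
}
```

Because the reader cannot modify the document, changing the attribute this way would also require writing every node back out (for example with XmlWriter), which is why XmlDocument is usually the simpler choice for small inputs.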

edited Jun 17, 2011 at 6:57

answered Jun 17, 2011 at 6:50

Petar Ivanov


1 Comment

Petar Ivanov Over a year ago

Really? It looks like HTML to me. HTML is a special kind of XML.
