How can I achieve the following using a regex replace in C#?

Input string I have : " test data : <test param="p" value="v"/> input string continues"

Output string I need : " test data : ((test param={p} value={v})) input string continues "

  • Are you needing help with the RegEx or the replace? Show us the code you have so far. Commented Oct 20, 2011 at 17:44
  • kinda skimpy on the details - what are some things that are close but shouldn't match? Commented Oct 20, 2011 at 17:46
  • all I have is the regular expressions to match the patterns. From what I know you can match a regex pattern (that I can do) and replace it with another string, but I need to replace it with another pattern, not sure how to do that or if it is even possible. Commented Oct 20, 2011 at 17:46
  • @CodeJockey, I updated the question to show what exactly needs to be matched and replaced. The string can have a lot of data, but the first pattern needs to be matched and replaced with the second pattern. There can be many occurrences of the first pattern in the string. Commented Oct 20, 2011 at 17:50
  • the syntax and options in a replacement expression aren't anywhere near as fully featured as for the match expression, but what you want to be done can certainly be done! If you have a successful match expression, it can probably be adapted for use in a find and replace, so please add it to the question. Commented Oct 20, 2011 at 18:03

1 Answer

I assume your real question is HOW to achieve it. Since your find/replace requirements seem fairly rigid, here is a pair of expressions that works for your case:

Find:      ([^:]*): <([^=]*)="([^"]*)"([^=]*)="([^"]*)"/>
Replace:   $1: (($2={$3}$4={$5}))

The Find expression can be broken down like this:

([^:]*)    # Capture zero or more characters that ARE NOT a colon
: <        # Match a colon, a literal space, then a less-than sign
([^=]*)    # Capture zero or more characters that ARE NOT an equals sign
="         # Match an equals sign, then a double quote
([^"]*)    # Capture zero or more characters that ARE NOT a double quote
"          # Match a double quote
([^=]*)    # Capture zero or more characters that ARE NOT an equals sign
="         # Match an equals sign, then a double quote
([^"]*)    # Capture zero or more characters that ARE NOT a double quote
"/>        # Match a double quote, a forward slash, then a greater-than sign

Parentheses () in the regex "capture" whatever text their contents match. Text matched outside the parentheses is still part of the match, but it is dropped during a find and replace unless the replacement expression writes it back (which is why the Replace expression above repeats the colon and the space). Capturing is useful for extracting bits of data from a pattern, such as name, attribute1 and value1 from <name attribute1="value1"/>, so they can be put into another pattern of text.
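To make the capture groups concrete, here is a minimal sketch that runs the Find expression against the question's input and lists what each group holds. The class and method names (CaptureDemo, Captures) are made up for illustration, and the pattern uses [^"]* for the quoted values so that values longer than one character also match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // The Find expression, written as a C# string literal (inner quotes escaped).
    public const string Pattern =
        "([^:]*): <([^=]*)=\"([^\"]*)\"([^=]*)=\"([^\"]*)\"/>";

    // Returns the text of groups 1..5 for the first match, or an empty array.
    public static string[] Captures(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return new string[0];

        // Groups[0] is the whole match; the numbered captures start at index 1.
        var result = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
            result[i - 1] = m.Groups[i].Value;
        return result;
    }

    public static void Main()
    {
        string input = " test data : <test param=\"p\" value=\"v\"/> input string continues";
        string[] groups = Captures(input);
        for (int i = 0; i < groups.Length; i++)
            Console.WriteLine("${0} = [{1}]", i + 1, groups[i]);
    }
}
```

Note that the fourth group keeps its leading space (" value"), which is why the Replace expression has no space between {$3} and $4.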

In C#, you use the System.Text.RegularExpressions.Regex class to match or to replace using regular expressions. For an instance of Regex, the signature is string Replace(string input, string replacement). There is also a static overload, Regex.Replace(string input, string pattern, string replacement), if you don't want to construct an instance first.

The replacement expression contains $1, $2, etc., which refer to the parts of the match enclosed in parentheses (). Thus, $1 (in the replacement expression) will insert the text matched by the first group, ([^:]*), from the match expression.

This combination will turn this text:

"test data : <test param="p" value="v"/>" more text. blah blah blah "test data : <test param="p" value="v"/>" and then some more text...

Into this text:

"test data : ((test param={p} value={v}))" more text. blah blah blah "test data : ((test param={p} value={v}))" and then some more text...
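Put together in C#, the find and replace could look roughly like the sketch below. ReplaceDemo and Convert are illustrative names, not anything from the question, and the pattern uses [^"]* for the quoted values so that values longer than one character also match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Find expression as a C# string literal (inner quotes escaped).
    private static readonly Regex TagPattern = new Regex(
        "([^:]*): <([^=]*)=\"([^\"]*)\"([^=]*)=\"([^\"]*)\"/>");

    // Replace expression: $1..$5 insert the captured groups, everything else is literal.
    private const string Replacement = "$1: (($2={$3}$4={$5}))";

    public static string Convert(string input)
    {
        // Regex.Replace rewrites every non-overlapping match, not just the first one.
        return TagPattern.Replace(input, Replacement);
    }

    public static void Main()
    {
        string input =
            "\"test data : <test param=\"p\" value=\"v\"/>\" more text. " +
            "\"test data : <test param=\"p\" value=\"v\"/>\" and then some more text...";
        Console.WriteLine(Convert(input));
    }
}
```

Because Replace handles every occurrence in one call, no explicit loop over matches is needed.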

A great resource for learning about Regexes (and where I learned about 90% of what I know) is Regular-Expressions.info, and an associated tool called RegexBuddy is great for constructing, testing, and debugging regexes as well.

5 Comments

it works correctly. Can you explain how that ([^:]*): in the Find and $1: part in the replace work?
I tried to explain about the $1 in the third paragraph (which starts The replacement expression contains...), and I'm not sure if that's not clear or if it's missing something. Without going pretty deep into regex internals, I'm not sure what else I can add :D or were you asking more about the match part ([^:]*):?
what is that colon doing next to the $1, is it same as equal to in regex? and yes, what does ([^:]*): signify. Sorry, my regex skills are pretty weak. If you are matching each piece separately, why do you need ([^:]*): in the beginning. That : is confusing me.
ah - I broke down the Match expression, so tell me if that's not clear enough, but in a replacement expression (at least in .Net), the only special character is the dollar sign $ - every other character is literal. I'll add a little more to the answer
ok, i see what you were doing. it should not look for test data : in the string. only highlighted portion should be matched. Thats why it was confusing me. let me read on regex a bit and i'll come back and accept the answer.
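For the case raised in the last comment, where only the tag itself should be matched and the "test data :" prefix left alone, one possible variant is sketched below. It is not the answer's expression: the pattern, the TagOnlyDemo class and the Convert method are assumptions for illustration, and it presumes exactly two attributes whose names consist of word characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagOnlyDemo
{
    // Matches <name attr1="value1" attr2="value2"/> and nothing before it.
    // Verbatim string: "" stands for a single double quote.
    private static readonly Regex TagPattern = new Regex(
        @"<(\w+)\s+(\w+)=""([^""]*)""\s+(\w+)=""([^""]*)""\s*/>");

    // Only $ is special here; the parentheses, braces and spaces are literal.
    private const string Replacement = "(($1 $2={$3} $4={$5}))";

    public static string Convert(string input)
    {
        return TagPattern.Replace(input, Replacement);
    }

    public static void Main()
    {
        string input = " test data : <test param=\"p\" value=\"v\"/> input string continues";
        Console.WriteLine(Convert(input));
    }
}
```

Since the prefix is no longer part of the match, it does not need to be captured and written back, and the tag is rewritten wherever it appears in the string.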
