2

I have a string that represents a fragment of XML:

string text = "word foo<tag foo='a' />another word ";

I need to replace particular words in this string, so I used this code:

Regex regex = new Regex("\\b" + co + "\\b", RegexOptions.IgnoreCase);
return regex.Replace(text, new MatchEvaluator(subZvyrazniStr));

static string subZvyrazniStr(Match m)
{
    return "<FtxFraze>" + m.ToString() + "</FtxFraze>";
}

The problem with my code is that it also replaces matches inside tags, which I don't want. What should I add so that words are replaced only outside tags?

Example: when I set the variable co to "foo", I want to get back "word <FtxFraze>foo</FtxFraze><tag foo='a' />another word".

Thanks

2
  • You shouldn't try to parse or modify XML with regexes if the XML structure is relevant. See this. Use an XML parser instead. Then you can apply the regex code to text nodes only. Commented Sep 23, 2012 at 16:52
  • I know, but in this case I have a lot of nodes and I don't know the exact structure, so I think it's faster and easier to do this with a regex. Commented Sep 23, 2012 at 17:40
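
For reference, the parser-based approach suggested in the first comment might look roughly like the sketch below. It assumes the fragment is well-formed once wrapped in a dummy root element; the names XmlHighlighter, Highlight and the root wrapper are hypothetical and not from the question. Note that the serializer may normalize details such as attribute quoting, so the output is not guaranteed to be byte-identical to the input outside the inserted elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class XmlHighlighter
{
    // Hypothetical helper: wraps whole-word matches in <FtxFraze>,
    // touching text nodes only, never tag names or attributes.
    public static string Highlight(string fragment, string word)
    {
        // Assumption: the fragment parses once wrapped in a dummy root.
        XElement root = XElement.Parse("<root>" + fragment + "</root>",
                                       LoadOptions.PreserveWhitespace);
        var regex = new Regex(@"\b" + Regex.Escape(word) + @"\b",
                              RegexOptions.IgnoreCase);

        // Snapshot the text nodes first, because the tree is modified below.
        foreach (XText node in root.DescendantNodes().OfType<XText>().ToList())
        {
            string value = node.Value;
            var parts = new List<object>();
            int last = 0;
            foreach (Match m in regex.Matches(value))
            {
                // Keep the text before the match, then add the wrapped match.
                if (m.Index > last)
                    parts.Add(new XText(value.Substring(last, m.Index - last)));
                parts.Add(new XElement("FtxFraze", m.Value));
                last = m.Index + m.Length;
            }
            if (parts.Count == 0) continue;   // no match in this text node
            if (last < value.Length)
                parts.Add(new XText(value.Substring(last)));
            node.ReplaceWith(parts.ToArray());
        }

        // Serialize the children of the dummy root, not the root itself.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

This is more code than a single regex, but it does not depend on knowing the structure in advance: every text node is visited regardless of nesting depth.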

3 Answers

5

A simple trick like this may suffice in some cases if you are not that picky:

\bfoo\b(?![^<>]*>)

4 Comments

The [^<>] should be [^<]; there is no need for the >.
Can you please explain this regex?
@Anirudha, the > helps the regex engine find the match faster; otherwise it needs to backtrack. (Although that depends on the engine and how optimized it is.)
@david, (?![^<>]*>) is a negative lookahead: it fails the match if the word is followed by a > with no < in between, which suggests that the word is inside an open tag.
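
To make the lookahead concrete, here is a minimal sketch that plugs the pattern from this answer into the question's replacement code. The class and method names (Highlighter, HighlightOutsideTags) are hypothetical, and the call to Regex.Escape is an addition that is not in the original code; it only guards against regex metacharacters in the search word.

```csharp
using System.Text.RegularExpressions;

static class Highlighter
{
    // Hypothetical helper: wraps each whole-word occurrence of `word`
    // that is not followed by a '>' without an intervening '<',
    // that is, each occurrence that does not appear to sit inside a tag.
    public static string HighlightOutsideTags(string text, string word)
    {
        // Regex.Escape is an addition, not part of the question's code.
        string pattern = @"\b" + Regex.Escape(word) + @"\b(?![^<>]*>)";
        return Regex.Replace(
            text,
            pattern,
            m => "<FtxFraze>" + m.Value + "</FtxFraze>",
            RegexOptions.IgnoreCase);
    }
}
```

With the question's input, the first foo is followed directly by <, so the lookahead cannot reach a > and the match succeeds; the foo inside the tag is followed by ='a' /> with no < in between, so it is skipped. As the answer says, this is a heuristic: a > inside a comment, CDATA section or attribute value can still confuse it.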
1

This is what you want

(?<!\<[\w\s]*?)\bfoo\b(?![\w\s]*?>)

works here

I had answered a related question here


0

Try this regex:

Regex r = new Regex(@"\b" + rep + @".*?(?=\<)\b", RegexOptions.IgnoreCase);

1 Comment

This matches "foo <tag>bar" in "foo <tag>bar</tag>" and "football" in "<tag>football</tag>". The reluctant quantifier, .*?, is too weak for this job; you need to actively exclude the < like @Qtax did. And you have to do that inside the lookahead, so you only consume the word foo.
