2

I am trying to replace words in a string, but only the words that appear between tags. The words to replace, and their replacements, are stored in a dictionary as key/value pairs.

Currently this is what I am trying:

string input = "<a>hello</a> <b>hello world</b> <c>I like apple</c>";
string pattern = (@"(?<=>)(.)?[^<>]*(?=</)");
Regex match = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = match.Matches(input);

var dictionary1 = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
dictionary1.Add("hello", "Hi");
dictionary1.Add("world", "people");
dictionary1.Add("apple", "fruit");

string output = "";

output = match.Replace(input, replace => { return dictionary1.ContainsKey(replace.Value) ? dictionary1[replace.Value] : replace.Value; });
Console.WriteLine(output);
Console.ReadLine();

This replaces only the first 'hello', not the second one. I want to replace every occurrence of 'hello' between the tags.

Any help will be much appreciated.

4
  • I think your regex is matching the values between the tags, so the matches you're trying to replace are hello, hello world and I like apple. Are you trying to match on individual words? So your output should be <a>hi</a> <b>hi people</b> <c>I like fruit</c>? Commented Jun 1, 2017 at 15:33
  • Using regex on XML is generally considered a Bad Idea. Commented Jun 1, 2017 at 15:37
  • yes this is exactly how I want the output to be. Is my regex the problem here? Commented Jun 1, 2017 at 15:40
  • Yes, your regex isn't quite right, but I can't seem to come up with something that does what you want! My regex skills are somewhat rusty... Commented Jun 1, 2017 at 15:46

3 Answers

2

The problem is that the matches are:

  • hello
  • hello world
  • I like apple

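Each match is the whole text of an element, and that whole text is what gets looked up. "hello" is a key in your dictionary, but "hello world" and "I like apple" are not, so only the first element is replaced.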

Based on your code, this could be a solution:

using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {
        var dictionary1 = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        dictionary1.Add("hello", "Hi");
        dictionary1.Add("world", "people");
        dictionary1.Add("apple", "fruit");


        string input = "<a>hello</a> <b>hello world</b> <c>I like apple</c>";
        string pattern = ("(?<=>)(.)?[^<>]list|" + GetKeyList(dictionary1) + "(?=</)");
        Regex match = new Regex(pattern, RegexOptions.IgnoreCase);
        MatchCollection matches = match.Matches(input);

        string output = "";

        output = match.Replace(input, replace => {
            Console.WriteLine(" - " + replace.Value);

            return dictionary1.ContainsKey(replace.Value) ? dictionary1[replace.Value] : replace.Value;
        });
        Console.WriteLine(output);
    }

    // Joins the dictionary keys into a regex alternation, e.g. "hello|world|apple".
    private static string GetKeyList(Dictionary<string, string> list)
    {
        return string.Join("|", list.Keys);
    }
}

Fiddle: https://dotnetfiddle.net/zNkEDv

If someone wants to dig into this and tell me why I need the "list|" in the pattern (without it, the first item is ignored), I'd appreciate it.
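On the "list|" question, the likely cause is operator precedence: `|` binds more loosely than anything else in the pattern, so without "list|" the first alternative is `(?<=>)(.)?[^<>]hello`. That requires at least one non-tag character between `>` and `hello`, which `<a>hello` does not have. The dummy "list|" absorbs the prefix and leaves a bare `hello`. A side effect is that `hello` and `world` now match anywhere in the string, and only `apple` carries the `(?=</)` lookahead. Below is a minimal sketch of a grouped version, not the code from the fiddle; the class and method names are hypothetical, and the lookahead is only a heuristic for "inside an element", assuming simple, well-formed markup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagWordReplacer
{
    // Replaces whole words that are keys in `map`, but only where the word is
    // followed by non-tag characters and then a closing tag ("</").
    public static string ReplaceInsideTags(string input, IDictionary<string, string> map)
    {
        // Escape each key so regex metacharacters in a key are matched literally.
        string keys = string.Join("|", map.Keys.Select(k => Regex.Escape(k)));

        // (?:...) groups the alternation, so \b and the lookahead apply to every key.
        string pattern = @"\b(?:" + keys + @")\b(?=[^<>]*</)";

        // Fall back to the matched text if the dictionary lookup fails
        // (for example, when the dictionary is case-sensitive).
        return Regex.Replace(
            input,
            pattern,
            m => map.TryGetValue(m.Value, out var replacement) ? replacement : m.Value,
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", "Hi" }, { "world", "people" }, { "apple", "fruit" }
        };

        // The bare "apple" between </b> and <c> is deliberately outside any element.
        string input = "<a>hello</a> <b>hello world</b> apple <c>I like apple</c>";
        Console.WriteLine(ReplaceInsideTags(input, map));
    }
}
```

With this input, the "apple" outside the tags is left alone because the text after it runs into `<c>` rather than `</`.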


6 Comments

Here's a fiddle where I print these out: dotnetfiddle.net/kTP1i3
This certainly tells the OP what the problem is... Are you planning on providing a solution?
Will do @MikeMcCaughan
This works well! However, I don't clearly understand what you have done to match the pattern.
@d.him Basically I'm building a word list from the dictionary keys. The final expression is "(?<=>)(.)?[^<>]list|hello|world|apple(?=</)", and it'll match those words inside the elements.
1

This is another way of doing it - I parse the string into XML and then select elements containing the keys in your dictionary and then replace each element's value.
However, you have to have a valid XML document - your example lacks a root node.

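    // Requires: using System.Collections.Generic; using System.Linq;
    //           using System.Text.RegularExpressions; using System.Xml.Linq;
    var xDocument = XDocument.Parse("<root><a>hello</a> <b>hello world</b> <c>I like apple</c></root>");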
    var dictionary1 = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) { { "hello", "Hi" }, { "world", "people" }, { "apple", "fruit" } };

    string pattern = @"\w+";
    Regex match = new Regex(pattern, RegexOptions.IgnoreCase);

    var xElements = xDocument.Root.Descendants()
                      .Where(x => dictionary1.Keys.Any(s => x.Value.Contains(s)));

    foreach (var xElement in xElements)
    {
        var updated = match.Replace(xElement.Value,
            replace => dictionary1.ContainsKey(replace.Value)
                ? dictionary1[replace.Value]
                : replace.Value);
        xElement.Value = updated;
    }
    string output = xDocument.ToString(SaveOptions.DisableFormatting);

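The pattern "\w+" matches each run of word characters, so every word is looked up in the dictionary on its own and the spaces between words are left untouched.
The LINQ query selects the descendants of the root node whose value contains any of the keys in your dictionary: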

var xElements = xDocument.Root.Descendants().Where(x => dictionary1.Keys.Any(s => x.Value.Contains(s)));

I then iterate through the XElement enumerable collection returned and apply your replacement MatchEvaluator to just the string value, which is a lot easier!

The final output is <root><a>Hi</a><b>Hi people</b><c>I like fruit</c></root>. You could then remove the opening and closing <root> and </root> tags, but I don't know what your complete XML looks like.
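If the wrapper element needs to go away again, one option is to serialize only the root's child nodes. This is a minimal sketch under a few assumptions: the input is a well-formed fragment, the temporary element name "root" is arbitrary, and the class and method names are hypothetical. It also parses with `LoadOptions.PreserveWhitespace` so the spaces between sibling elements survive, and it rewrites text nodes rather than element values so nested elements would be kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XmlWordReplacer
{
    public static string ReplaceWords(string fragment, IDictionary<string, string> map)
    {
        // Wrap the fragment in a temporary root so it parses as XML.
        var root = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);

        var word = new Regex(@"\w+");

        // Snapshot the text nodes first so the tree is not modified while it is enumerated.
        foreach (var text in root.DescendantNodes().OfType<XText>().ToList())
        {
            text.Value = word.Replace(
                text.Value,
                m => map.TryGetValue(m.Value, out var replacement) ? replacement : m.Value);
        }

        // Serialize only the child nodes, which drops the temporary root again.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }

    public static void Main()
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", "Hi" }, { "world", "people" }, { "apple", "fruit" }
        };

        Console.WriteLine(ReplaceWords("<a>hello</a> <b>hello world</b> <c>I like apple</c>", map));
    }
}
```

Because the lookup goes through the dictionary itself, case-insensitive matching depends on the dictionary's comparer (here `StringComparer.OrdinalIgnoreCase`), not on the regex options.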


1

This will do what you want (from what you have provided so far):

private static Dictionary<string, string> dict;
static void Main(string[] args)
{
  dict =
    new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
      {
        { "hello", "Hi" },
        { "world", "people" },
        { "apple", "fruit" }
      };

  var input = "<a>hello</a> <b>hello world</b> apple <c>I like apple</c> hello";
  var pattern = @"<.>([^<>]+)<\/.>";
  var output = Regex.Replace(input, pattern, Replacer);

  Console.WriteLine(output);
  Console.ReadLine();
}

static string Replacer(Match match)
{
  var value = match.Value;
  foreach (var kvp in dict)
  {
    if (value.Contains(kvp.Key)) value = value.Replace(kvp.Key, kvp.Value);
  }
  return value;
}
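A few caveats with the code above: the pattern `<.>` only matches single-character tag names, `Contains` and `Replace` are case-sensitive even though the dictionary is not, and they work on substrings, so "pineapple" would become "pinefruit". The following is a rough sketch of a variant that replaces whole words in the captured element text only. It assumes flat, attribute-free tags like those in the question, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ElementTextReplacer
{
    // Group 1: opening tag, group 2: element text, group 3: closing tag.
    private static readonly Regex Element = new Regex(@"(<\w+>)([^<>]+)(</\w+>)");
    private static readonly Regex Word = new Regex(@"\w+");

    public static string Replace(string input, IDictionary<string, string> map)
    {
        return Element.Replace(input, m =>
        {
            // Look up each whole word of the element text; unknown words are kept as-is.
            string text = Word.Replace(
                m.Groups[2].Value,
                w => map.TryGetValue(w.Value, out var replacement) ? replacement : w.Value);

            // Reassemble the element with its tags unchanged.
            return m.Groups[1].Value + text + m.Groups[3].Value;
        });
    }

    public static void Main()
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", "Hi" }, { "world", "people" }, { "apple", "fruit" }
        };

        var input = "<a>Hello</a> <b>hello world</b> apple <c>I like pineapple and apple</c> hello";
        Console.WriteLine(Replace(input, map));
    }
}
```

Only the text captured in group 2 is rewritten, so words outside any element and the tag names themselves are never touched.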

