
Background info: I have a large body of text that I regularly pull from an XML document (using LINQ) into a single string. This string contains a lot of HTML that I need to preserve for output, but the email addresses and discrete HTML links that occasionally occur in it need to be removed. An example of the offending text looks like this:

--<a href="mailto:[email protected]" target="_blank">John Smith</a> from <a href="http://www.agenericwebsite.com" target="_blank">Romanesque Architecture</a></p>

What I need to be able to do is:

  1. Find the following string: <a href
  2. Delete that string and every character after it, up to and including the next >
  3. Also, always delete the string </a>

Is there a way with LINQ that I can do this easily or am I going to have to create an algorithm using .NET string manipulation to achieve this?

  • Why do you want to use LINQ? This looks like regex/string manipulation would be much simpler. Commented Nov 14, 2011 at 17:40
  • +1 @AustinSalonen The only answer for any question regarding processing html! Html and regex is an accident waiting to happen. And I like regex :) Commented Nov 14, 2011 at 17:59

2 Answers


You could probably do this with LINQ, but it sounds like a plain old regex would be much, much better.

It sounds like this question, and particularly this answer, demonstrate what you're trying to do.
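For what it's worth, here is a minimal sketch of what the regex route could look like for the three steps in the question. The names `AnchorStripper` and `StripAnchors` are made up for illustration, and the pattern assumes that no attribute value inside an `<a ...>` tag contains a literal `>`. Because a match has to start at `<a`, a `>` that appears elsewhere in the text is left alone. This is only a rough approach for simple, predictable markup, not a general HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorStripper
{
    // Matches either an opening <a ...> tag (everything up to the first '>')
    // or a closing </a> tag. \b keeps tags such as <abbr> from matching.
    // Assumption: attribute values never contain a literal '>'.
    private static readonly Regex AnchorTag =
        new Regex(@"<a\b[^>]*>|</a\s*>", RegexOptions.IgnoreCase);

    // Removes the anchor tags but keeps the text between them.
    public static string StripAnchors(string html)
    {
        return AnchorTag.Replace(html, string.Empty);
    }
}

static class Demo
{
    static void Main()
    {
        // The sample line from the question, copied as posted.
        string input =
            "--<a href=\"mailto:[email protected]\" target=\"_blank\">John Smith</a>" +
            " from <a href=\"http://www.agenericwebsite.com\" target=\"_blank\">" +
            "Romanesque Architecture</a></p>";

        Console.WriteLine(AnchorStripper.StripAnchors(input));
        // prints: --John Smith from Romanesque Architecture</p>
    }
}
```

The link text ("John Smith", "Romanesque Architecture") is kept, since the steps in the question only ask for the tags themselves to go.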


4 Comments

Ah, Regex. I was afraid so. Unfortunately, I haven't ever used it, but now is a good time to learn. Now, I understand that Regex helps identify substrings and patterns within a string, but if I apply the techniques in the link you provided, how am I going to get around that the ending delimiter for most of my emails and HTML links is &gt;, which appears frequently in other places in my text? Thanks for the help by the way.
@full - not sure I understand. Can't you use the technique from the answer to search for strings starting with <a and ending with &gt; ?
I probably can. My response was based on a limited knowledge of the capabilities of Regex. Do you or anyone have a favorite source for reading up on it?
@full - no, actually my regex knowledge is fairly limited. I know this is the perfect situation for a regex, but I'm not sure what the details of implementing it would be. Just use the links above to get you started, make a good attempt, then ask a new question when you get stuck :)

If you want to do this exactly via LinqToXml, try something like this recursive function:

    static void ReplaceNodesWithContent(XElement element, string targetElementname)
    {
        if (element.Name == targetElementname)
        {
            element.ReplaceWith(element.Value);
            return;
        }

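        // Snapshot the children first (needs using System.Linq): replacing a node
        // while enumerating Elements() directly would stop the loop after the first hit.
        foreach (var child in element.Elements().ToList())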
        {
            ReplaceNodesWithContent(child, targetElementname);
        }
    }

Usage example:

    static void Main(string[] args)
    {
        string xml = @"<root>
<items>
    <item>
        <a>inner</a>
    </item>
    <item>
        <subitem>
            <a>another one</a>
        </subitem>
    </item>
</items>
</root>";

        XElement x = XElement.Parse(xml);

        ReplaceNodesWithContent(x, "a");

        string res = x.ToString();
        //            res == @"<root>
        //                      <items>
        //                        <item>inner</item>
        //                        <item>
        //                          <subitem>another one</subitem>
        //                        </item>
        //                      </items>
        //                    </root>"
    }
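As a rough sketch of how the same idea might be applied to markup like the sample in the question (several links inside one parent, with attributes), something along these lines should work. It assumes the fragment is well-formed XML, which real-world HTML often is not, and `AnchorUnwrapper`, `Unwrap` and the `wrapper` element are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class AnchorUnwrapper
{
    // Replaces every <a> element with its own child nodes, so the link text
    // (and any markup nested inside the link) survives.
    // Assumption: the fragment parses as well-formed XML.
    public static string Unwrap(string fragment)
    {
        // Hypothetical wrapper element so a fragment with several
        // top-level nodes can be parsed as a single document element.
        XElement root = XElement.Parse(
            "<wrapper>" + fragment + "</wrapper>",
            LoadOptions.PreserveWhitespace);

        // Snapshot the matches before modifying the tree, and go in reverse
        // document order so any nested <a> is unwrapped before its ancestor.
        foreach (XElement a in root.Descendants("a").ToList().AsEnumerable().Reverse())
        {
            a.ReplaceWith(a.Nodes());
        }

        // Serialize the wrapper's children without re-indenting them.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}

static class Demo
{
    static void Main()
    {
        string fragment =
            "<p>--<a href=\"mailto:[email protected]\" target=\"_blank\">John Smith</a>" +
            " from <a href=\"http://www.agenericwebsite.com\" target=\"_blank\">" +
            "Romanesque Architecture</a></p>";

        Console.WriteLine(AnchorUnwrapper.Unwrap(fragment));
    }
}
```

Using `a.Nodes()` rather than `a.Value` is a design choice here: `Value` flattens everything inside the link to plain text, while `Nodes()` keeps nested elements such as `<b>` intact.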

1 Comment

Yeah, I can definitely see where you are going with this. Thanks for the input, but I'll probably be taking this opportunity to learn Regex.
