I have C# code that reads an HTML file and returns its content as a string.

I need to parse the HTML string, find all <embed> tags, read the value of each tag's "src" attribute, and then replace the entire <embed> tag with the content of the file that the src attribute points to.

I am trying to use the HtmlAgilityPack to allow me to parse the html code.

The only thing I cannot figure out is how to replace the <embed> tag with another string and then return the new string, with no <embed> tags, to the user.

Here is what I have so far:

    protected string ParseContent(string content)
    {
        if (content != null)
        {
            //Create a new document parser object
            HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();

            //load the content
            document.LoadHtml(content);

            //Get all embed tags
            IEnumerable<HtmlNode> embedNodes = document.DocumentNode.Descendants("embed");

            //Make sure the content contains at least one <embed> tag
            if (embedNodes.Count() > 0)
            {
                // Process each <embed> tag
                foreach (HtmlNode embedNode in embedNodes)
                {
                    //Make sure there is a source
                    if (embedNode.Attributes.Contains("src"))
                    {
                        //If the file ends with ".html"
                        if (embedNode.Attributes["src"].Value.EndsWith(".html"))
                        {
                            var newContent = GetContent(embedNode.Attributes["src"].Value);

                            //Here I need to be able to replace the entire embedNode with the newContent
                        }

                    }
                }
            }

            return content;
        }

        return null;
    }

    protected string GetContent(string path)
    {

        if (System.IO.File.Exists(path))
        {
            //The file exists, read its content
            return System.IO.File.ReadAllText(path);
        }

        return null;
    }

How can I replace the <embed> tag with a string?

2 Answers

I figured it out, thanks to @COlD TOLD, who advised me to convert the enumerable to a list.

Here is what I have done.

    protected string ParseContent(string content)
    {
        if (content != null)
        {
            //Create a new document parser object
            HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();

            //load the content
            document.LoadHtml(content);

            //Get all embed tags
            List<HtmlNode> embedNodes = document.DocumentNode.Descendants("embed").ToList();

            //Make sure the content contains at least one <embed> tag
            if (embedNodes.Count() > 0)
            {
                // Process each <embed> tag
                foreach (HtmlNode embedNode in embedNodes)
                {
                    //Make sure there is a source
                    if (embedNode.Attributes.Contains("src"))
                    {

                        if (embedNode.Attributes["src"].Value.EndsWith(".html"))
                        {
                            //At this point we know that the source of the embed tag is set and it is an html file


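                            //Get the full path (customBase is not defined in this snippet; it is assumed to be a base directory declared elsewhere)
                            string embedPath = customBase + embedNode.Attributes["src"].Value;

                            //Get the content of the embedded file
                            string newContent = GetContent(embedPath);

                            if (newContent != null)
                            {
                                //Create a placeholder div node
                                HtmlNode newNode = document.CreateElement("div");

                                //At this point we know the file exists, load its content
                                //Note: HtmlEncode escapes the markup, so it is shown as text rather than rendered
                                newNode.InnerHtml = HtmlDocument.HtmlEncode(newContent);

                                //Insert the new node right after the embed node, via the embed node's own parent
                                //(DocumentNode.InsertAfter only works when the embed tag is a direct child of the document)
                                embedNode.ParentNode.InsertAfter(newNode, embedNode);

                                //Remove the embed node now that its replacement is in place
                                embedNode.Remove();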
                            }
                        }

                    }
                }

                return document.DocumentNode.OuterHtml;
            }

            return content;
        }

        return null;
    }

1 Comment

May I know what customBase is? @Mike

I think you can get the parent node of the current <embed> node and then replace that child of the parent with a new node:

    var newContent = GetContent(embedNode.Attributes["src"].Value);
    var parentNode = embedNode.ParentNode;
    var newNodeText = "<p>" + newContent + "</p>";
    var newNode = HtmlNode.CreateNode(newNodeText);
    parentNode.ReplaceChild(newNode, embedNode);
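
For reference, the parent-node approach above could be put together roughly as follows. This is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name EmbedInliner, the method Inline, and the loadContent delegate (a stand-in for GetContent that returns null for a missing file) are hypothetical names used for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be referenced

public static class EmbedInliner
{
    // 'loadContent' is a hypothetical stand-in for GetContent:
    // it receives the src value and returns the file text, or null if unavailable.
    public static string Inline(string html, Func<string, string> loadContent)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // ToList() takes a snapshot, so replacing nodes below does not disturb the loop.
        List<HtmlNode> embedNodes = document.DocumentNode.Descendants("embed").ToList();

        foreach (HtmlNode embedNode in embedNodes)
        {
            // Falls back to an empty string when there is no src attribute.
            string src = embedNode.GetAttributeValue("src", string.Empty);
            if (!src.EndsWith(".html", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }

            string newContent = loadContent(src);
            if (newContent == null)
            {
                continue;
            }

            // Wrap the content in one root element so CreateNode receives a single top-level node.
            HtmlNode replacement = HtmlNode.CreateNode("<div>" + newContent + "</div>");

            // Replace through the embed node's own parent, wherever it sits in the tree.
            embedNode.ParentNode.ReplaceChild(replacement, embedNode);
        }

        return document.DocumentNode.OuterHtml;
    }
}
```

It could be called with something like EmbedInliner.Inline(content, path => GetContent(customBase + path)), where customBase is whatever base directory the application uses. The content is inserted as markup here; if it should be shown as literal text instead, it would need to be encoded first.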

7 Comments

I think the issue is with removing the node from the document. If you look at my last update, you will see that the error happens because I remove an element from the list, not from the HtmlAgilityPack document. I need to find a way to remove a node.
I thought you asked how to replace it.
Yes, I am trying to replace it. I thought I would do that by adding a div as a placeholder and then removing the embed tag after reading its src value. However, I removed the line embedNode.Remove(); and still get an error in my loop for some reason.
What if you use a list instead of an enumerable?
An enumerable is lazily evaluated: it is not a fixed collection in memory but is produced as you iterate. While your loop was running, you changed the content of the underlying collection, which caused a problem on the next iteration. A list is a fixed copy, so it does not change while you iterate over it.
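
The difference between a lazy enumerable and a list snapshot can be shown without HtmlAgilityPack. The sketch below uses a plain List<string> as an analogy for the document's node collection; the class and method names are made up for illustration, and the HtmlAgilityPack case is assumed to fail for the same underlying reason.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns true if modifying the source while enumerating a lazy query throws.
    public static bool LazyEnumerationThrows()
    {
        var tags = new List<string> { "embed", "p", "embed" };

        // Where() is lazy: it reads from 'tags' only while the loop is running.
        IEnumerable<string> lazy = tags.Where(t => t == "embed");
        try
        {
            foreach (string tag in lazy)
            {
                tags.Remove(tag); // changes the list the query is still reading from
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // The list enumerator detects that the collection was modified.
            return true;
        }
    }

    // Returns the items that remain after removing through a snapshot.
    public static List<string> RemoveViaSnapshot()
    {
        var tags = new List<string> { "embed", "p", "embed" };

        // ToList() copies the matches up front, so the loop iterates over the copy.
        List<string> snapshot = tags.Where(t => t == "embed").ToList();
        foreach (string tag in snapshot)
        {
            tags.Remove(tag);
        }
        return tags;
    }
}
```

The first method hits an InvalidOperationException on the second iteration, while the second method completes and leaves only the "p" entry.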