HtmlHelper.GetTagsAndValues(htmlContent);

and I get this error:

   at System.String.Split(String[] separator, Int32 count, StringSplitOptions options)
   at System.String.Split(String[] separator, StringSplitOptions options)
   at WebCrawler.Logic.CrawlerManager.UseRulesOnHtmlPage(Agencies agency, String pageUrl, List`1 listTagValuePair, RulesGroups ruleGroup) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 263
   at WebCrawler.Logic.CrawlerManager.GetAdvertismentFromHtmlContent(List`1 listTagValuePair, Agencies agency, String pageUrl) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 191
   at WebCrawler.Logic.CrawlerManager.ImportAdvertisment2Database.Work(Crawler crawler, PropertyBag propertyBag) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 668
   at WebCrawler.Logic.CrawlerManager.ImportAdvertisment2Database.Process(Crawler crawler, PropertyBag propertyBag) in D:\PROJEKTI\crawler\WebCrawlerSuite\WebCrawler.Logic\CrawlerManager.cs:line 584

I read this article:

https://learn.microsoft.com/en-us/archive/blogs/ericlippert/out-of-memory-does-not-refer-to-physical-memory

How can I prevent this error?

Whole method:

public static List<TagValuePair> GetTagsAndValues(string htmlContent)
        {
            List<TagValuePair> tagsValues = new List<TagValuePair>();
            Dictionary<string, int> tagAppearance = new Dictionary<string, int>();

            HtmlDocument doc = new HtmlDocument();

            if (htmlContent != null)
            {
                doc.LoadHtml(htmlContent);

                if (doc.DocumentNode.SelectNodes("//*") == null)
                {
                    List<TagValuePair> tempList = new List<TagValuePair>();
                    tempList.Add(new TagValuePair("Error!", htmlContent, -1));
                    return tempList;
                }
                
                foreach (HtmlNode tag in doc.DocumentNode.SelectNodes("//*"))
                {
                    try
                    {
                        if (!string.IsNullOrEmpty(tag.InnerHtml.Trim()))
                        {
                            if (!tagAppearance.Keys.Contains(tag.Name))
                            {
                                tagAppearance.Add(tag.Name, 1);
                            }
                            else
                                tagAppearance[tag.Name] = tagAppearance[tag.Name] + 1;

                            tagsValues.Add(new TagValuePair(tag.Name, tag.InnerHtml.Trim(), tagAppearance[tag.Name]));
                        }
                        else
                        {
                            // Help link: http://refactoringaspnet.blogspot.com/2010/04/using-htmlagilitypack-to-get-and-post_19.html
                            if (!string.IsNullOrEmpty(tag.GetAttributeValue("value", "").Trim()))
                            {
                                if (!tagAppearance.Keys.Contains("option value"))
                                {
                                    tagAppearance.Add("option value", 1);
                                }
                                else
                                    tagAppearance["option value"] = tagAppearance["option value"] + 1;

                                tagsValues.Add(new TagValuePair("option value", tag.GetAttributeValue("value", "").Trim(), tagAppearance["option value"]));
                            }

                            if (tag.NextSibling != null && !string.IsNullOrEmpty(tag.NextSibling.InnerHtml.Trim()))
                            {
                                if (!tagAppearance.Keys.Contains(tag.Name))
                                {
                                    tagAppearance.Add(tag.Name, 1);
                                }
                                else
                                    tagAppearance[tag.Name] = tagAppearance[tag.Name] + 1;

                                tagsValues.Add(new TagValuePair(tag.Name, tag.NextSibling.InnerHtml.Trim(), tagAppearance[tag.Name]));
                            }
                        }
                    }
                    catch (Exception)
                    {
                        return null;
                    }
                }
            }

            return tagsValues;
        }

EDIT:

The exact error is here:

 doc.LoadHtml(htmlContent);
  • You should reorganize your code. A "global" System.Exception catch isn't a good idea. If you remove the catch, you can see at which exact position the exception is thrown. See also blogs.msdn.com/b/kcwalina/archive/2007/01/30/… Commented Jul 22, 2011 at 5:29
  • Thanks. So which types of exceptions do you suggest catching? The problem is that I get this error after the app has been running for 12 hours. Commented Jul 22, 2011 at 5:35
  • How often does this code run? Commented Jul 22, 2011 at 5:36
  • It depends on how fast the HTML is read. Less than 20 s. Commented Jul 22, 2011 at 5:40
  • You could run this program until it gets the exception and then attach the Visual Studio debugger to see where the exception occurred. You can also monitor whether the memory is full. Perhaps you have some memory leaks? Commented Jul 22, 2011 at 5:45
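
As a rough illustration of the memory monitoring suggested in the comments, the sketch below formats a memory snapshot that could be written to a log once per crawled page. `MemoryLog` and `Snapshot` are hypothetical names, not part of the original code. If the managed figure rises steadily over several hours, that would point towards a leak rather than one unusually large page.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not in the original code): formats a memory snapshot
// so that growth over many hours becomes visible in the application's log.
public static class MemoryLog
{
    public static string Snapshot(string label)
    {
        // Managed heap size as currently known to the GC (no forced collection).
        long managedBytes = GC.GetTotalMemory(false);

        // Private bytes of the whole process, including unmanaged allocations.
        long privateBytes;
        using (Process current = Process.GetCurrentProcess())
        {
            privateBytes = current.PrivateMemorySize64;
        }

        return string.Format(
            "{0:u} {1}: managed={2:N0} B, private={3:N0} B, gen2 collections={4}",
            DateTime.UtcNow, label, managedBytes, privateBytes, GC.CollectionCount(2));
    }
}
```

For example, calling `Console.WriteLine(MemoryLog.Snapshot("after page"));` at the end of each crawl iteration would give a time series to compare against the moment the exception appears.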

1 Answer

I would suggest using a memory profiler to make sure your application has no leaks. Since you say the error occurs after the app has been running for 12 hours, a slow leak that eventually triggers the OutOfMemoryException seems likely.

There are a number of ways to unintentionally hold onto references and cause a slow leak, and running a profiler will help you identify them. The problem may not lie in the one line of code that throws; that line may simply be the straw that breaks the camel's back.
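
One common way references get held unintentionally is a static collection that only ever grows. The sketch below is a hypothetical illustration (nothing in the question shows that the crawler does this): `PageHistory`, `Remember`, and the `MaxEntries` limit are assumed names and values. It shows the pattern and one simple way to bound it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: a static collection keeps every stored string
// reachable, so the GC can never reclaim entries that are not removed.
public static class PageHistory
{
    private const int MaxEntries = 100; // assumed limit, tune to the workload
    private static readonly Queue<string> Recent = new Queue<string>();

    public static int Count
    {
        get { return Recent.Count; }
    }

    public static void Remember(string html)
    {
        Recent.Enqueue(html);

        // Without this trimming loop the queue would grow for as long as the
        // process runs, which is the "slow leak" pattern described above.
        while (Recent.Count > MaxEntries)
        {
            Recent.Dequeue();
        }
    }
}
```

A profiler snapshot would show this kind of problem as a single static root holding an ever-increasing number of strings.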

I have used Redgate's ANTS Profiler before (it comes with a 14-day free trial), and it helped me heaps to get memory usage down and increase performance. I seem to be plugging this a lot recently, but purely because I find it a very valuable tool.

Take a look through some of their walkthroughs and/or videos to see how to track down a leak.

2 Comments

+1 for the Ants profiler. Also, specifically take notice of the large object heap stats. That was the main culprit of OOMs in the process I've been working on.
I got stung by not unsubscribing from a particular form's events. When the form was closed, it was never cleaned up, so the forms (and everything they referenced) were left hanging around, unable to be collected. One small change made a major difference.
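
The event-subscription leak described in the last comment can be sketched as follows. All type and member names here (`Publisher`, `Listener`, `PageCrawled`) are hypothetical and not taken from the question. The point is only that a long-lived publisher's event holds a reference to every subscriber until the subscriber detaches its handler.

```csharp
using System;

// Hypothetical long-lived object; its event's invocation list references
// every subscriber that has not detached itself.
public class Publisher
{
    public event EventHandler PageCrawled;

    public int SubscriberCount
    {
        get { return PageCrawled == null ? 0 : PageCrawled.GetInvocationList().Length; }
    }

    public void Raise()
    {
        EventHandler handler = PageCrawled;
        if (handler != null)
        {
            handler(this, EventArgs.Empty);
        }
    }
}

// Hypothetical short-lived subscriber that detaches its handler on Dispose,
// so the publisher no longer keeps it (and whatever it references) alive.
public sealed class Listener : IDisposable
{
    private readonly Publisher _publisher;

    public int Notifications { get; private set; }

    public Listener(Publisher publisher)
    {
        _publisher = publisher;
        _publisher.PageCrawled += OnPageCrawled;
    }

    private void OnPageCrawled(object sender, EventArgs e)
    {
        Notifications++;
    }

    public void Dispose()
    {
        // The "one small change": without this line the listener stays
        // reachable through the publisher's event for the publisher's lifetime.
        _publisher.PageCrawled -= OnPageCrawled;
    }
}
```

Wrapping each `Listener` in a `using` block (or unsubscribing in a form's closing handler) is one way to make sure the detach step always runs.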
