13

I'm new to both XML and C#. I'm trying to find an efficient way to parse a given XML file and retrieve the relevant numerical values based on the "proj_title" value (heat_run, or any other possible value). For example, I want to calculate the duration of a particular test run (proj_end value minus proj_start value).

ex.xml:

<proj ID="2">
      <proj_title>heat_run</proj_title>
      <proj_start>100</proj_start>
      <proj_end>200</proj_end>
</proj>

... We can't search by proj ID, since this value is not fixed from test run to test run. The above file is huge: ~8 MB, with ~2000 tags named proj_title. Is there an efficient way in C# to first find all proj_title tags with the value "heat_run", and then retrieve the proj_start and proj_end values for each of them?

Here's my current C# code:

using System.Xml;

public class parser
{
     public static void Main()
     {
         XmlDocument xmlDoc = new XmlDocument();
         xmlDoc.Load("ex.xml");

         //~2000 tags w/ proj_title
         //any more efficient way to just look for proj_title="heat_run" specifically?
         XmlNodeList heat_run_nodes = xmlDoc.GetElementsByTagName("proj_title");
     }
}
1
  • I have had a lot of luck with using XML Serialization where you can turn your XML into objects... This Link may help you out Commented Jun 3, 2013 at 16:54

3 Answers

14

8MB really isn't very large at all by modern standards. Personally I'd use LINQ to XML:

XDocument doc = XDocument.Load("ex.xml");
var projects = doc.Descendants("proj_title")
                  .Where(x => (string) x == "heat_run")
                  .Select(x => x.Parent) // Just for simplicity
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });

foreach (var project in projects)
{
    Console.WriteLine("Start: {0}; End: {1}", project.Start, project.End);
}

(Obviously adjust this to your own requirements - it's not really clear what you need to do based on the question.)

Alternative query:

var projects = doc.Descendants("proj")
                  .Where(x => (string) x.Element("proj_title") == "heat_run")
                  .Select(x => new {
                              Start = (int) x.Element("proj_start"),
                              End = (int) x.Element("proj_end")
                          });

2 Comments

This helped me a lot! I just need to add 1 more Where condition. Is there an option in LINQ/C# that refers to an ancestor of x for example? like Where (x => (string) x== "heat_run" && (string) x.Ancestor=="heat_test"). I tried this, and it didn't work?
@jerryh91: Well you can use Parent, but I'd typically work the other way round - find the parent with a specific child.
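A rough sketch of the "find the parent with a specific child" idea from the comment above. The question does not show what encloses each proj, so the <test> wrapper element, its <test_title> child, the <runs> root and the helper name ProjectIdsIn are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AncestorExample
{
    // Top-down: pick the enclosing <test> by its title first, then the matching <proj> elements inside it.
    public static List<string> ProjectIdsIn(XDocument doc, string testTitle, string projTitle)
    {
        return doc.Descendants("test")
                  .Where(t => (string) t.Element("test_title") == testTitle)
                  .SelectMany(t => t.Descendants("proj"))
                  .Where(p => (string) p.Element("proj_title") == projTitle)
                  .Select(p => (string) p.Attribute("ID"))
                  .ToList();
    }

    public static void Main()
    {
        // Hypothetical layout: each <proj> sits inside a titled <test> element.
        XDocument doc = XDocument.Parse(
            "<runs>" +
            "<test><test_title>heat_test</test_title>" +
            "<proj ID=\"2\"><proj_title>heat_run</proj_title></proj></test>" +
            "<test><test_title>cold_test</test_title>" +
            "<proj ID=\"7\"><proj_title>heat_run</proj_title></proj></test>" +
            "</runs>");

        foreach (string id in ProjectIdsIn(doc, "heat_test", "heat_run"))
        {
            Console.WriteLine("proj ID: {0}", id); // prints "proj ID: 2"
        }
    }
}
```

Working bottom-up is also possible: XElement has an Ancestors(name) method, so a filter such as p.Ancestors("test").Any(t => (string) t.Element("test_title") == "heat_test") expresses the same condition starting from the proj element.<test><test_title>cold_test</test_title><proj ID=\"7\"><proj_title>heat_run</proj_title></proj></test></runs>");
var ids = AncestorExample.ProjectIdsIn(testDoc, "heat_test", "heat_run");
if (ids.Count != 1 || ids[0] != "2") throw new System.Exception("expected only proj ID 2");
var coldIds = AncestorExample.ProjectIdsIn(testDoc, "cold_test", "heat_run");
if (coldIds.Count != 1 || coldIds[0] != "7") throw new System.Exception("expected only proj ID 7");
if (AncestorExample.ProjectIdsIn(testDoc, "heat_test", "other").Count != 0) throw new System.Exception("expected no matches");
</test>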
9

You can use XPath to find all nodes that match, for example:

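XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='heat_run']");

matches will contain all proj nodes that match the criteria. The leading // makes the search cover the whole document; without it, the expression only matches a proj that is the root element. Learn more about XPath: http://www.w3schools.com/xsl/xpath_syntax.asp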

MSDN Documentation on SelectNodes
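As a sketch of how the matched nodes might then be used, the following reads proj_start and proj_end from each match and computes the duration. The class name, the helper name Durations and the <tests> root element in the sample are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathExample
{
    // Collects (proj_end - proj_start) for every <proj> whose <proj_title> is 'heat_run'.
    public static List<int> Durations(XmlDocument xmlDoc)
    {
        var result = new List<int>();
        XmlNodeList matches = xmlDoc.SelectNodes("//proj[proj_title='heat_run']");
        foreach (XmlNode proj in matches)
        {
            // Child lookups are relative to the current <proj> node.
            int start = int.Parse(proj.SelectSingleNode("proj_start").InnerText);
            int end = int.Parse(proj.SelectSingleNode("proj_end").InnerText);
            result.Add(end - start);
        }
        return result;
    }

    public static void Main()
    {
        var xmlDoc = new XmlDocument();
        // Inline sample data; the real file would be loaded with xmlDoc.Load("ex.xml").
        xmlDoc.LoadXml(
            "<tests>" +
            "<proj ID=\"2\"><proj_title>heat_run</proj_title>" +
            "<proj_start>100</proj_start><proj_end>200</proj_end></proj>" +
            "<proj ID=\"3\"><proj_title>cold_run</proj_title>" +
            "<proj_start>50</proj_start><proj_end>75</proj_end></proj>" +
            "</tests>");

        foreach (int duration in Durations(xmlDoc))
        {
            Console.WriteLine("Duration: {0}", duration); // prints "Duration: 100"
        }
    }
}
```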


3

Use XDocument and the LINQ to XML API. http://msdn.microsoft.com/en-us/library/bb387098.aspx

If the performance is not what you expect after trying it, look for a SAX parser. A SAX parser does not load the whole document into memory and then evaluate an XPath expression against all of it. Instead it works in an event-driven fashion, processing the document as it is read; in some cases this can be a lot faster, and it uses less memory.

There are probably SAX parsers for .NET out there. I haven't used them myself for .NET, but I have for C++.
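.NET itself ships with XmlReader, a forward-only streaming reader. It is pull-based rather than event-driven like SAX, but it also avoids holding the whole document in memory. A minimal sketch, assuming each proj element is small enough to materialize on its own as an XElement; the class name, the helper name HeatRunDurations and the <tests> root in the sample are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingExample
{
    // Streams through the input and loads only one <proj> element at a time.
    public static List<int> HeatRunDurations(TextReader input)
    {
        var result = new List<int>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "proj")
                {
                    // ReadFrom consumes the whole <proj> subtree and leaves the
                    // reader on the node that follows it, so no extra Read() here.
                    XElement proj = (XElement) XNode.ReadFrom(reader);
                    if ((string) proj.Element("proj_title") == "heat_run")
                    {
                        result.Add((int) proj.Element("proj_end") - (int) proj.Element("proj_start"));
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // Inline sample data; for the real file, pass File.OpenText("ex.xml") instead.
        string xml =
            "<tests>" +
            "<proj ID=\"2\"><proj_title>heat_run</proj_title>" +
            "<proj_start>100</proj_start><proj_end>200</proj_end></proj>" +
            "<proj ID=\"3\"><proj_title>cold_run</proj_title>" +
            "<proj_start>50</proj_start><proj_end>75</proj_end></proj>" +
            "<proj ID=\"4\"><proj_title>heat_run</proj_title>" +
            "<proj_start>300</proj_start><proj_end>450</proj_end></proj>" +
            "</tests>";

        foreach (int duration in HeatRunDurations(new StringReader(xml)))
        {
            Console.WriteLine("Duration: {0}", duration); // prints 100, then 150
        }
    }
}
```

For an 8 MB file the in-memory approaches are likely to be fine; streaming mainly pays off when the document is too large to load comfortably.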
