1

I am trying to parse an XML response from a website in C#. The response comes in a format similar to the following:

<Company>
    <Owner>Bob</Owner>
    <Contact>
        <address> -1 Infinite Loop </address>
        <phone>
            <LandLine>(000) 555-5555</LandLine>
            <Fax> (000) 555-5556 </Fax>
        </phone>
        <email> [email protected] </email>
    </Contact>
</Company>

The only information I want is the LandLine and Fax numbers. However, my current approach seems really poor. Essentially it is a bunch of nested while loops that check each element's name and then read the content once I find the right element. I am using something like the listing below:

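string landLine = null;
string fax = null;

using (XmlReader xml = XmlReader.Create(websiteResultStream, xmlSettings))
{
    while (xml.Read()) {
        // Element names are case-sensitive: the element is "phone", not "Phone"
        if (xml.NodeType == XmlNodeType.Element && xml.Name == "phone") {
            xml.Read();
            while (!xml.EOF) {
                if (xml.NodeType == XmlNodeType.Element && xml.Name == "LandLine") {
                    // ReadElementContentAsString moves past </LandLine> itself,
                    // so no extra Read() here (it could skip the next element)
                    landLine = xml.ReadElementContentAsString();
                }
                else if (xml.NodeType == XmlNodeType.Element && xml.Name == "Fax") {
                    fax = xml.ReadElementContentAsString();
                }
                else {
                    xml.Read();
                }
            }
        }
    }
}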

I am new to XML and C#, but the above method just screams bad code! I want the code to stay robust if the structure changes (e.g. if additional phone number types such as "mobile" are added), hence the additional while loops.

Note: the above C# code is not exact and lacks some checks, but it demonstrates my current, admittedly abysmal, approach.

What is the best/cleanest way to simply extract the content from those two Elements if they are present?


4 Answers

8

The most lightweight approach for read-only access to specific nodes in an XML document is to use an XPathDocument together with an XPath expression:

using System.Xml.XPath;

XPathDocument xdoc = new XPathDocument(@"C:\sample\document.xml");
XPathNavigator node = xdoc.CreateNavigator()
    .SelectSingleNode("/Company/Contact/phone/LandLine");
string landline = null;   // stays null if the element is missing
if (node != null)
{
    landline = node.Value;
}
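Because the question also asks for robustness when new phone types (such as "mobile") appear, the same XPathDocument idea can be extended to select every child of `phone` rather than one fixed element. The following is a minimal sketch under that assumption; the class name `PhoneXPath` and method `ReadPhoneNumbers` are hypothetical, and it reads from a `TextReader` so it can be tried on a string (a response stream could be wrapped in a `StreamReader` instead).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper: collects every child of <phone>, keyed by element name.
public static class PhoneXPath
{
    public static Dictionary<string, string> ReadPhoneNumbers(TextReader source)
    {
        var numbers = new Dictionary<string, string>();
        var doc = new XPathDocument(source);

        // "*" matches any child element, so a new type like <mobile> is
        // picked up without changing this code.
        XPathNodeIterator children = doc.CreateNavigator()
            .Select("/Company/Contact/phone/*");

        while (children.MoveNext())
        {
            XPathNavigator child = children.Current;
            // Trim() removes padding such as "<Fax> (000) 555-5556 </Fax>"
            numbers[child.LocalName] = child.Value.Trim();
        }
        return numbers;
    }
}
```

A caller can then use `numbers.TryGetValue("Fax", out var fax)` to handle an element that is absent.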


8

Use LINQ-to-XML:

using System;
using System.Xml.Linq;

var doc = XDocument.Parse(@"<Company>
    <Owner>Bob</Owner>
    <Contact>
        <address> -1 Infinite Loop </address>
        <phone>
            <LandLine>(000) 555-5555</LandLine>
            <Fax> (000) 555-5556 </Fax>
        </phone>
        <email> [email protected] </email>
    </Contact>
</Company>");

var phone = doc.Root.Element("Contact").Element("phone");

Console.WriteLine((string)phone.Element("LandLine"));
Console.WriteLine((string)phone.Element("Fax"));

Output:

(000) 555-5555
 (000) 555-5556

2 Comments

Note that if Contact is missing, you'll get a NullReferenceException on the var phone = ... line. I like to do things like var contactNode = doc.Root.Element("Contact") ?? new XElement("Contact"); so I always have a node returned, and then when I do var phone = contactNode.Element("phone") ?? new XElement("phone"); I won't get null reference errors. In the end, I just end up with empty (null) values for the variables. Alternatively, use an XSD to validate the document before parsing to ensure the nodes you want exist.
Note that the XDocument class also comes with the overhead of building a DOM tree in memory, which is usually more than you need for read-only access to a few nodes, especially when you deal with large documents.
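The null-safety concern from the first comment can be sketched as follows. Instead of the placeholder-`XElement` trick described there, this variant swaps in the null-conditional operator `?.` (C# 6 and later), combined with LINQ to XML's explicit `(string)` conversion, which yields null for a missing element rather than throwing. The class and method names are hypothetical.

```csharp
using System.Xml.Linq;

// Hypothetical helper: null-safe lookup of LandLine and Fax.
public static class PhoneLinq
{
    public static (string LandLine, string Fax) ReadPhoneNumbers(string xmlText)
    {
        XDocument doc = XDocument.Parse(xmlText);

        // ?. stops at the first missing level (Contact or phone) and yields null.
        XElement phone = doc.Root?.Element("Contact")?.Element("phone");

        // The explicit (string) conversion maps a null XElement to a null string.
        return ((string)(phone?.Element("LandLine")), (string)(phone?.Element("Fax")));
    }
}
```

As in the answer above, the Fax value keeps its surrounding spaces; call `.Trim()` on non-null results if that matters.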
2

I don't think you're too far off. There are more convenient methods (lots of different approaches). Assuming you want to take the same basic approach as you do here (and it is an efficient if verbose one), I'd do:

bool inPhone = false;
string landLine = null;
string fax = null;

using (XmlReader xml = XmlReader.Create(websiteResultStream, xmlSettings))
{
  xml.Read();
  while (!xml.EOF)
  {
    switch (xml.NodeType)
    {
      case XmlNodeType.Element:
        switch (xml.LocalName)
        {
          case "phone":
            inPhone = !xml.IsEmptyElement; // <phone/> has no EndElement
            break;
          case "LandLine":
            if (inPhone)
            {
              // ReadElementContentAsString already advances past </LandLine>,
              // so skip the Read() at the bottom of the loop.
              landLine = xml.ReadElementContentAsString();
              if (fax != null)
              {
                DoWhatWeWantToDoWithTheseValues(landLine, fax);
                return;
              }
              continue;
            }
            break;
          case "Fax":
            if (inPhone)
            {
              fax = xml.ReadElementContentAsString();
              if (landLine != null)
              {
                DoWhatWeWantToDoWithTheseValues(landLine, fax);
                return;
              }
              continue;
            }
            break;
        }
        break;
      case XmlNodeType.EndElement:
        if (xml.LocalName == "phone")
          inPhone = false;
        break;
    }
    xml.Read();
  }
}

Note that this tracks whether the reader is "inside" a phone element, whereas your version's inner loop keeps reading to the end of the document and would also pick up a LandLine inside some later element, which you seem to be trying to avoid.

Note also that the using block disposes the XmlReader, and that we return as soon as we have both values instead of reading the rest of the document.
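A more compact forward-only variant, not part of the original answer, is to jump straight to `phone` with `XmlReader.ReadToFollowing` and then scan only that element's subtree via `XmlReader.ReadSubtree`, which also gives the "inside phone only" behaviour without an `inPhone` flag. This is a sketch assuming each child of `phone` holds plain text (`ReadElementContentAsString` throws on elements with child elements); the names `PhoneReader` and `ReadPhoneNumbers` are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: reads every child of the first <phone> element.
public static class PhoneReader
{
    public static Dictionary<string, string> ReadPhoneNumbers(TextReader source)
    {
        var numbers = new Dictionary<string, string>();
        using (XmlReader xml = XmlReader.Create(source))
        {
            if (!xml.ReadToFollowing("phone"))
                return numbers; // no <phone> element at all

            // The subtree reader stops at </phone>, so a LandLine elsewhere
            // in the document is never seen.
            using (XmlReader phone = xml.ReadSubtree())
            {
                phone.Read(); // position on <phone> itself
                phone.Read(); // step to its first child node
                while (!phone.EOF)
                {
                    if (phone.NodeType == XmlNodeType.Element)
                    {
                        string name = phone.LocalName;
                        // Advances past the end tag, so no extra Read() here.
                        numbers[name] = phone.ReadElementContentAsString().Trim();
                    }
                    else
                    {
                        phone.Read();
                    }
                }
            }
        }
        return numbers;
    }
}
```

Because every child name is recorded, a new `mobile` element would simply appear as another dictionary entry.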


1

The best way to do that is to use XPath. See this article for background: http://support.microsoft.com/kb/308333

and this article for how to do it: http://www.codeproject.com/KB/cpp/myXPath.aspx
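As a rough sketch of the XPath idea using `XmlDocument.SelectSingleNode` from `System.Xml` (a different entry point from the XPathDocument answer above, and not taken from the linked articles), the lookup can return null for any element that is missing. The class and method names are hypothetical.

```csharp
using System.Xml;

// Hypothetical helper: XPath lookups on an XmlDocument.
public static class PhoneXmlDocument
{
    public static (string LandLine, string Fax) ReadPhoneNumbers(string xmlText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmlText);

        // SelectSingleNode returns null when the expression matches nothing,
        // and ?. turns that into a null string instead of an exception.
        string landLine = doc.SelectSingleNode("/Company/Contact/phone/LandLine")?.InnerText;
        string fax = doc.SelectSingleNode("/Company/Contact/phone/Fax")?.InnerText;
        return (landLine, fax);
    }
}
```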
