
I have an XML file that I'm trying to parse and save to a database in a C# program. For most of the elements in this file, I have been able to use SqlBulkCopy because these elements are arranged well with either unique names for child tags or attributes on the root node. However, I have one element that has child elements with repeating tag names (just "tag") but uses attribute names to describe what it is. I have not been able to save this with SqlBulkCopy, which I would prefer since this file can be as large as 500MB and the SqlBulkCopy class is much faster. I tried the code below, but I can see by debugging that the ds.Tables collection is separating hostproperties and tag. I'm guessing this is just how the ReadXml method works. What would be the easiest way that I could get these tags into a datatable object that has the individual attributes as columns so that I could use SqlBulkCopy?

Current C# Code

DataSet ds = new DataSet();

ds.ReadXml(file.InputStream);

DataTable hostItems = ds.Tables["host"];
conn.Open();

using (SqlBulkCopy sb = new SqlBulkCopy(conn))
{
    sb.DestinationTableName = "HOSTS";
    sb.ColumnMappings.Add("host-ip", "HOST_IP");
    sb.ColumnMappings.Add("host-name", "NAME");
    sb.ColumnMappings.Add("system-type", "SSH_FINGERPRINT");
    sb.ColumnMappings.Add("os", "OS");
    sb.WriteToServer(hostItems);
}

XML File

<host>
    <tag name="host-ip">192.168.200.8</tag>
    <tag name="host-name">someserver.mydomain.com</tag>
    <tag name="system-type">webserver</tag>
    <tag name="os">WindowsServer2019</tag>
</host>
...
<host>
    <tag name="host-ip">192.168.200.9</tag>
    <tag name="host-name">someserver2.mydomain.com</tag>
    <tag name="system-type">webserver</tag>
    <tag name="os">WindowsServer2019</tag>
    <tag name="attributeFirstOneDidntHave">Some nonsense</tag>
</host>

Edit

I failed to mention that not all of the hosts have the same number of tags. I have updated the XML example to illustrate this.

2 Comments

  • ReadXml maps the XML hierarchy onto relational tables, which is why you are getting one table for host and one for tag: host is treated as an 'entity' with related tag 'entities'. Commented Jul 29, 2020 at 21:03
  • One option could be to flatten the host/tag relationship into a new dataset and pass that to your bulk copy method. Commented Jul 29, 2020 at 21:06
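
The flattening suggested in the last comment could look roughly like the sketch below. It pivots the tag table (one row per tag) into one row per host. TagFlattener and Flatten are hypothetical names, and the default column names (host_Id, name, tag_Text) are an assumption about what ReadXml infers for this file, so check ds.Tables["tag"].Columns in the debugger before relying on them.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class TagFlattener
{
    // Pivots a name/value table (one row per <tag>) into one row per host.
    // The column names are parameters because the inferred names are an assumption.
    public static DataTable Flatten(DataTable tags, string keyColumn = "host_Id",
                                    string nameColumn = "name", string valueColumn = "tag_Text")
    {
        var flat = new DataTable("host");
        var rowsByHost = new Dictionary<object, DataRow>();

        foreach (DataRow tag in tags.Rows)
        {
            string name = Convert.ToString(tag[nameColumn]);

            // Hosts can carry different tags, so add a column the first time a name appears.
            if (!flat.Columns.Contains(name))
            {
                flat.Columns.Add(name, typeof(string));
            }

            // Reuse the row already started for this host, or start a new one.
            object hostId = tag[keyColumn];
            if (!rowsByHost.TryGetValue(hostId, out DataRow row))
            {
                row = flat.Rows.Add();
                rowsByHost[hostId] = row;
            }

            row[name] = tag[valueColumn];
        }

        return flat;
    }
}
```

With the question's code, the call would be something like DataTable hostItems = TagFlattener.Flatten(ds.Tables["tag"]); and the existing column mappings would then apply to the result. Tags that a host does not have are left as DBNull. Note that this still relies on ReadXml loading the whole file into memory first.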

1 Answer

With huge XML files you should stream the input with XmlReader; loading the whole document at once risks an out-of-memory error. The code below combines XmlReader with LINQ to XML, reading one host element at a time.

using System;
using System.Data;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
            DataTable dt = new DataTable();
            // Stream the file so that only one <host> element is parsed at a time.
            using (XmlReader reader = XmlReader.Create(FILENAME))
            {
                while (!reader.EOF)
                {
                    // Advance to the next <host> unless the reader is already on one.
                    if (reader.Name != "host")
                    {
                        reader.ReadToFollowing("host");
                    }
                    if (!reader.EOF)
                    {
                        // ReadFrom consumes the whole <host> element and leaves
                        // the reader on the node that follows it.
                        XElement host = (XElement)XElement.ReadFrom(reader);
                        // Hosts do not all carry the same tags, so add a column
                        // the first time a tag name appears.
                        foreach (XElement tag in host.Elements("tag"))
                        {
                            string name = (string)tag.Attribute("name");
                            if (!dt.Columns.Contains(name))
                            {
                                dt.Columns.Add(name, typeof(string));
                            }
                        }
                        // Tags a host does not have stay DBNull in its row.
                        DataRow row = dt.Rows.Add();
                        foreach (XElement tag in host.Elements("tag"))
                        {
                            row[(string)tag.Attribute("name")] = (string)tag;
                        }
                    }
                }
            }
        }
    }
}

2 Comments

This pointed me in the right direction and seems to have around the same efficiency. However, one question: how would I read multiple nodes in one pass using this? Say I was looking for this host node and two other nodes on different levels (say, hardware and installedSoftware)? Would I use something other than ReadToFollowing?
I've tried it a couple of times and it becomes messy. I had a huge SVG file to parse and used ReadToFollowing to go between different nodes. I think the best way is just to use a common parent.
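
For the multiple-node question above, one pattern that avoids chaining ReadToFollowing calls is to walk the reader node by node and materialise any element whose name is in a wanted set. The sketch below is only an illustration: ElementStreamer and Stream are hypothetical names, and it assumes the wanted elements are not nested inside one another (a wanted element inside another wanted element is returned as part of its parent, not separately).

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static class ElementStreamer
{
    // Lazily yields every element whose name is in 'names', in document order.
    public static IEnumerable<XElement> Stream(XmlReader reader, params string[] names)
    {
        var wanted = new HashSet<string>(names);
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && wanted.Contains(reader.Name))
            {
                // ReadFrom consumes the whole element and leaves the reader on the
                // node after it, so Read() must not be called again on this branch.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

A caller could then loop over ElementStreamer.Stream(reader, "host", "hardware", "installedSoftware") and switch on el.Name.LocalName to fill a separate DataTable per element type. Because the elements are yielded lazily, only the current one is held as an XElement.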
