1

Using SQL Server 2016, primarily.

I have data available to me that is formatted in XML, but I need to be able to join this data to other tables in a SQL Server database.

I have looked around, but the examples I have found use a different data layout in the XML. The examples have distinct node names in the document. My XML has a node named metadata that defines the name, data type, and length of each data item. In the data node, each row has value nodes that correspond with the item nodes in the metadata node. Also, my XML uses namespacing. None of the examples I have found do. I have had problems with namespacing in other SQL/XML tasks I have worked through, so that is probably significant.

How can I use SQL to convert something like this...

<?xml version="1.0" encoding="utf-8"?>
<dataset  xmlns="http://developer.cognos.com/schemas/xmldata/1/"  xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
    xmlns="http://developer.cognos.com/schemas/xmldata/1/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
    xs:schemaLocation="http://developer.cognos.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
    <metadata>
          <item name="Task" type="xs:string" length="-1"/>
          <item name="Task Number" type="xs:string" length="-1"/>
          <item name="Group" type="xs:string" length="-1"/>
          <item name="Work Order" type="xs:string" length="-1"/>
    </metadata>
    <data>
        <row>
            <value>3361B11</value>
            <value>1</value>
            <value>01</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>2</value>
            <value>50</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>3</value>
            <value>02</value>
            <value>JA0520</value>
        </row>
    </data>
</dataset>

...into a tabular format like...

Task Task Number Group Work Order
3361B11 1 01 MS7579
3361B11 2 50 MS7579
3361B11 3 02 JA0520

...so I can join it to other data.

I think the first step would be to query the metadata node to get the column name, data type, and length.

input (boiled down)

<?xml version="1.0" encoding="utf-8"?>
<dataset  xmlns="http://developer.cognos.com/schemas/xmldata/1/"  xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
    <metadata>
          <item name="Task" type="xs:string" length="-1"/>
          <item name="Task Number" type="xs:string" length="-1"/>
          <item name="Group" type="xs:string" length="-1"/>
          <item name="Work Order" type="xs:string" length="-1"/>
    </metadata>
</dataset>

output

id name type length
1 Task string -1
2 Task Number string -1
3 Group string -1
4 Work Order string -1

Once that is obtained, the next part would be generated dynamically using those values.

In case it is helpful for fiddling, here is a SQL statement that produces the desired result:

with
xlTask as (
  select *
  from (
    values
      ('3361B11', '1', '01', 'MS7579')
    , ('3361B11', '2', '50', 'MS7579')
    , ('3361B11', '3', '02', 'JA0520')
  ) q ([Task], [Task Number], [Group], [Work Order])
)
select *
from xlTask

4 Answers 4

2

Please try the following solution.

It will work on the condition if all <value> XML elements always have the same sequential order, from 1 to 4.

After that having data as a relational/rectangular dataset you can join it with real tables/views in the database.

SQL

DECLARE @xml XML =
'<dataset xmlns="http://developer.cognos.com/schemas/xmldata/1/"
         xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
    <metadata>
        <item name="Task" type="xs:string" length="-1"/>
        <item name="Task Number" type="xs:string" length="-1"/>
        <item name="Group" type="xs:string" length="-1"/>
        <item name="Work Order" type="xs:string" length="-1"/>
    </metadata>
    <data>
        <row>
            <value>3361B11</value>
            <value>1</value>
            <value>01</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>2</value>
            <value>50</value>
            <value>MS7579</value>
        </row>
        <row>
            <value>3361B11</value>
            <value>3</value>
            <value>02</value>
            <value>JA0520</value>
        </row>
    </data>
</dataset>';

WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT c.value('(value[1]/text())[1]', 'VARCHAR(20)') AS Task
    , c.value('(value[2]/text())[1]', 'VARCHAR(20)') AS [Task Number]
    , c.value('(value[3]/text())[1]', 'VARCHAR(20)') AS [Group]
    , c.value('(value[4]/text())[1]', 'VARCHAR(20)') AS [Work Order]
FROM @xml.nodes('/dataset/data/row') AS t(c);

Output

Task Task Number Group Work Order
3361B11 1 01 MS7579
3361B11 2 50 MS7579
3361B11 3 02 JA0520
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks Yitzhak. That only works if I know what the metadata node contains for each document and want to manually create a different query for each document. The number of value nodes per row varies and their names, data types, and lengths vary. How could this be done dynamically? I'm thinking dynamic SQL. I could start with something like stackoverflow.com/questions/3193956/…, but there's no description. ...and I'm guilty of not wanting to endure an entire semester of XQUERY to answer one question.
While asking a question, you need to provide a minimal reproducible example: (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;).
OK. Added SQL Server version.
1

If you don't know the ordering you can pull that out of the metadata node, and pass it in to predicates using the sql:column function.

Unfortunately, SQL Server does not support the returning the position(.) XQuery function outside a predicate, so we need to hack it by counting nodes.

WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT
  c.value('(value[position() = sql:column("pos.Task"      )]/text())[1]', 'VARCHAR(20)') AS Task,
  c.value('(value[position() = sql:column("pos.TaskNumber")]/text())[1]', 'VARCHAR(20)') AS TaskNumber,
  c.value('(value[position() = sql:column("pos.Group"     )]/text())[1]', 'VARCHAR(20)') AS [Group],
  c.value('(value[position() = sql:column("pos.WorkOrder" )]/text())[1]', 'VARCHAR(20)') AS WorkOrder
FROM (
    SELECT
      Task       = md.md.value('let $i := item[@name = "Task"       ][1] return count(./item[. << $i]) + 1', 'int'),
      TaskNumber = md.md.value('let $i := item[@name = "Task Number"][1] return count(./item[. << $i]) + 1', 'int'),
      [Group]    = md.md.value('let $i := item[@name = "Group"      ][1] return count(./item[. << $i]) + 1', 'int'),
      WorkOrder  = md.md.value('let $i := item[@name = "Work Order" ][1] return count(./item[. << $i]) + 1', 'int')
    FROM @xml.nodes('dataset/metadata') md(md)
) pos
CROSS APPLY @xml.nodes('/dataset/data/row') AS t(c);

The logic is as follows:

  • Within a subquery:
    • In XQuery, take the metadata node, and look for a descendant item which has name = "Task" and so on. Assign that node to $i.
    • Return the count of item nodes that are before $i plus one.
    • And so on for each column name.
  • Then shred the rest of the XML as normal.
  • In the predicates, use position() = sql:column("YourColumnName") which gives us the previously calculated position.

db<>fiddle


If you want a completely dynamic solution, with dynamic column names, then you need dynamic SQL.

DECLARE @cols nvarchar(max);

WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT
  @cols = CAST(@xml.query('
  text {
    for $i in dataset/metadata/item
    return concat(
      ", c.value(''(value[",
      string(count(./item[. << $i]) + 1),
      "]/text())[1]'', ''VARCHAR(100)'') AS [",
      $i/@name,
      "]
 "
    )
  }
  ') AS nvarchar(max));

SET @cols = STUFF(@cols, 1, 2, N'');

DECLARE @sql nvarchar(max) = '
WITH XMLNAMESPACES(DEFAULT ''http://developer.cognos.com/schemas/xmldata/1/'')
SELECT
    ' + @cols + '
FROM @xml.nodes(''/dataset/data/row'') AS t(c);
';

PRINT @sql;  -- your friend

EXEC sp_executesql @sql,
  N'@xml xml',
  @xml = @xml;

Note that sp_executesql is used, in order to pass the @xml variable all the way through.

db<>fiddle

Comments

0

Here's a dynamic solution from your sample data:

DECLARE @SQL NVARCHAR(MAX)

;WITH XMLNAMESPACES(DEFAULT 'http://developer.cognos.com/schemas/xmldata/1/')
SELECT  @SQL = N'WITH XMLNAMESPACES(DEFAULT ''http://developer.cognos.com/schemas/xmldata/1/'')
SELECT '
 + STRING_AGG('c.value(''(value[' + T.X.value('let $i := . return count(/dataset/metadata/item[. << $i]) + 1', 'NVARCHAR(MAX)') + N']/text())[1]'', ''VARCHAR(20)'') AS '
 + QUOTENAME(T.X.value('@name', 'nvarchar(100)')) , ',')
 + N'
FROM @xml.nodes(''/dataset/data/row'') AS t(c)'
FROM    @xml.nodes('/dataset/metadata/item') as T(X);
EXEC SP_EXECUTESQL @SQL, N'@xml XML', @xml = @xml

It reads the metadata and generates sql that reads actual rows and creates proper aliases

1 Comment

SQL magic. Perfect. Now I'll just need to rewrite the STRING_AGG part to work in SQL Server 2016.
0

Maybe you can try to create something to convert any XML to a SQL Table and then use SQL to extract the data

You can try the following:

USE:

select *  from Utility.FlattenXml('your xml’)
    
PROCEDURE:

USE [YOUR_DB] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO
 
CREATE FUNCTION [Utility].[FlattenXml](@xmlDoc XML)   RETURNS TABLE 
 AS RETURN  WITH CTE AS (

SELECT 
        1 AS lvl,  
        x.value('local-name(.)','NVARCHAR(MAX)') AS Name,  
        CAST(NULL AS NVARCHAR(MAX)) AS ParentName, 
        CAST(1 AS INT) AS ParentPosition, 
        CAST(N'Element' AS NVARCHAR(20)) AS NodeType,  
        x.value('local-name(.)','NVARCHAR(MAX)') AS FullPath,  
        x.value('local-name(.)','NVARCHAR(MAX)')  
        + N'[' 
        + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS NVARCHAR)  
        + N']' AS XPath,  
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS Position, 
        x.value('local-name(.)','NVARCHAR(MAX)') AS Tree,  
        x.value('text()[1]','NVARCHAR(MAX)') AS Value,  
        x.query('.') AS this,         
        x.query('*') AS t,  
        CAST(CAST(1 AS VARBINARY(4)) AS VARBINARY(MAX)) AS Sort,  
        CAST(1 AS INT) AS ID  
FROM @xmlDoc.nodes('/*') a(x)  
UNION ALL 
SELECT 
        p.lvl + 1 AS lvl,  
        c.value('local-name(.)','NVARCHAR(MAX)') AS Name,  
        CAST(p.Name AS NVARCHAR(MAX)) AS ParentName, 
        CAST(p.Position AS INT) AS ParentPosition, 
        CAST(N'Element' AS NVARCHAR(20)) AS NodeType,  
        CAST(p.FullPath + N'/' + c.value('local-name(.)','NVARCHAR(MAX)') AS NVARCHAR(MAX)) AS FullPath,  
        CAST(p.XPath + N'/'+ c.value('local-name(.)','NVARCHAR(MAX)')+ N'['+ CAST(ROW_NUMBER() OVER(PARTITION BY c.value('local-name(.)','NVARCHAR(MAX)') 
        ORDER BY (SELECT 1)) AS NVARCHAR)+ N']' AS NVARCHAR(MAX)) AS XPath,  
        ROW_NUMBER() OVER(PARTITION BY c.value('local-name(.)','NVARCHAR(MAX)')
        ORDER BY (SELECT 1)) AS Position, 
        CAST( SPACE(2 * p.lvl - 1) + N'|' + REPLICATE(N'-', 1) + c.value('local-name(.)','NVARCHAR(MAX)') AS NVARCHAR(MAX)) AS Tree,  
        CAST( c.value('text()[1]','NVARCHAR(MAX)') AS NVARCHAR(MAX) ) AS Value, c.query('.') AS this,  
        c.query('*') AS t,  
        CAST(p.Sort + CAST( (lvl + 1) * 1024 + (ROW_NUMBER() OVER(ORDER BY (SELECT 1)) * 2) AS VARBINARY(4)) AS VARBINARY(MAX) ) AS Sort,  
        CAST((lvl + 1) * 1024 + (ROW_NUMBER() OVER(ORDER BY (SELECT 1)) * 2) AS INT)
        
FROM CTE p  
CROSS APPLY p.t.nodes('*') b(c)), cte2 AS (  
                                            SELECT 
                                            lvl AS Depth,  
                                            Name AS NodeName,  
                                            ParentName, 
                                            ParentPosition, 
                                            NodeType,  
                                            FullPath,  
                                            XPath,  
                                            Position, 
                                            Tree AS TreeView,  
                                            Value,  
                                            this AS XMLData,  
                                            Sort, 
                                            ID
                                            FROM cte  
UNION ALL 
SELECT 
        p.lvl,  
        x.value('local-name(.)','NVARCHAR(MAX)'),  
        p.Name, 
        p.Position, 
        CAST(N'Attribute' AS NVARCHAR(20)),  
        p.FullPath + N'/@' + x.value('local-name(.)','NVARCHAR(MAX)'),  
        p.XPath + N'/@' + x.value('local-name(.)','NVARCHAR(MAX)'),  
        1, 
        SPACE(2 * p.lvl - 1) + N'|' + REPLICATE('-', 1)  
        + N'@' + x.value('local-name(.)','NVARCHAR(MAX)'),  
        x.value('.','NVARCHAR(MAX)'),  
        NULL,  
        p.Sort,  
        p.ID + 1  
FROM CTE p  
CROSS APPLY this.nodes('/*/@*') a(x)  
)  

SELECT into [your table]
        ROW_NUMBER() OVER(ORDER BY Sort, ID) AS ID,  
        ParentName, ParentPosition,Depth, NodeName, Position,   
        NodeType, FullPath, XPath, TreeView, Value, XMLData 
FROM CTE2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.