1

What is the simplest way to parse XML into columns with SQL Server. Example:

<ns0:root xmlns:ns0="http://herp...">
    <ns1:FirstElement xmlns:ns1="http://derpyderp...">
        <ns1:FirstElementID>AAF11303</ns1:FirstElementID>
        <ns1:FirstEValue>some random text</ns1:FirstEValue>
        <ns1:SecondElement>
            <ns1:Something>Asdsad</ns1:Something>
            <ns1:Else>
                <ns1:Stuff>sdf</ns1:Stuff>
                <ns1:StuffVal>15</ns1:StuffVal>
            </ns1:Else>
            <ns1:Else>
                <ns1:Stuff>jarjar</ns1:Stuff>
                <ns1:StuffVal>16</ns1:StuffVal>
                <ns1:StuffParam>true</ns1:StuffParam>
            </ns1:Else>
        </ns1:SecondElement>
        <ns1:randValue>dosd</ns1:randValue>
    </ns1:FirstElement>
    <ns1:FirstElement>
        <ns1:FirstElementID>DDF00301</ns1:FirstElementID>
        <ns1:FirstEValue/>
        <ns1:SecondElement>
            <ns1:Else>
                <ns1:Stuff>yessir</ns1:Stuff>
                <ns1:StuffVal>0</ns1:StuffVal>
            </ns1:Else>
        </ns1:SecondElement>        
    </ns1:FirstElement>
    <!-- ... times n the first element with a variating amount of children up to 15 levels deep -->
</ns0:root>

I'd like this as a simple column output, ie:

FIRSTELEMENTID | FIRSTEVALUE      | SOMETHING | ELSE.STUFF | ELSE.STUFFVAL | ELSE.STUFFPARAM | RANDVALUE
'AAF11303'     |'some random text'| 'Asdasd'  | 'sdf'      | 15            | NULL            | 'dosd'
'AAF11303'     |'some random text'| 'Asdasd'  | 'jarjar'   | 16            | TRUE            | 'dosd'
'DDF00301'     | NULL             | NULL      | 'yessir'   | 0             | NULL            | NULL

Now, the actual XML is a lot more complex and all the examples I've seen so far involve manually parsing the XML and basically passing each child element as its own XML and parsing that separately in the next loop of the script etc etc etc.

Is there any way to just output the values flat out like that? Manually parsing several MB of XML with countless variations in which elements appear and how many times and where would make it a days long operation to do.

I've been thinking about doing a CLR assembly to just do this with C# and pass the result into SQL as a table, but I'd like to know if there's another way all the same.

Thanks!

1 Answer 1

3

Basically, you can use nodes() and values() methods to parse your xml (and with xmlnamespaces to handle namespaces). Something like this:

;with xmlnamespaces('http://herp...' as ns0, 'http://derpyderp...' as ns1)
select
    T.C.value('(../../ns1:FirstElementID/text())[1]', 'nvarchar(max)') as FirstElementID,
    T.C.value('(../../ns1:FirstEValue/text())[1]', 'nvarchar(max)') as FirstEValue,
    T.C.value('(../ns1:Something/text())[1]', 'nvarchar(max)') as Something,
    T.C.value('(ns1:Stuff/text())[1]', 'nvarchar(max)') as [Else.Stuff],
    T.C.value('(ns1:StuffVal/text())[1]', 'nvarchar(max)') as [Else.StuffVal],
    T.C.value('(ns1:StuffParam/text())[1]', 'nvarchar(max)') as [Else.StuffParam],
    T.C.value('(../../ns1:randValue/text())[1]', 'nvarchar(max)') as [randValue]
from @data.nodes('ns0:root/ns1:FirstElement/ns1:SecondElement/ns1:Else') as T(C)

This one create one row for each ns0:root/ns1:FirstElement/ns1:SecondElement/ns1:Else and then take all values you need (some of them taken from parent nodes). Note that if your First FirstElement does not contain any of the ns1:SecondElement/ns1:Else nodes, it will not appear in the resultset. In that case, you may want to use a query like this:

;with xmlnamespaces('http://herp...' as ns0, 'http://derpyderp...' as ns1)
select
    F.C.value('(ns1:FirstElementID/text())[1]', 'nvarchar(max)') as FirstElementID,
    F.C.value('(ns1:FirstEValue/text())[1]', 'nvarchar(max)') as FirstEValue,
    S.C.value('(ns1:Something/text())[1]', 'nvarchar(max)') as Something,
    E.C.value('(ns1:Stuff/text())[1]', 'nvarchar(max)') as [Else.Stuff],
    E.C.value('(ns1:StuffVal/text())[1]', 'nvarchar(max)') as [Else.StuffVal],
    E.C.value('(ns1:StuffParam/text())[1]', 'nvarchar(max)') as [Else.StuffParam],
    F.C.value('(ns1:randValue/text())[1]', 'nvarchar(max)') as [randValue]
from @data.nodes('ns0:root/ns1:FirstElement') as F(C)
    outer apply F.C.nodes('ns1:SecondElement') as S(C)
    outer apply S.C.nodes('ns1:Else') as E(C)

sql fiddle demo

Sign up to request clarification or add additional context in comments.

1 Comment

A year later I'd forgotten about this but this reply was absolutely the one that led to the right answer. I had to bulk insert the XML from a file into a table and then index the XML, but eventually we found and fixed quite a few bugs in the old solution, understood the whole business need far better, and outperformed the old solution with 15x better performance. So thank you indeed!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.