Is it possible in SQL Server to query XML in such a way that, given the input XML below:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
    <soapenv:Body>
        <genRetrieve xmlns:v1="http://xxxxxxxxxxxxxxxxxxxxx">
            <checkRetrieve>
                <party>
                    <user>
                        <first>BLA</first>
                        <last>last</last>
                    </user>
                    <media /> <!-- this element must also appear in the output -->
                </party>
            </checkRetrieve>
        </genRetrieve>
    </soapenv:Body>
</soapenv:Envelope>

it produces a table of the text nodes/elements and their corresponding XPaths?

TEXT NODE       XPATH
---------       ---------
first           /soapenv:Envelope/soapenv:Body/genRetrieve/checkRetrieve/party/user/first   
last            /soapenv:Envelope/soapenv:Body/genRetrieve/checkRetrieve/party/user/last
media           /soapenv:Envelope/soapenv:Body/genRetrieve/checkRetrieve/party/media

1 Answer

A solution using OPENXML.

declare @idoc int, @doc varchar(max);
set @doc =
'<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
    <soapenv:Body>
        <genRetrieve xmlns:v1="http://xxxxxxxxxxxxxxxxxxxxx">
            <checkRetrieve>
                <party>
                    <user>
                        <first>BLA</first>
                        <last>last</last>
                    </user>
                    <media>none</media>
                </party>
            </checkRetrieve>
        </genRetrieve>
    </soapenv:Body>
</soapenv:Envelope>';

exec sp_xml_preparedocument @idoc output, @doc;

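;with map as (
    -- edge table: one row per node, linked to its parent through parentid
    select *
    from openxml (@idoc, '//*')
), rcte as (
    -- anchor: rows matched by //*[text()], each one starting its own path
    select localname, parentid, '/' + isnull (prefix + ':', '') + localname as XPATH
    from openxml (@idoc, '//*[text()]')
    where nodetype = 1 and [text] is not null -- localname <> '#text'
    union all
    -- recursive step: prepend the parent element until the root is reached
    select r.localname, m.parentid, '/' + isnull (m.prefix + ':', '') + m.localname + r.XPATH
    from rcte r
    inner join map m on r.parentid = m.id
)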

select localname as [TEXT NODE], XPATH
from rcte
where parentid is null;

exec sp_xml_removedocument @idoc;
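
To see what the CTEs above traverse, it can help to look at the raw edge table that openxml returns when no WITH clause is given. The snippet below is only a minimal sketch on a smaller, made-up document; the variable names @h and @x are illustrative and not part of the original answer.

```sql
-- Hypothetical sample document and variable names, chosen for illustration only.
declare @h int, @x nvarchar(max) = N'<a><b>text</b><c /></a>';

exec sp_xml_preparedocument @h output, @x;

-- Without a WITH clause, openxml returns the edge table:
-- nodetype 1 = element, 2 = attribute, 3 = text node.
select id, parentid, nodetype, localname, prefix, [text]
from openxml (@h, '//*');

exec sp_xml_removedocument @h;
```

The id and parentid columns are what the recursive join (r.parentid = m.id) follows from each node up to the root.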

UPD. Solution with numbered nodes (positional [n] indexes).

exec sp_xml_preparedocument @idoc output, @doc;

;with map as (
    -- num: position of each node among same-named siblings under the same parent
    select id, parentid, nodetype, localname, prefix, row_number() over(partition by parentid, prefix, localname order by id) as num
    from openxml (@idoc, '//*')
    where nodetype = 1 or (nodetype = 3 and [text] is not null)
), rcte as (
    -- anchor: parent elements (p) of non-empty text nodes (c)
    select p.localname, p.parentid, '/' + isnull (p.prefix + ':', '') + p.localname + '[' + cast (p.num as varchar(50)) + ']' as XPATH
    from map c
    inner join map p on c.parentid = p.id
    where c.nodetype = 3
    union all
    -- recursive step: prepend the parent element and its index
    select r.localname, m.parentid, '/' + isnull (m.prefix + ':', '') + m.localname + '[' + cast (m.num as varchar(50)) + ']' + r.XPATH
    from rcte r
    inner join map m on r.parentid = m.id
)

select localname as [TEXT NODE], XPATH
from rcte
where parentid is null;

exec sp_xml_removedocument @idoc;

UPD 2. Another solution that also returns a VALUE column and includes elements without text.

exec sp_xml_preparedocument @idoc output, @doc;

;with map as (
    -- num: position of each node among same-named siblings under the same parent
    select id, parentid, nodetype, localname, prefix, [text]
        , row_number() over(partition by parentid, prefix, localname order by id) as num
    from openxml (@idoc, '//*')
    where nodetype = 1 or (nodetype = 3 and [text] is not null)
)
, rcte as (
    -- anchor: elements whose children are only text nodes, or that have no children at all
    select  localname, parentid, '/' + isnull (prefix + ':', '') + localname + '[' + cast (num as varchar(50)) + ']' as XPATH, VALUE
    from (
        -- min_nodetype: 1 if any child is an element, 3 if only text children, null if no children
        select p.localname, p.parentid, p.prefix, p.num
            , min (c.nodetype) as min_nodetype
            , min (case when c.nodetype = 3 then cast (c.[text] as nvarchar(max)) end) as VALUE
        from map p
        left join map c on p.id = c.parentid
        where p.nodetype = 1
        group by p.localname, p.parentid, p.prefix, p.num
    ) t
    where min_nodetype = 3 or min_nodetype is null
    union all
    -- recursive step: prepend the parent element and its index, carrying VALUE along
    select r.localname, m.parentid, '/' + isnull (m.prefix + ':', '') + m.localname + '[' + cast (m.num as varchar(50)) + ']' + r.XPATH, r.VALUE
    from rcte r
    inner join map m on r.parentid = m.id
)

select localname as [TEXT NODE], XPATH, VALUE
from rcte
where parentid is null;

exec sp_xml_removedocument @idoc;
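
For large documents, one variation that may be worth trying is to read the edge table once into an indexed temp table and build the paths from the root downwards, so each element is visited once instead of every leaf walking back up to the root. The following is only a sketch under assumptions, not a benchmarked fix: the names @hdoc, @xmltext, #map and ix_map_parent are hypothetical, the root is assumed to be the row whose parentid is null (as in the queries above), and the sketch omits the [n] indexes and the VALUE column for brevity.

```sql
-- Hypothetical names throughout: @hdoc, @xmltext, #map, ix_map_parent.
declare @hdoc int,
        @xmltext nvarchar(max) = N'<root><user><first>BLA</first><last>last</last></user><media /></root>';

exec sp_xml_preparedocument @hdoc output, @xmltext;

-- Read the edge table once and keep only element rows.
select id, parentid, localname, prefix
into #map
from openxml (@hdoc, '//*')
where nodetype = 1;

exec sp_xml_removedocument @hdoc;

-- Index the column that the recursive join filters on.
create clustered index ix_map_parent on #map (parentid);

;with rcte as (
    -- anchor: the root element (assumed to be the row whose parentid is null)
    select id, localname,
           cast ('/' + isnull (prefix + ':', '') + localname as nvarchar(max)) as XPATH
    from #map
    where parentid is null
    union all
    -- recursive step: append each child element to its parent's path
    select m.id, m.localname,
           r.XPATH + '/' + isnull (m.prefix + ':', '') + m.localname
    from rcte r
    inner join #map m on m.parentid = r.id
)
select r.localname as [TEXT NODE], r.XPATH
from rcte r
-- keep only elements that have no child elements
where not exists (select 1 from #map c where c.parentid = r.id);

drop table #map;
```

Whether this actually helps depends on the document and the server, so it would need to be measured against the real payload.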

4 Comments

Thanks for the answer. Would it be possible to distinguish repeated parent/child nodes with [n]? For example, if this was the XML:
<user> <first>BLA</first> <last>last</last> </user> <user> <first>BLA</first> <last>last</last> </user>
the XPaths should be:
/Envelope/Body[1]/genRetrieve[1]/checkRetrieve[1]/party[1]/user[1]/first[1]
/Envelope/Body[1]/genRetrieve[1]/checkRetrieve[1]/party[1]/user[1]/last[1]
/Envelope/Body[1]/genRetrieve[1]/checkRetrieve[1]/party[1]/user[2]/first[1]
....
I've marked your solution as the answer. Thank you. I did notice, though, that an element without a value is left out of the output. For example, if <media>none</media> were instead <media />, it would not be included, when in fact it should be. Could you suggest a remedy? Also, in the same table, is it possible to show the VALUE of each element that has one?
The solution is getting more and more complex =) See UPD 2.
The function performs incredibly slowly with an XML payload that has 30,000 elements and 8,000 text nodes. I had to kill it after 15 minutes :( Please help.
