1

I have a XML file stored in a XML datatype column data in my table records.

The table looks like this:

create table records 
(
     id int,
     type nvarchar(28),
     data xml,
     posted datetime
)

XML data:

<Properties>
    <data>
        <Name>novel</Name>
        <Gender>Female</Gender>
        <Age>32</Age>
        <Salary>55k</Salary>
        <Phone>123-123</Phone>
    </data>
</Properties>

I am currently using following query to extract data from that XML column which is taking more than minutes in 20K records.

select
    id,
    posteddate,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Name)') ) AS Name,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Gender)') ) AS Gender,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Age)') ) AS Age,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Salary)') ) AS Salary,
    CONVERT( NVARCHAR(500), data.query('data(Properties/data/Phone)') ) AS Phone
from 
    records
where 
    type = 'personnel_xml'

Can anybody help explain how can I optimize this scenario as I need to extract 100 such elements from my XML stored as a column.

2 Answers 2

4

Assuming you have multiple <data> within the XML. Notice I added an expanded XML file which will have two sets.

Declare @table table (id int,data xml)
Insert Into @table values (1,'<Properties><data><Name>novel</Name><Gender>Female</Gender><Age>32</Age><Salary>55k</Salary><Phone>123-123</Phone></data>
<data><Name>Another Name</Name><Gender>Male</Gender><Age>45</Age><Salary>75k</Salary><Phone>555-1212</Phone></data>
</Properties>')

;with cte as (
      Select ID
            ,RN   = Row_Number() over (Partition By ID Order By (Select Null))
            ,Data = m.query('.') 
      From   @table AS t
      Cross Apply t.Data.nodes('/Properties/data') AS A(m)
 )
Select ID
      ,RN
      ,Name   = Data.value('(data/Name)[1]'  ,'nvarchar(500)')
      ,Gender = Data.value('(data/Gender)[1]','nvarchar(500)')
      ,Age    = Data.value('(data/Age)[1]'   ,'nvarchar(500)')
      ,Salary = Data.value('(data/Salary)[1]','nvarchar(500)')
      ,Phone  = Data.value('(data/Phone)[1]' ,'nvarchar(500)')
 From  cte

Returns

ID  RN  Name            Gender  Age     Salary  Phone
1   1   novel           Female  32      55k     123-123
1   2   Another Name    Male    45      75k     555-1212
Sign up to request clarification or add additional context in comments.

4 Comments

stackoverflow.com/users/1570000/john-cappelletti my xml has single data element so in that case i hope i dont need cross apply. So, from the perspective of performance, which one should be choosen, value() or query() ?
@JohnCappelleti what is the difference in performance in 'storing the whole xml in #temp table and extracting from #temp' OR 'directly extracting from table.column without using #temp' for my scenario of 200s of elements without repeating <data> attribute
@LearnByExample Correct, since you have single data element there is no need for the cross apply. and since you know the path value would be faster than query.
@LearnByExample Regarding your #temp. Hard to say. One test is worth a thousand expert opinions :)
3

To get a value out of XML in SQL Server you should use the value() Method (xml Data Type). And for untyped XML you should specify the text() node to get better performance.

select R.id,
       R.posted,
       R.data.value('(/Properties/data/Name/text())[1]', 'nvarchar(500)') as Name,
       R.data.value('(/Properties/data/Gender/text())[1]', 'nvarchar(10)') as Gender,
       R.data.value('(/Properties/data/Age/text())[1]', 'int') as Age,
       R.data.value('(/Properties/data/Salary/text())[1]', 'nvarchar(10)') as Salary,
       R.data.value('(/Properties/data/Phone/text())[1]', 'nvarchar(30)') as Phone
from dbo.records as R
where type = N'personnel_xml';

7 Comments

Hi, do you know, if the engine is smart enough to find, that (/Properties/data/ is the same in all cases? I thought it was faster either to use something like John Cappelletti with .query() in a CTE, or - even better - CROSS APPLY on .nodes(N'/Properties/data') and then use .value() on (Name/text())[1]. Won't it navigate through the whole XPath over and over in this case?
@Shnugo Hi, I have done some performance testing regarding using cross apply nodes to get a shorter expression in the values function and that is slower (not much). Using query() is not good at all since the UDX operator that creates the new XML is slowing things down. Not sure if this holds true for all shapes and forms of XML but it does for all I have tried.
@Shnugo BTW, the entire XML is available for the value function even if you use cross apply nodes. What you have is something called the "context node" which basically is a hierarchical ID that is used as a parameter to the value function. Apparently it is not much faster to find nodes using that instead of a path expression in the internal representation of the XML. At least not so much faster that it compensate for the cost of the extra function call because of the cross apply nodes.
Thank you for the quick response! I had thought, that navigating with a fix starting point (the context node) would be faster... This might depend on the depth and complexity... When I've got some extra time, I'll test this out.
@Shnugo my only focus is on performance. xml is pretty simple but only have huge number of elements (almost 200 elements) separated on 4 heading elements like <data><personnel> 50 <> elements...</personnel><education>50 <> elements</education></data> and the number of table rows is above 100K . Can you recommend some tools that i could use to measure/test the running time.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.