I have been trying to import an SSIS .dtsx file (XML) into a SQL Server table. The formatting changes in the table: for example, < in the source file becomes &lt;, and multiple lines in the source become a single line in the target.

Create table #XMLFilesTable(XMLData xml)

INSERT INTO #XMLFilesTable(XMLData)
SELECT Convert(XML,BulkColumn) As BulkColumn
FROM Openrowset( Bulk 'C:\Users\myFile.dtsx', Single_Blob) as Image

Is there any way to import the XML as is, without changes in the target? I need this field to be of the xml data type in the target. Is there any other way?

1 Answer

Your question is not very clear, but my magic crystal ball tells me that your issue might be in the way you read the XML after the import.

If the snippet you provide really works, the file's content can evidently be loaded into a natively XML-typed column. If there were any issue with the file (the XML's format, the XML not being well formed, whatever), this would fail.

For obvious reasons, some characters cannot appear literally within an XML's content because they are used for the markup itself, namely <, > and & (there are more).

Such characters need escaping. In XML we speak of entities. Done properly, all this magic happens implicitly and you should not have to bother about it at all.

Some possible ideas:

Wrong encoding

The string copy & paste will translate to copy &amp; paste. I have seen cases where developers built the XML via string concatenation. If the value is pre-encoded (copy &amp; paste) but the XML's creation is later switched to a real XML engine, you will get copy &amp;amp; paste.

Wrong reading

If correctly encoded XML is read with string methods (SUBSTRING et al.), the encoded entities will remain as they are.
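
A minimal sketch of the difference, assuming a throwaway variable @xml that is not part of your code: casting the XML to a string shows the entity as it is stored, while .value() returns the decoded content.

```sql
-- @xml is a hypothetical sample, not taken from the question
DECLARE @xml XML = N'<root>copy &amp; paste</root>';

SELECT CAST(@xml AS NVARCHAR(MAX))                        AS ReadAsString -- entity stays: <root>copy &amp; paste</root>
      ,@xml.value(N'(/root/text())[1]', N'nvarchar(max)') AS ReadAsValue; -- entity decoded: copy & paste
```

If you see &amp; or &lt; in your output, it is worth checking whether you are reading the column the first way.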

CDATA sections

If your XML includes CDATA sections, you will find that the developers of SQL Server decided not to support them any more. There is actually no need for CDATA, because properly escaped content is semantically identical:

<root><![CDATA[test with <, > and &]]></root>

is exactly the same as this:

<root>test with &lt;, &gt; and &amp;</root>

CDATA sections are removed automatically. Try it out:

DECLARE @xml XML='<root><![CDATA[test with <, > and &]]></root>';
SELECT @xml;
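
To see that no content is lost, here is a small sketch (again with a made-up variable) that reads the text back with .value(). The typed read should return the text exactly as it was written inside the CDATA section.

```sql
-- @xml is a hypothetical sample; only the CDATA wrapper is dropped on storage
DECLARE @xml XML = N'<root><![CDATA[test with <, > and &]]></root>';

SELECT @xml                                               AS StoredXml  -- CDATA replaced by escaped text
      ,@xml.value(N'(/root/text())[1]', N'nvarchar(max)') AS TheContent; -- test with <, > and &
```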

What you can do

It would help to provide a (reduced!) sample of your XML (some part with such characters).

If the XML is not double-encoded, I'm pretty sure your issue is on the reading side.

One example to check this out

DECLARE @value VARCHAR(100)='copy & paste';
DECLARE @tbl TABLE(Explanation VARCHAR(100),theXml XML);
INSERT INTO @tbl VALUES('encoding by engine'   ,(SELECT @value FOR XML PATH('root')))
                      ,('correct pre-encoding' ,'<root>copy &amp; paste</root>')
                      ,('double encoding'      ,'<root>copy &amp;amp; paste</root>');
                      /*,('not well formed','<root>copy & paste</root>') --have to exclude this as it would fail*/
SELECT Explanation
      ,theXml
      ,theXml.value(N'(/root/text())[1]',N'nvarchar(max)') AS TheContent
FROM @tbl;

The result

Explanation             theXml                              TheContent
encoding by engine      <root>copy &amp; paste</root>       copy & paste
correct pre-encoding    <root>copy &amp; paste</root>       copy & paste
double encoding         <root>copy &amp;amp; paste</root>   copy &amp; paste

Finally, a trick to "correct" a wrong result if you cannot change the above:

DECLARE @value VARCHAR(100)='copy &amp; paste';
SELECT CAST('<x>' + @value + '</x>' AS XML).value('.','nvarchar(max)')

4 Comments

Thanks for your useful comment. I am loading the SSIS XML file into a table, updating an attribute value by reading the XML nodes, and then exporting the modified XML to a local file. The modified XML runs as expected. The only difference is that the original file was easy to read (well formatted). Now the formatting has changed, and it is difficult to read and to find a node (when viewed in Visual Studio with the View Code option).
@p2k Is this well formatted content encoded as CDATA?
@Shungo, I can see <property state="cdata"> and ![CDATA used in some places.
@p2k If you find readable parts within cdata, we speak of CDATA sections. These look something like this: <![CDATA[Content goes here]]>
