
We have a SQL Server 2008 R2 database table with XML stored in a column of the VARCHAR data type.

I now need to fetch some of the elements of the XML.

So I first want to convert the XML stored as VARCHAR into the XML data type.

Example:

Table A: Id (int), ProductXML (varchar(max))

Table B: Id (int), ProductXML (XML)

I want to convert ProductXML from Table A to the XML data type and insert it into Table B.

I tried using the CAST() and CONVERT() functions as shown below:

insert into TableB (ProductXML)
select CAST(ProductXML as XML) from TableA;

I tried CONVERT in the same way, but both give me this error:

XML Parsing : unable to switch encoding

Is there any way I can convert the VARCHAR entries in the table into XML entries?

About the XML: it is huge, with many nodes, and its structure changes dynamically.

For example, one row can hold an XML entry for a single product and another row an XML entry for multiple products.
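Once the rows are in an XML column, a single query can fetch elements whether a document holds one product or many. A minimal sketch, assuming hypothetical element names (`Products`, `Product`, `Name`, `Price`), since the real XML was not posted:

```sql
-- Hypothetical sample document; the element names are assumptions.
DECLARE @doc XML = N'<Products>
  <Product><Name>Widget</Name><Price>9.99</Price></Product>
  <Product><Name>Gadget</Name><Price>19.50</Price></Product>
</Products>';

-- nodes() yields one row per <Product> element, so a document with one
-- product and a document with many are handled by the same query.
SELECT p.n.value('(Name/text())[1]',  'nvarchar(100)') AS ProductName,
       p.n.value('(Price/text())[1]', 'decimal(10,2)') AS Price
FROM @doc.nodes('/Products/Product') AS p(n);

-- Same idea against the table; the descendant axis (//Product) also
-- finds <Product> elements that sit at different depths in different rows.
SELECT b.Id,
       p.n.value('(Name/text())[1]', 'nvarchar(100)') AS ProductName
FROM TableB AS b
CROSS APPLY b.ProductXML.nodes('//Product') AS p(n);
```

Because the structure varies between rows, the descendant axis trades some precision for tolerance; a fixed path such as `/Products/Product` is stricter and usually cheaper.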

2 Answers

First answer:

Could you give us a sample of your XML? All of these work:

CONVERT(XML, '<root><child/></root>')
CONVERT(XML, '<root>          <child/>         </root>', 1)
CAST('<Name><FName>Carol</FName><LName>Elliot</LName></Name>'  AS XML)
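The third argument in the second example is the CONVERT style for XML. A small sketch comparing the default style (0), which discards insignificant whitespace, with style 1, which preserves it:

```sql
DECLARE @s VARCHAR(100) = '<root>   <child/>   </root>';

SELECT CONVERT(NVARCHAR(MAX), CONVERT(XML, @s))    AS DefaultStyle,       -- style 0: whitespace-only text dropped
       CONVERT(NVARCHAR(MAX), CONVERT(XML, @s, 1)) AS PreserveWhitespace; -- style 1: whitespace kept
```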

Also, you might have to cast it to nvarchar or varbinary first (from the Microsoft documentation):

You can parse any of the SQL Server string data types, such as [n][var]char, [n]text, varbinary, and image, into the xml data type by casting (CAST) or converting (CONVERT) the string to the xml data type. Untyped XML is checked to confirm that it is well formed. If there is a schema associated with the xml type, validation is also performed. For more information, see Compare Typed XML to Untyped XML.

XML documents can be encoded with different encodings (for example, UTF-8, UTF-16, windows-1252). The following outlines the rules on how the string and binary source types interact with the XML document encoding and how the parser behaves.

Since nvarchar assumes a two-byte Unicode encoding such as UTF-16 or UCS-2, the XML parser will treat the string value as a two-byte Unicode encoded XML document or fragment. This means that the XML document needs to be encoded in a two-byte Unicode encoding as well to be compatible with the source data type. A UTF-16 encoded XML document can have a UTF-16 byte order mark (BOM), but it does not need to, since the context of the source type makes it clear that it can only be a two-byte Unicode encoded document.

The content of a varchar string is treated as a one-byte encoded XML document/fragment by the XML parser. Since the varchar source string has a code page associated, the parser will use that code page for the encoding if no explicit encoding is specified in the XML itself. If an XML instance has a BOM or an encoding declaration, the BOM or declaration needs to be consistent with the code page, otherwise the parser will report an error.

The content of varbinary is treated as a codepoint stream that is passed directly to the XML parser. Thus, the XML document or fragment needs to provide the BOM or other encoding information inline. The parser will only look at the stream to determine the encoding. This means that UTF-16 encoded XML needs to provide the UTF-16 BOM, and an instance without a BOM and without an encoding declaration will be interpreted as UTF-8.

If the encoding of the XML document is not known in advance and the data is passed as string or binary data instead of XML data before casting to XML, it is recommended to treat the data as varbinary. For example, when reading data from an XML file using OpenRowset(), one should specify the data to be read as a varbinary(max) value:

select CAST(x as XML) 
from OpenRowset(BULK 'filename.xml', SINGLE_BLOB) R(x)

SQL Server internally represents XML in an efficient binary representation that uses UTF-16 encoding. User-provided encoding is not preserved, but is considered during the parse process.
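The rules quoted above can be tried directly. A minimal sketch: the nvarchar case parses, the varchar case with a utf-16 declaration reproduces the error from the question, and the two varbinary cases show a BOM-less UTF-8 byte stream and a UTF-16 stream prefixed with the little-endian BOM (bytes FF FE):

```sql
-- nvarchar is read as UTF-16, so a utf-16 declaration is consistent.
SELECT CAST(N'<?xml version="1.0" encoding="utf-16"?><root/>' AS XML) AS FromNvarchar;

-- varchar is read as one-byte text; a utf-16 declaration contradicts that.
BEGIN TRY
    SELECT CAST('<?xml version="1.0" encoding="utf-16"?><root/>' AS XML);
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS ErrorMessage;  -- the "unable to switch ... encoding" error
END CATCH;

-- varbinary: the bytes of '<root/>' with no BOM and no declaration are read as UTF-8.
SELECT CAST(0x3C726F6F742F3E AS XML) AS FromUtf8Bytes;

-- varbinary holding UTF-16 (little-endian) text, with the BOM prepended.
SELECT CAST(0xFFFE + CAST(N'<root/>' AS VARBINARY(MAX)) AS XML) AS FromUtf16Bytes;
```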

Solution:

CONVERT(XML, CONVERT(NVARCHAR(max), ProductXML))
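Applied to the tables from the question, the conversion might look like the sketch below. It assumes TableB.Id is an ordinary int column (drop Id from both column lists if it is an IDENTITY column). Because this is a logging table, rows may carry different declarations, so the sketch routes only rows that declare utf-16 through NVARCHAR; the LIKE test on the first 100 characters is a rough heuristic, not a real XML parser, and it only recognizes the double-quoted form encoding="utf-16".

```sql
INSERT INTO TableB (Id, ProductXML)
SELECT a.Id,
       CASE
           -- utf-16 declaration: go through NVARCHAR so the parser reads UTF-16.
           -- (Case-insensitive under a *_CI_* collation such as SQL_Latin1_General_CP1_CI_AS.)
           WHEN LEFT(a.ProductXML, 100) LIKE '<?xml%encoding="utf-16"%'
               THEN CONVERT(XML, CONVERT(NVARCHAR(MAX), a.ProductXML))
           -- No declaration, or one consistent with the column's code page:
           -- let the parser read the VARCHAR directly.
           ELSE CONVERT(XML, a.ProductXML)
       END
FROM TableA AS a;
```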

Comments:

Thank you! I am still going through everything you have written. The XML actually changes dynamically: it can hold information for a single product or for multiple products. I just gave an example above. In reality it is a logging table, with different logging data (XML documents) stored in it.
What encoding is the database set for?
I am sorry, I am new to this. I ran the following command, which I found online, to find the database encoding: SELECT DATABASEPROPERTYEX('DBName', 'Collation') SQLCollation; and I got 'SQL_Latin1_General_CP1_CI_AS'. Is this what you asked for?
Yes, and it seems to be fine. I wonder if your source column was written with some type of encoding that is not directly encodable to Unicode... It would take a few minutes to write a C# utility to transfer the data while recognizing the correct encoding. Also, have you tried with just one record?
I don't know how I missed this, but the XML has <?xml version="1.0" encoding="utf-16"?> at the beginning. Now everything you talked about makes sense: it declares UTF-16 encoding. Any guesses on how I can go ahead now?
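Following up on the code-page and single-record questions in this thread, a sketch of both checks. It assumes the table lives in the dbo schema, and Id = 1 is only a placeholder. The code page that governs a VARCHAR value is the column's own collation, which can differ from the database default that was checked above.

```sql
-- Collation and code page of the VARCHAR column itself.
SELECT c.collation_name,
       COLLATIONPROPERTY(c.collation_name, 'CodePage') AS CodePage
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.TableA')
  AND c.name = 'ProductXML';

-- Which declaration does each row start with?
SELECT a.Id, LEFT(a.ProductXML, 60) AS Prolog
FROM TableA AS a
WHERE a.ProductXML LIKE '<?xml%';

-- Try the conversion on a single row before running it over the whole table.
SELECT CONVERT(XML, CONVERT(NVARCHAR(MAX), a.ProductXML)) AS Doc
FROM TableA AS a
WHERE a.Id = 1;
```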
Second answer:

This worked for me:

SELECT CAST(REPLACE(CAST(column3 AS NVARCHAR(MAX)), 'utf-8', 'utf-16') AS XML) FROM [table];

Comment:

You are replacing a string of characters, and not the encoding of a document.
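One way to address that concern, sketched under the assumption that the declaration, when present, is at the very start of the text: instead of rewriting every occurrence of 'utf-8' anywhere in the document, drop the `<?xml ...?>` declaration itself. As the documentation quoted in the first answer says, SQL Server does not preserve the user-provided encoding, so nothing is lost once the text is NVARCHAR. `[table]` and `column3` are the placeholder names from this answer.

```sql
SELECT CAST(
           CASE
               -- Text starts with a declaration: keep everything after the first '?>'.
               WHEN t.s LIKE N'<?xml%?>%'
                   THEN SUBSTRING(t.s, CHARINDEX(N'?>', t.s) + 2, LEN(t.s))
               -- No declaration: nothing to strip.
               ELSE t.s
           END AS XML) AS Doc
FROM (SELECT CAST(column3 AS NVARCHAR(MAX)) AS s FROM [table]) AS t;
```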
