2

I have a query that generates XML files and loads them to FTP with <?xml version="1.0"?>.

I need to switch encoding to UTF-8 as follows:

<?xml version="1.0" encoding="utf-8"?> 

I can do it manually in the text editor. But cannot do it in SQL Server.

I also read this article but that did not contribute in finding solution for the issue.

https://learn.microsoft.com/en-us/sql/relational-databases/xml/create-instances-of-xml-data

enter image description here

enter image description here

My code:

USE [Audit_DBA]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

-- to call the procedure use the code below but assign your own path: 
-- EXEC [Audit_DBA].[dbo].[StateAndLocalAward_ToXML] 'C:\NLP\Test\NewFolder'

CREATE PROCEDURE [dbo].[StateAndLocalAward_ToXML_VC]
    @FileDestinationDir VARCHAR(2000)
AS 
    DECLARE  @FilePath VARCHAR(4000)

    DECLARE @SQLStr VARCHAR(8000),
            @Cmd VARCHAR(1000),
            @Ret INT

    IF EXISTS (SELECT * FROM InputTemp.SYS.TABLES WHERE NAME = 'StateAndLocalAward_output') 
       DROP TABLE InputTemp.dbo.StateAndLocalAward_output;

    ;WITH CTE_StateAndLocalAward_output AS 
    (
        SELECT TOP 200
            sc.stateContractId,
            ca.OnviaId AS [Reference], --AS OnviaID,
            ca.contractTitle AS [Title], 
            nigp.NIGPCodeTx AS [NIPGCode],
            nigp.NIGPDescriptionLongTx AS [NIPGDescription],
            ca.Description,
            vm.parentName AS [VendorName],
            offering.offeringTx AS [PrimaryOffering],
            ca.StartDate,
            ca.AwardDt
        FROM 
            ISCore.dbo.StateContract sc WITH (NOLOCK)
        INNER JOIN 
            ISCore.dbo.ContractAward ca WITH (NOLOCK) ON sc.contractAwardId = ca.contractAwardID
        INNER JOIN 
            ISLibrary.dbo.VendorMaster vm WITH (NOLOCK) ON ca.vendorId = vm.vendorID
        LEFT OUTER JOIN 
            ISCore.dbo.CompanyProfile cp WITH (NOLOCK) ON ca.vendorId = cp.vendorId
        LEFT OUTER JOIN 
            ISCore.dbo.Offering offering WITH (NOLOCK) ON ca.offeringID = offering.offeringID
        LEFT OUTER JOIN 
            ISCore.dbo.NIGPSrvc nigp WITH (NOLOCK) ON ca.NIGPCode = nigp.NIGPCodeTx
        WHERE 
            vm.showUnverified = 1 AND sc.stateContractId = -- 464482 stateContractId
            AND StartDate >= DATEADD(month, -12, GETDATE())
    )
    SELECT 
        *
    INTO 
        InputTemp.dbo.StateAndLocalAward_output 
    FROM 
        CTE_StateAndLocalAward_output;

    --select * from InputTemp.dbo.StateAndLocalAward_output

    DECLARE @StateContractId INT;
    --DECLARE @xml XML;

    DECLARE Cur_StateAndLocalAward_StateContractId CURSOR FOR
        SELECT StateContractId
        FROM inputtemp.dbo.StateAndLocalAward_output t1 WITH (NOLOCK)

    OPEN Cur_StateAndLocalAward_StateContractId;

    FETCH NEXT FROM Cur_StateAndLocalAward_StateContractId INTO @StateContractId -- @xml;

    WHILE @@FETCH_STATUS = 0 
    BEGIN
        SELECT 
            @SQLStr = 'SELECT Body.[Reference], Body.[Title], Body.[NIPGCode], Body.[NIPGDescription], Body.[Description], Body.[VendorName], Body.[PrimaryOffering] FROM InputTemp.dbo.StateAndLocalAward_output AS Body where StateContractId = ''' + str(@StateContractId) + ''' FOR XML AUTO, ELEMENTS;'

        --SELECT N'<?xml version="1.0" encoding="UTF-8"?>' 
        SELECT
            CAST((SELECT N'<?xml version="1.0" encoding="UTF-8"?>' + (@SQLSTr)) as varchar(8000) /*as XML*/) as SQLStr 

        -- select CAST((SELECT N'<?xml version="1.0" encoding="UTF-8"?>' + 
        --  (@SQLSTr)) as XML) as SQLStr 

If I run this code instead of the same above but cast as XML type, I get an error:

Msg 9402, Level 16, State 1, Procedure StateAndLocalAward_ToXML, Line 86 [Batch Start Line 10]
XML parsing: line 1, character 38, unable to switch the encoding

select @FilePath = @FileDestinationDir+'\NewFolder'+ltrim(rtrim(str(@StateContractId)))+'.xml' 
select @Cmd = ' bcp " ' + @SQLStr + '" queryout '+@FilePath+' -w -r "" -T -S ' +@@ServerName 
exec  @Ret = master.dbo.xp_cmdshell @Cmd 
FETCH NEXT FROM Cur_StateAndLocalAward_StateContractId INTO @StateContractId -- @xml;
END
CLOSE Cur_StateAndLocalAward_StateContractId;
DEALLOCATE Cur_StateAndLocalAward_StateContractId;
GO

2 Answers 2

3

There are some things to know:

  • SQL Server does not support export via BCP to UTF-8 before version 2016 (and 2014 with SP2).
  • One cannot add the xml-declaration (<?xml blah ?>) to a native SQL-Server XML typed variable or column. This will either fail ("...switch the encoding") or the xml-declaration will disappear.
  • You can add the xml-declaration on string level to an xml casted to NVARCHAR(MAX). But you cannot re-cast (re-convert) this to an XML without failing or losing the declaration.
  • Internally SQL-Server keeps any XML as UCS-2 (very close to UTF-16) in any case.
  • SQL-Servers VARCHAR (CHAR) type is not utf-8 but extended ASCII (depending on a COLLATION)
  • on string level you can write literally anything into the xml-declaration (as you can creat something, which looks like XML, but is not well-formed. This is just an unchecked string.
  • The encoding stated in the xml-declaration is important only to mark the actual file encoding when written to a disk or when handled as byte stream.
  • You can write encoding="x" and store the file with an encoding of y - but you shouldn't.
  • SQL-Server will cast a string with an utf-8 declaration to XML when it is VARCHAR and it will cast a string with utf-16 when it is NVARCHAR, but you cannot cross this (Read this related answer). Other encodings very likely lead to cannot switch the encoding error.

About your code

  • You should change @SQLStr and @cmd to NVARCHAR(MAX), othewise you might get in troubles with non-plain-latin characters.
  • As you are using a CURSOR, you should fill an XML-typed variable with the result of your statement, cast this to NVARCHAR(MAX) and add the declaration to this string. Do not cast the result back to XML.
  • Read the BCP docs. Stating -w will write unicode (wide), which is not utf-8 (what you write into the declaration has no effect here).

Hint:

Read this related answer, showing utf-8 export with BCP on SQL-Server 2016

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks so much Shnugo. I did change my code as per your recommendation - SQLStr nvarchar(max) and Cmd nvarchar(4000) on the variable declaration level and I also needed to switch Database Compatibility level to 2016. And the generated XML files switch encoding to UTF-8 when I opened them in Notepad++ and looked up for Encoding. But when I opened the XML files in Internet Explorer the XML displayed <?xml version="1.0"?> and not <?xml version="1.0" encoding="utf-8"?>
Is there any way to literally specify <?xml version="1.0" encoding="utf-8"?> before the XML file structure begins?
@enigma6205, No SQL Server will not do this for you. But you can easily concatenate the casted XML with any string you want. Use SET @content=N'<?xml whatever ?>' + CAST(@theXml AS NVARCHAR(MAX)) and write this out.
0
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


CREATE PROCEDURE [dbo].[MyXMLTest]
@FileDestinationDir VARCHAR(2000)

-- to call procedure specify your own file path 
-- EXEC [Audit_DBA].[dbo].[MyXMLTest] 'E:\NLP\GovwinIQ_Ontology\NewFolder'

AS 

SET QUOTED_IDENTIFIER ON

IF OBJECT_ID (N'InputTemp.dbo.XMLTest', N'U') IS NOT NULL
DROP TABLE InputTemp.dbo.XMLTest;

CREATE TABLE InputTemp.dbo.XMLTest

(
[Id] INT NOT NULL,
[FirstName] VARCHAR(100) NOT NULL,
[LastName] VARCHAR(100) NOT NULL,
[Address] VARCHAR(100) NOT NULL
);

INSERT INTO InputTemp.dbo.XMLTest ([Id], [FirstName], [LastName], [Address])
VALUES (12, 'Zhuk', 'Termik', '123 Gam Str, Boston, NY');

--SELECT * FROM InputTemp.dbo.XMLTest

DECLARE @FilePath VARCHAR(4000)

DECLARE @SQLStr NVARCHAR(4000),
        @Cmd NVARCHAR(4000),
        @Ret INT

DECLARE @Id INT;

SELECT @Id = 12;

SELECT @SQLStr = 
'SELECT N''<?xml version=''''1.0'''' encoding=''''UTF-8''''?>'' + (SELECT CAST((SELECT [Id], [FirstName], [LastName], [Address] FROM InputTemp.dbo.XMLTest AS Body WHERE Id = '''  + str(@Id) + ''' FOR XML AUTO, ELEMENTS) AS NVARCHAR(MAX)))'

SELECT @SQLStr AS SQLStr

SELECT @FilePath = @FileDestinationDir+'\NewFolder'+ltrim(rtrim(str(@Id)))+'.xml' 

SELECT @Cmd = ' bcp " ' + @SQLStr + '" queryout '+@FilePath+' -c  -C65001 -r "" -T -S ' +@@ServerName 

EXEC @Ret = master.dbo.xp_cmdshell @Cmd 

IF OBJECT_ID (N'InputTemp.dbo.XMLTest', N'U') IS NOT NULL
DROP TABLE InputTemp.dbo.XMLTest;

GO

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.