I am attempting to parse an xml column within a data table on SQL Server, converting the contents into new columns within the dataframe I am trying to create. I keep getting the error

Msg 9420, Level 16, State 1, Line 1
XML parsing: line 20, character 2005, illegal xml character

and I don't know how to resolve this. This illegal character does not exist in every row's xml column.

My SQL code parsed 570,000 rows before it hit a row with an illegal character and stopped. My WHERE clause is supposed to return 1,200,000 rows, so the query got through just under half of them before failing. The XML is stored in a varchar column, so I need to CAST it to xml before I can parse the content.

This SQL code does work. It works on the raw data which contains a mix of production data and fake testing data. I was able to get access to the production only table and it was with this table that I encountered the error. Something must have happened to the data when it was transferred to the production only table.

I tried searching posts for something that could help, but I couldn't find anything. I don't know how to locate the error within the 1.2M records I am working with or which of the parsed columns is causing the problem. Is there a way for the parsing algorithm to skip over offending rows and continue to parse the remaining records?

My code is:

SELECT [Id]
      ,[EventDateTime]
      ,[TenantId]
      ,[EventType]
      ,[EventXml]
      ,[InsertDateTime]
      ,[AppInstanceId]
      ,[TokenCorrelationId]
      ,[AuditCorrelationId]
      ,[AuditId]
      ,CAST([EventXml] as XML).value('/PrescriptionEvent[1]/DateTimeStamp[1]','NVARCHAR(max)') AS xml_DateTimeStamp
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/AuditCorrelationId[1]','NVARCHAR(max)')) AS xml_AuditCorrelationId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/TokenCorrelationId[1]','NVARCHAR(max)')) AS xml_TokenCorrelationId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/ActingUserId[1]/Value[1]','NVARCHAR(max)')) AS xml_ActingUserId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/ActingUserId[1]/LegacyId[1]','NVARCHAR(max)')) AS xml_ActingUserId_LegacyId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/TenantId[1]/Value[1]','NVARCHAR(max)')) AS xml_TenantId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/TenantId[1]/LegacyId[1]','NVARCHAR(max)')) AS xml_TenantId_LegacyId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/AppInstanceId[1]/Value[1]','NVARCHAR(max)')) AS xml_AppInstanceId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/AppInstanceId[1]/LegacyId[1]','NVARCHAR(max)')) AS xml_AppInstanceId_LegacyId
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/ActionType[1]','NVARCHAR(max)')) AS xml_ActionType
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/Outcome[1]','NVARCHAR(max)')) AS xml_Outcome
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/OutcomeReason[1]','NVARCHAR(max)')) AS xml_OutcomeReason
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/RxSigningWorkflowActivity[1]','NVARCHAR(max)')) AS xml_RxSigningWorkflowActivity
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/Waypoint[1]','NVARCHAR(max)')) AS xml_Waypoint
      ,UPPER(CAST([EventXml] as XML).value('/PrescriptionEvent[1]/PrescriptionReferenceId[1]','NVARCHAR(max)')) AS xml_PrescriptionReferenceId
  FROM [EpcsAuditDB].[dbo].[EpcsAuditEventData]
  WHERE [EventType] = 4 AND [EventDateTime] >= '2020-03-24'

Example of the XML (this one does not contain the illegal character; I don't know how to find one that does):

<?xml version="1.0" encoding="utf-8"?>  <PrescriptionEvent xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">    <DateTimeStamp>2020-03-24T19:54:33.0169582Z</DateTimeStamp>    <Outcome>true</Outcome>    <OutcomeReason />    <AuditCorrelationId>3a4fb1cd-c39c-4e84-bfc4-dee98b29be2e</AuditCorrelationId>    <TokenCorrelationId>d80bbd23-2e1d-44b3-9452-972b54f35cc9</TokenCorrelationId>    <ActingUserId>      <Value>91f78a00-ce26-4088-88eb-11x5565910d7</Value>    </ActingUserId>    <TenantId>      <Value>00000000-0000-0000-0000-000000000000</Value>      <LegacyId>10051804</LegacyId>    </TenantId>    <AppInstanceId>      <Value>00000000-0000-0000-0000-000000000000</Value>      <LegacyId>Hospital</LegacyId>    </AppInstanceId>    <PrescriptionReferenceId>ecf5fd42-096e-ea11-a852-005056a9ea50</PrescriptionReferenceId>    <AdditionalPrescriptionReferenceId />    <ActionType>Received</ActionType>    <RxSigningWorkflowActivity>RxArchive</RxSigningWorkflowActivity>    <Waypoint>SMS</Waypoint>  </PrescriptionEvent>
  • The first thing you want to do is use a CTE so that you only do the cast to XML, and the extraction of the first PrescriptionEvent node once. That will simplify things and probably speed it up somewhat. Commented Jun 29, 2020 at 4:47
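
A minimal sketch of what this comment describes, assuming the table and element names from the question (the aliases `parsed`, `EventXmlTyped`, and `pe(node)` are made up for illustration). It casts each row to XML once and reads every value relative to the single PrescriptionEvent node. Only two of the extracted columns are shown. It still uses a plain CAST, so on its own it does not avoid Msg 9420 for rows holding invalid XML.

```sql
-- Cast the varchar column to XML once per row instead of once per extracted value.
WITH parsed AS (
    SELECT [Id]
          ,[EventDateTime]
          ,CAST([EventXml] AS XML) AS EventXmlTyped   -- hypothetical alias
    FROM [EpcsAuditDB].[dbo].[EpcsAuditEventData]
    WHERE [EventType] = 4 AND [EventDateTime] >= '2020-03-24'
)
SELECT p.[Id]
      ,p.[EventDateTime]
      -- Paths are relative to the PrescriptionEvent node returned by nodes().
      ,pe.node.value('(DateTimeStamp/text())[1]', 'NVARCHAR(max)') AS xml_DateTimeStamp
      ,UPPER(pe.node.value('(ActionType/text())[1]', 'NVARCHAR(max)')) AS xml_ActionType
FROM parsed AS p
-- OUTER APPLY keeps rows that have no PrescriptionEvent element (their values come back NULL).
OUTER APPLY p.EventXmlTyped.nodes('/PrescriptionEvent[1]') AS pe(node);
```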

2 Answers

The error is not caused by one of the values you extract from the XML; it occurs because the XML itself is invalid. It's being thrown by the cast to XML.

If you are on SQL Server 2012 or later, where TRY_CAST is available, you can find the offending rows with:

select EventXml 
from [EpcsAuditDB].[dbo].[EpcsAuditEventData]
where try_cast([EventXml] as XML) is null
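
The same function can also be used to skip the offending rows rather than list them, which is what the question asks for. The following is a sketch only, assuming the table and filter from the question; the alias `x` and the column name `EventXmlTyped` are made up for illustration, and only one extracted column is shown.

```sql
SELECT t.[Id]
      ,t.[EventDateTime]
      ,x.EventXmlTyped.value('(/PrescriptionEvent/DateTimeStamp/text())[1]', 'NVARCHAR(max)') AS xml_DateTimeStamp
FROM [EpcsAuditDB].[dbo].[EpcsAuditEventData] AS t
-- TRY_CAST returns NULL instead of raising Msg 9420 when the text is not valid XML.
CROSS APPLY (SELECT TRY_CAST(t.[EventXml] AS XML) AS EventXmlTyped) AS x
WHERE t.[EventType] = 4
  AND t.[EventDateTime] >= '2020-03-24'
  AND x.EventXmlTyped IS NOT NULL;   -- drop rows whose XML failed to parse
```

Rows where EventXml is itself NULL are dropped by the same filter, so compare the row counts against the "is null" query above to see how many rows were skipped.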

1 Comment

Thank you for this and the CTE suggestion. This was very helpful, @TomC

You can use TRY_CONVERT to find the rows whose content is not valid XML. The proof-of-concept code below shows the idea.

DECLARE @tableWithxml table(id int, xmlcontent varchar(500))

INSERT INTO @tableWithxml
values (1,'<x> 1</x>'), (2,'<x 1</x>')

SELECT id, xmlcontent
from
(SELECT id, xmlcontent, try_convert(xml,xmlcontent) as conversionsucceed
from @tableWithxml) as t
where conversionsucceed is null -- failed conversion
+----+------------+
| id | xmlcontent |
+----+------------+
|  2 | <x 1</x>   |
+----+------------+

1 Comment

Thank you. This was very helpful
