
I have a binary column in a DataFrame that I want to cast to XML. I created a temp view using

sourceDf = spark.read.csv(sourceFilePath, sep=',', header=True, inferSchema=True)
sourceDf.createOrReplaceTempView("sourceTable")

I want to run this SQL query, which works perfectly in SQL Server but not in Databricks:

%sql
SELECT ID 
  ,ORIGINATOR_ID
  ,INCIDENT_NUMBER 
  ,ATTACHMENT_TYPE
  ,FORM_NAME
  ,FORM_DATA
  ,CAST( CAST( FORM_DATA as XML ).value('.','varbinary(max)') AS nvarchar(max) )
  ,START_DATE
  ,END_DATE
  ,OPERATOR_ID 
  FROM sourceTable

I get the following error:

Error in SQL statement: ParseException: 
no viable alternative at input 'CAST( CAST( FORM_DATA as XML ).value('(line 7, pos 39)

Can anyone help? If I go back to the source system, I can run the same query in SQL Server and it works perfectly, but I need to be able to cast to XML within a notebook so that I can then parse the XML.

1 Answer


There is no separate XML type in Apache Spark: you can only cast the column to a string type, and then try to parse that string as XML. Once you have done that, follow the spark-xml library's instructions on parsing XML embedded in a column with the from_xml function (I deliberately won't duplicate the code from the documentation here, because it's quite lengthy for PySpark).
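That said, a minimal sketch of both steps may help. It assumes the FORM_DATA bytes are UTF-8 XML, that the com.databricks:spark-xml package is attached to the cluster, and uses a hypothetical one-field schema (replace it with the real structure of your documents); the ext_from_xml wrapper follows the pattern shown in the library's PySpark notes:

from pyspark.sql.column import Column, _to_java_column
from pyspark.sql.types import StructType, StructField, StringType

# Step 1: cast the binary column to a string. This is Spark's equivalent of
# the varbinary -> nvarchar round-trip, and assumes the bytes are UTF-8.
xml_df = spark.sql(
    "SELECT ID, CAST(FORM_DATA AS STRING) AS FORM_DATA_XML FROM sourceTable"
)

# Step 2: parse the string column with spark-xml's from_xml via the JVM
# gateway (spark-xml exposes from_xml natively only in Scala).
def ext_from_xml(xml_column, schema, options={}):
    java_column = _to_java_column(xml_column.cast('string'))
    java_schema = spark._jsparkSession.parseDataType(schema.json())
    scala_map = spark._jvm.org.apache.spark.api.python.PythonUtils.toScalaMap(options)
    jc = spark._jvm.com.databricks.spark.xml.functions.from_xml(
        java_column, java_schema, scala_map)
    return Column(jc)

# Hypothetical schema for illustration only.
payload_schema = StructType([StructField("FieldName", StringType())])

parsed = xml_df.withColumn(
    "FORM_DATA_PARSED", ext_from_xml(xml_df["FORM_DATA_XML"], payload_schema)
)

If the binary data is not UTF-8 (for example, SQL Server nvarchar exported as UTF-16 bytes), the plain CAST AS STRING will produce garbage, and you would need to decode the bytes explicitly before parsing.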
