
I have a binary column in a DataFrame that I want to cast to XML. I created a temp view using

sourceDf = spark.read.csv(sourceFilePath, sep=',', header=True, inferSchema=True)
sourceDf.createOrReplaceTempView("sourceTable")

I want to run this SQL query, which works perfectly in SQL Server but not in Databricks:

%sql
SELECT ID 
  ,ORIGINATOR_ID
  ,INCIDENT_NUMBER 
  ,ATTACHMENT_TYPE
  ,FORM_NAME
  ,FORM_DATA
  ,CAST( CAST( FORM_DATA as XML ).value('.','varbinary(max)') AS nvarchar(max) )
  ,START_DATE
  ,END_DATE
  ,OPERATOR_ID 
  FROM sourceTable

I get the following error:

Error in SQL statement: ParseException: 
no viable alternative at input 'CAST( CAST( FORM_DATA as XML ).value('(line 7, pos 39)

Can anyone help? If I go back to the source system, I can run the same query in SQL Server and it works perfectly, but I need to be able to cast to XML within a notebook so that I can then parse the XML.

1 Answer


There is no separate XML type in Apache Spark: you can only cast the column to a string type, and then try to parse that string as XML. Once you have done that, follow the spark-xml library's instructions on parsing XML embedded in a column with the from_xml function (I deliberately won't duplicate the code from the documentation here, because it's quite lengthy for PySpark).
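That said, a minimal sketch of both steps may help. It assumes the FORM_DATA bytes are UTF-8 XML, that the com.databricks:spark-xml package is attached to the cluster, and uses a hypothetical one-field schema (replace it with the real structure of your documents); the ext_from_xml wrapper follows the pattern shown in the library's PySpark notes:

from pyspark.sql.column import Column, _to_java_column
from pyspark.sql.types import StructType, StructField, StringType

# Step 1: cast the binary column to a string. This is Spark's equivalent of
# the varbinary -> nvarchar round-trip, and assumes the bytes are UTF-8.
xml_df = spark.sql(
    "SELECT ID, CAST(FORM_DATA AS STRING) AS FORM_DATA_XML FROM sourceTable"
)

# Step 2: parse the string column with spark-xml's from_xml via the JVM
# gateway (spark-xml exposes from_xml natively only in Scala).
def ext_from_xml(xml_column, schema, options={}):
    java_column = _to_java_column(xml_column.cast('string'))
    java_schema = spark._jsparkSession.parseDataType(schema.json())
    scala_map = spark._jvm.org.apache.spark.api.python.PythonUtils.toScalaMap(options)
    jc = spark._jvm.com.databricks.spark.xml.functions.from_xml(
        java_column, java_schema, scala_map)
    return Column(jc)

# Hypothetical schema for illustration only.
payload_schema = StructType([StructField("FieldName", StringType())])

parsed = xml_df.withColumn(
    "FORM_DATA_PARSED", ext_from_xml(xml_df["FORM_DATA_XML"], payload_schema)
)

If the binary data is not UTF-8 (for example, SQL Server nvarchar exported as UTF-16 bytes), the plain CAST AS STRING will produce garbage, and you would need to decode the bytes explicitly before parsing.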
