I’m working on a data ingestion pipeline using Apache Spark (a Dataproc job triggered via a Cloud Function). The input CSV contains column names with special characters such as parentheses and a decimal point, for example:
Collection_time,Nems,Operator,Technology,Vendor,Site,Device No,Device Name,Subunit No,Subunit Name,Online Status,Actual Tilt(0.1degree),Actual Sector ID, ...
While processing these columns, I define transformation formulas (stored in a BigQuery config table) such as:
COALESCE(`Actual Tilt(0.1degree)`, NULL)
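For context, my understanding is that the job evaluates each stored formula as a Spark SQL expression against the ingested DataFrame, i.e. roughly the equivalent of the PySpark sketch below (the toy DataFrame and the use of expr() are my assumptions, since I can't see the actual ingestion code):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("tilt-formula").getOrCreate()

# Toy stand-in for the ingested CSV, including the problematic column name.
df = spark.createDataFrame(
    [("Site-A", 12.5), ("Site-B", None)],
    ["Site", "Actual Tilt(0.1degree)"],
)

# Formula string exactly as stored in the BigQuery config table.
formula = "COALESCE(`Actual Tilt(0.1degree)`, NULL)"

# The pipeline presumably turns the string into a Column via expr()
# and applies it during the transformation step.
df.select(F.expr(formula).alias("Actual Tilt(0.1degree)")).show()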
However, Spark throws the following AnalysisException during job execution:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
[UNRESOLVED_COLUMN.WITH_SUGGESTION]
A column or function parameter with name `Actual Tilt(0`.`1degree)` cannot be resolved.
I also tried various escaping strategies like:
COALESCE([Actual Tilt(0.1degree)], NULL)
COALESCE("Actual Tilt(0.1degree)", NULL)
COALESCE(col("Actual Tilt(0.1degree)"), NULL)
but they fail with errors such as:
java.lang.IllegalArgumentException: Lexer Error: ')' expected but '[' found
java.util.NoSuchElementException: key not found: col
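For what it's worth, a self-contained way to try such variants locally looks roughly like this (again a sketch: the toy DataFrame and the expr() assumption are mine, and the results locally may not match what the Dataproc job reports):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("formula-check").getOrCreate()

# Same toy stand-in for the ingested CSV as in the sketch above.
df = spark.createDataFrame([(12.5,), (None,)], ["Actual Tilt(0.1degree)"])

# Candidate formula strings as they would be stored in the config table.
candidates = [
    "COALESCE(`Actual Tilt(0.1degree)`, NULL)",
    "COALESCE([Actual Tilt(0.1degree)], NULL)",
    'COALESCE("Actual Tilt(0.1degree)", NULL)',
    'COALESCE(col("Actual Tilt(0.1degree)"), NULL)',
]

# Evaluate each candidate and report whether Spark accepts it.
for formula in candidates:
    try:
        df.select(F.expr(formula)).collect()
        print(f"OK      {formula}")
    except Exception as exc:  # e.g. ParseException, AnalysisException
        print(f"FAILED  {formula}: {type(exc).__name__}")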
I can only modify the SQL formula string stored in BigQuery; I cannot rename the source CSV columns or modify the ingestion code directly.
So, how can I correctly reference the column Actual Tilt(0.1degree) inside a Spark SQL expression (e.g., in COALESCE or a SELECT) when I can only change the formula string? Would something like col("`Actual Tilt(0.1degree)`") help here, or is there another way to escape the column name?