How to reference a CSV column with parentheses and a decimal point in a Spark SQL expression such as COALESCE?

Question

I’m working on a data ingestion pipeline using Apache Spark (triggered via a Cloud Function on Dataproc). The input CSV contains column names with special characters such as parentheses and a decimal point, for example:

Collection_time,Nems,Operator,Technology,Vendor,Site,Device No,Device Name,Subunit No,Subunit Name,Online Status,Actual Tilt(0.1degree),Actual Sector ID etc

While processing these columns, I define transformation formulas (stored in a BigQuery config table) such as:

COALESCE(`Actual Tilt(0.1degree)`, NULL)

However, Spark fails to resolve the column during job execution:

Exception in thread "main" org.apache.spark.sql.AnalysisException:
[UNRESOLVED_COLUMN.WITH_SUGGESTION]
A column or function parameter with name `Actual Tilt(0`.`1degree)` cannot be resolved.

I also tried various escaping strategies like:

COALESCE([Actual Tilt(0.1degree)], NULL)
COALESCE("Actual Tilt(0.1degree)", NULL)
COALESCE(col("Actual Tilt(0.1degree)"), NULL)

but they fail with errors such as:

java.lang.IllegalArgumentException: Lexer Error: '')' expected but '[' found'
java.util.NoSuchElementException: key not found: col

I can only modify the SQL formula (string) stored in BigQuery — I cannot rename the source CSV column or modify the ingestion code directly.

How can I correctly reference the column Actual Tilt(0.1degree) inside a Spark SQL expression (e.g., COALESCE or SELECT) when I can only change the SQL formula string?

This question is similar to: How to escape column names with hyphen in Spark SQL. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. The error message also gives a hint: it uses backticks when referring to the "missing" column.
– mazaneicha, Commented Nov 3 at 21:59
@Suhani Bhatia: What is the output of df.columns after you read the DataFrame?
– pcbzmani, Commented Nov 4 at 12:06
Double quotes with backticks: col("`Actual Tilt(0.1degree)`")
– Andrew, Commented Nov 4 at 18:57
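
Following up on the df.columns question above, here is a minimal sketch for checking which header names would likely need backtick quoting in a Spark SQL expression. It parses the header line from the question with Python's csv module rather than Spark itself. The needs_quoting helper and the "plain identifier" pattern are illustrative assumptions, not part of Spark's API.

```python
import csv
import re

# Header line copied from the question (the question truncates it with "etc").
header_line = (
    "Collection_time,Nems,Operator,Technology,Vendor,Site,Device No,"
    "Device Name,Subunit No,Subunit Name,Online Status,"
    "Actual Tilt(0.1degree),Actual Sector ID"
)

# Assumption: names made only of letters, digits and underscores (not starting
# with a digit) are safe to use unquoted; anything else gets flagged.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def needs_quoting(name: str) -> bool:
    """Return True if the name contains characters outside a plain identifier."""
    return PLAIN_IDENTIFIER.fullmatch(name) is None


columns = next(csv.reader([header_line]))
flagged = [name for name in columns if needs_quoting(name)]

for name in flagged:
    print(repr(name))
```

Names with spaces are flagged as well as the one with parentheses and a dot, so the same quoting question applies to columns such as Device No.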

Frank · Accepted Answer · 2025-11-04 00:23:26Z

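Rename the column first (the source name must match the CSV header exactly, otherwise withColumnRenamed does nothing):

df = df.withColumnRenamed("Actual Tilt(0.1degree)", "Tiltdegree")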


3 Comments

Suhani Bhatia Nov 4 at 4:36

I cannot rename the column due to some constraints. Is there any other way to handle this?

Frank Nov 4 at 12:05

Use Polars or DuckDB instead?

Suhani Bhatia Nov 4 at 18:25

Will this work? cast(case when Actual Tilt(0.1degree) = 'Invalid' then 0 else Actual Tilt(0.1degree) end as int)
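
As written, the expression in the last comment leaves the column name unquoted, so the parser would not read Actual Tilt(0.1degree) as a single column name. Below is a minimal sketch of the backtick-quoted form, built in plain Python so the resulting string could be stored as the formula. quote_identifier is a hypothetical helper, not a Spark function. Whether the quoted formula resolves in this pipeline depends on how the ingestion code evaluates it, which the question does not show. The question's own backticked COALESCE still failed, so the ingestion code may be re-parsing the name.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


column = quote_identifier("Actual Tilt(0.1degree)")

# The CAST/CASE expression from the comment, with the column name quoted.
formula = (
    f"CAST(CASE WHEN {column} = 'Invalid' THEN 0 "
    f"ELSE {column} END AS INT)"
)

print(formula)
```

A string of this shape is what functions such as expr or DataFrame.selectExpr accept, assuming the ingestion code passes the formula through unchanged.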
