I have a Spark program that reads CSV files and loads them into DataFrames. Once loaded, I manipulate them using Spark SQL.
When I run my Spark job, it fails with the following exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'action' given input columns ["alpha", "beta", "gamma", "delta", "action"]
The exception above is thrown when Spark SQL tries to parse the following query:
SELECT *,
IF(action = 'A', 1, 0) a_count,
IF(action = 'B', 1, 0) b_count,
IF(action = 'C', 1, 0) c_count,
IF(action = 'D', 1, 0) d_count,
IF(action = 'E', 1, 0) e_count
FROM my_table
This code worked fine before I upgraded to Spark 2.0. Does anyone have any idea what might be causing this issue?
Edit: I'm loading the CSV files using the Databricks CSV parser:
sqlContext.read().format("csv")
.option("header", "false")
.option("inferSchema", "false")
.option("parserLib", "univocity")
.load(pathToLoad);
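One thing that puzzles me: the exception says it cannot resolve 'action' even though "action" appears in the listed input columns. I wondered whether an invisible character (such as a byte-order mark or non-breaking space) in the actual column name could make it print identically to the plain name while failing equality. A minimal sketch in plain Java, no Spark required (the `\uFEFF` BOM is my assumption for illustration; I have not confirmed it is present in my data):

```java
public class ColumnNameCheck {
    public static void main(String[] args) {
        String declared = "action";          // the name used in the SQL query
        String fromFile = "\uFEFFaction";    // hypothetical name with a leading BOM

        // Both print as "action" on most terminals, yet they are not equal.
        System.out.println(declared);
        System.out.println(fromFile);
        System.out.println(declared.equals(fromFile)); // false
        System.out.println(fromFile.length());         // 7 chars, not 6
    }
}
```

If something like this is the cause, printing each column name's length (or its code points) from the DataFrame's schema should reveal it.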