
I have a Spark program that reads CSV files and loads them into DataFrames. Once loaded, I manipulate them using SparkSQL.

When running my Spark job, it fails and gives me the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve 'action' given input columns ["alpha", "beta", "gamma", "delta", "action"]

The exception above is thrown when SparkSQL tries parsing the following:

SELECT *, 
  IF(action = 'A', 1, 0) a_count,
  IF(action = 'B', 1, 0) b_count,
  IF(action = 'C', 1, 0) c_count,
  IF(action = 'D', 1, 0) d_count,
  IF(action = 'E', 1, 0) e_count
FROM my_table

This code worked fine before updating to Spark 2.0. Does anyone have any idea what would cause this issue?

Edit: I'm loading the CSV files using the Databricks CSV parser:

sqlContext.read().format("csv")
    .option("header", "false")
    .option("inferSchema", "false")
    .option("parserLib", "univocity")
    .load(pathToLoad);
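For completeness, the rest of the flow (registering the DataFrame and running the SQL above) looks roughly like this; a sketch in Scala, with my_table and pathToLoad as above:

// Load the CSV with the same options as above (Scala syntax)
val df = sqlContext.read.format("csv")
  .option("header", "false")
  .option("inferSchema", "false")
  .option("parserLib", "univocity")
  .load(pathToLoad) // pathToLoad as in the Java snippet above

// Register the DataFrame so SparkSQL can see it, then run the failing query
df.registerTempTable("my_table") // deprecated in 2.0 in favour of createOrReplaceTempView
val counts = sqlContext.sql("SELECT *, IF(action = 'A', 1, 0) a_count FROM my_table")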
  • How are you reading from CSV? Commented Oct 4, 2016 at 19:32
  • Hi @ArunakiranNulu, I'm loading the CSV files using the Databricks CSV library. See the edit in the original post. Commented Oct 4, 2016 at 19:34
  • Did you ever get an answer to this? I'm running into the same situation and would rather not rename my action column. Commented Nov 2, 2020 at 14:10

3 Answers


Try adding backticks around the column name in your selection:

SELECT *, 
  IF(`action` = 'A', 1, 0) a_count,
  IF(`action` = 'B', 1, 0) b_count,
  IF(`action` = 'C', 1, 0) c_count,
  IF(`action` = 'D', 1, 0) d_count,
  IF(`action` = 'E', 1, 0) e_count
FROM my_table

This applies to some databases like MySQL as well.
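If quoting the SQL string doesn't help, the same projection can be expressed with the DataFrame API, which bypasses the SQL parser entirely. A sketch in Scala, assuming df is the DataFrame behind my_table:

import org.apache.spark.sql.functions.{col, when}

// Same result as the IF(...) columns, built without any SQL string
val result = df.select(
  col("*"),
  when(col("action") === "A", 1).otherwise(0).as("a_count"),
  when(col("action") === "B", 1).otherwise(0).as("b_count"),
  when(col("action") === "C", 1).otherwise(0).as("c_count"),
  when(col("action") === "D", 1).otherwise(0).as("d_count"),
  when(col("action") === "E", 1).otherwise(0).as("e_count")
)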


1 Comment

I had an issue in spark-shell parsing XML where field names contain punctuation (.): df.select("JOURNPOST.OJ").show() failed, but df.select("`JOURNPOST.OJ`").show() worked.

In Spark 2.0, built-in CSV support has been added; try it like below.

spark.read.format("csv").option("header","false").load("../path_to_file/file.csv")
spark.read.option("header", "false").csv("../path_to_file/file.csv")
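Note that with header set to "false", the built-in reader names the columns _c0, _c1, and so on, so a schema or renames must still be applied before action can resolve. A quick sketch for checking what names the reader actually produced (stray whitespace or an invisible character in a name would trigger exactly this exception):

val df = spark.read.option("header", "false").csv("../path_to_file/file.csv")
df.printSchema()
// Print each name with its length to expose hidden characters
df.columns.foreach(c => println(s"'$c' (length ${c.length})"))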

3 Comments

Thanks for the suggestion. I've changed my code, but I still get the same error.
Where are you testing this? Through spark-shell, or spark-submit in local, standalone, YARN, or Mesos mode?
Through spark-submit.

I used 2.0 in my cluster while the code was built against 2.3, and I was facing the same issue. I got rid of it by using the appropriate Spark version at run time.
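To spot such a mismatch, compare the compile-time dependency with the version the cluster actually runs; a minimal sketch (the dependency version shown is illustrative):

// At run time, print the Spark version the job is actually executing on
println(spark.version)
// Compare with the build dependency, e.g. in build.sbt:
// libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.2" % "provided"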

