
I have a Spark program that reads CSV files and loads them into DataFrames. Once loaded, I manipulate them using SparkSQL.

When running my Spark job, it fails and gives me the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve 'action' given input columns ["alpha", "beta", "gamma", "delta", "action"]

The exception above is thrown when SparkSQL tries parsing the following:

SELECT *, 
  IF(action = 'A', 1, 0) a_count,
  IF(action = 'B', 1, 0) b_count,
  IF(action = 'C', 1, 0) c_count,
  IF(action = 'D', 1, 0) d_count,
  IF(action = 'E', 1, 0) e_count
FROM my_table

This code worked fine before updating to Spark 2.0. Does anyone have any idea what would cause this issue?

Edit: I'm loading the CSV files using the Databricks CSV parser:

sqlContext.read().format("csv")
    .option("header", "false")
    .option("inferSchema", "false")
    .option("parserLib", "univocity")
    .load(pathToLoad);
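For completeness, the rest of the flow (registering the DataFrame and running the SQL above) looks roughly like this; a sketch in Scala, with my_table and pathToLoad as above:

// Load the CSV with the same options as above (Scala syntax)
val df = sqlContext.read.format("csv")
  .option("header", "false")
  .option("inferSchema", "false")
  .option("parserLib", "univocity")
  .load(pathToLoad) // pathToLoad as in the Java snippet above

// Register the DataFrame so SparkSQL can see it, then run the failing query
df.registerTempTable("my_table") // deprecated in 2.0 in favour of createOrReplaceTempView
val counts = sqlContext.sql("SELECT *, IF(action = 'A', 1, 0) a_count FROM my_table")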
  • How are you reading from CSV? Commented Oct 4, 2016 at 19:32
  • Hi @ArunakiranNulu, I'm loading the CSV files using the Databricks CSV library. See the edit in the original post. Commented Oct 4, 2016 at 19:34
  • Did you ever get an answer to this? I'm running into the same situation and would rather not rename my action column. Commented Nov 2, 2020 at 14:10

3 Answers


Try adding backticks around the column name in your selection:

SELECT *, 
  IF(`action` = 'A', 1, 0) a_count,
  IF(`action` = 'B', 1, 0) b_count,
  IF(`action` = 'C', 1, 0) c_count,
  IF(`action` = 'D', 1, 0) d_count,
  IF(`action` = 'E', 1, 0) e_count
FROM my_table

This applies to some databases like MySQL as well.
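If quoting the SQL string doesn't help, the same projection can be expressed with the DataFrame API, which bypasses the SQL parser entirely. A sketch in Scala, assuming df is the DataFrame behind my_table:

import org.apache.spark.sql.functions.{col, when}

// Same result as the IF(...) columns, built without any SQL string
val result = df.select(
  col("*"),
  when(col("action") === "A", 1).otherwise(0).as("a_count"),
  when(col("action") === "B", 1).otherwise(0).as("b_count"),
  when(col("action") === "C", 1).otherwise(0).as("c_count"),
  when(col("action") === "D", 1).otherwise(0).as("d_count"),
  when(col("action") === "E", 1).otherwise(0).as("e_count")
)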


1 Comment

I had an issue in spark-shell parsing XML where field names contain punctuation (.): df.select("JOURNPOST.OJ").show() failed, but df.select("`JOURNPOST.OJ`").show() worked.

In Spark 2.0, built-in CSV support has been added; try it like below.

spark.read.format("csv").option("header","false").load("../path_to_file/file.csv")
spark.read.option("header", "false").csv("../path_to_file/file.csv")
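Note that with header set to "false", the built-in reader names the columns _c0, _c1, and so on, so a schema or renames must still be applied before action can resolve. A quick sketch for checking what names the reader actually produced (stray whitespace or an invisible character in a name would trigger exactly this exception):

val df = spark.read.option("header", "false").csv("../path_to_file/file.csv")
df.printSchema()
// Print each name with its length to expose hidden characters
df.columns.foreach(c => println(s"'$c' (length ${c.length})"))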

3 Comments

Thanks for the suggestion. I've changed my code, but I still get the same error.
Where are you testing this? Through spark-shell, or spark-submit in local, standalone, YARN, or Mesos mode?
Through spark-submit.

I used 2.0 in my cluster while the code was built against 2.3, and I was facing the same issue. I got rid of it by using the appropriate Spark version at run time.
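To spot such a mismatch, compare the compile-time dependency with the version the cluster actually runs; a minimal sketch (the dependency version shown is illustrative):

// At run time, print the Spark version the job is actually executing on
println(spark.version)
// Compare with the build dependency, e.g. in build.sbt:
// libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.2" % "provided"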

