
When I call the Spark SQL API hiveContext.sql() in the following script:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, HiveContext

conf = SparkConf().setAppName("spark_sql")

sc = SparkContext(conf=conf)
hc = HiveContext(sc)

#rdd = sc.textFile("test.txt")
sqlContext = SQLContext(sc)
res = hc.sql("use teg_uee_app")
#for each in res.collect():
#    print(each[0])
sc.stop()

I got the following error:

enFile "spark_sql.py", line 23, in <module>
res = hc.sql("use teg_uee_app")
File "/spark/python/pyspark/sql/context.py", line 580, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
File "/spark/python/pyspark/sql/context.py", line 683, in _ssql_ctx
    self._scala_HiveContext = self._get_hive_ctx()
File "/spark/python/pyspark/sql/context.py", line 692, in _get_hive_ctx
return self._jvm.HiveContext(self._jsc.sc())
  TypeError: 'JavaPackage' object is not callable

How do I add the missing classes to SPARK_CLASSPATH, or should I use SparkContext.addFile? I have no idea.

2 Answers


Maybe this will help you: When using HiveContext I have to add three jars to the spark-submit arguments:

spark-submit --jars /usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/spark/lib/datanucleus-core-3.2.10.jar,/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar ...

Of course the paths and versions depend on your cluster setup.
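If you would rather not type the --jars list on every submit, roughly the same thing can be expressed in the script itself by setting the spark.jars property on the SparkConf. This is only a sketch that assumes the same JAR paths as above apply to your cluster; in some deploy modes the driver JVM is already running before this conf is applied, so the --jars flag on spark-submit remains the more reliable option:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Assumed paths -- adjust to your cluster's Spark lib directory and versions.
datanucleus_jars = ",".join([
    "/usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar",
    "/usr/lib/spark/lib/datanucleus-core-3.2.10.jar",
    "/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar",
])

conf = (SparkConf()
        .setAppName("spark_sql")
        .set("spark.jars", datanucleus_jars))  # comma-separated, like --jars

sc = SparkContext(conf=conf)
hc = HiveContext(sc)
res = hc.sql("use teg_uee_app")
sc.stop()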


In my case this turned out to be a classpath issue - I had a Hadoop JAR on the classpath that was a different version of Hadoop than the one I was running.

Make sure you set the executor and/or driver classpaths in only one place, and that there's no system-wide default applied somewhere such as .bashrc or Spark's conf/spark-env.sh.
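To see which of these settings actually reach your application, you can print the relevant config keys from PySpark at runtime. A small sketch (the keys below are standard Spark properties; which ones matter depends on your deploy mode):

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("classpath_check"))

# Show classpath-related settings picked up from spark-defaults.conf,
# spark-env.sh, environment variables, or the submit command line.
for key in ("spark.driver.extraClassPath",
            "spark.executor.extraClassPath",
            "spark.jars"):
    print("%s = %s" % (key, sc.getConf().get(key, "<not set>")))

sc.stop()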
