I am trying to set up an AWS Glue environment on my Ubuntu VirtualBox VM by following the AWS documentation.

I have done the required steps, such as downloading aws-glue-libs and the Spark package and setting SPARK_HOME as suggested. After that, I am unable to initialize the GlueContext and get the error below.

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())
# or, with an existing SparkContext already bound to sc:
glueContext = GlueContext(sc)

Error:

TypeError          Traceback (most recent call last)
<ipython-input-15-0798793d4033> in <module>
----> 1 glueContext = GlueContext(SparkContext.getOrCreate())

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in __init__(self, sparkContext, **options)
     43         super(GlueContext, self).__init__(sparkContext)
     44         register(sparkContext)
---> 45         self._glue_scala_context = self._get_glue_scala_context(**options)
     46         self.create_dynamic_frame = DynamicFrameReader(self)
     47         self.write_dynamic_frame = DynamicFrameWriter(self)

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in _get_glue_scala_context(self, **options)
     64 
     65         if min_partitions is None:
---> 66             return self._jvm.GlueContext(self._jsc.sc())
     67         else:
     68             return self._jvm.GlueContext(self._jsc.sc(), min_partitions, target_partitions)

TypeError: 'JavaPackage' object is not callable

2 Answers


Copy the aws-glue-libs jar files into Spark's jars folder; that is, copy the jar files from the aws-glue-libs/jarsv1/ folder into the spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/jars folder, as sketched below.
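A minimal sketch of that copy step in Python, assuming both archives were unpacked in the home directory (the paths are hypothetical; substitute your own install locations):

import glob
import os
import shutil

# Hypothetical install locations -- substitute wherever you unpacked the archives.
glue_jars_dir = os.path.expanduser("~/aws-glue-libs-glue-1.0/jarsv1")
spark_jars_dir = os.path.expanduser("~/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/jars")

# Copy every Glue jar into Spark's jars folder so the JVM that PySpark
# launches can load the GlueContext class.
for jar in glob.glob(os.path.join(glue_jars_dir, "*.jar")):
    shutil.copy(jar, spark_jars_dir)

The same can of course be done with a single cp command in the shell.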


2 Comments

Where can I find the jars when these packages are installed within a virtualenv?
I found the Glue ETL jars here, but I am still unable to find the other Glue jars, such as the assembly: us-east-1.console.aws.amazon.com/s3/buckets/…

Assuming you have already implemented the instructions given at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html:

Check whether the spark.executor.extraClassPath and spark.driver.extraClassPath Spark configuration properties (they are configuration properties, not environment variables) are set to {user_path}/aws-glue-libs-glue-{1.0/master}/jarsv1/*. A sketch of setting them when creating the SparkContext follows.
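A minimal sketch of setting these two properties when creating the SparkContext, again with a hypothetical install path:

import os
from pyspark import SparkConf
from pyspark.context import SparkContext

# Hypothetical path -- substitute the folder where you unpacked aws-glue-libs.
glue_jars = os.path.expanduser("~/aws-glue-libs-glue-1.0/jarsv1/*")

conf = SparkConf()
# Both the driver and the executors need the Glue jars on their classpath;
# without them, self._jvm.GlueContext resolves to a bare JavaPackage and
# calling it raises the "'JavaPackage' object is not callable" TypeError.
conf.set("spark.driver.extraClassPath", glue_jars)
conf.set("spark.executor.extraClassPath", glue_jars)

sc = SparkContext(conf=conf)

Alternatively, the two properties can be set once in Spark's conf/spark-defaults.conf so that every session picks them up.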

To verify the classpaths, execute the code below:

from pyspark.context import SparkContext

sc = SparkContext()
sc.getConf().getAll()
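If the configuration list is long, a quick filter over the same output narrows it down to the relevant entries:

# Print only the classpath-related settings from the pairs returned above.
print([kv for kv in sc.getConf().getAll() if "extraClassPath" in kv[0]])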

The error above is mainly caused by a classpath issue: the classpath does not point at the AWS Glue jar files, so the JVM cannot resolve the GlueContext class.

