Intro

I have a Docker container configured with the Glue ETL PySpark environment, thanks to this AWS Glue tutorial. I used this "helloworld.py":

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

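glueContext = GlueContext(SparkContext.getOrCreate())
# Take the SparkSession from the Glue context so that `spark` is also defined outside the pyspark shell
spark = glueContext.spark_session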

medicare = spark.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load('s3://awsglue-datasets/examples/medicare/Medicare_Hospital_Provider.csv')
medicare.printSchema()

I cannot run it with spark-submit helloworld.py because I run into this well-known error:

ModuleNotFoundError: No module named 'dynamicframe'

I found a workaround: feeding the script to the shell with the redirection operator, pyspark < helloworld.py, works like a charm.

My problem

However, I now need to pass some arguments to my script.

Before trying Glue ETL, I used: spark-submit myScript.py arg1 arg2 arg3

When I naively tried pyspark < myScript.py arg1 arg2 arg3, I got the following error:

Error: pyspark does not support any application options.

Minimal myScript.py to reproduce

import sys
from pyspark import SparkContext
from awsglue.context import GlueContext

# Hello world
glueContext = GlueContext(SparkContext.getOrCreate())
print(sys.argv[1] + " " + sys.argv[2] + " " + sys.argv[3])

Is there a way to keep using pyspark instead of spark-submit while still passing arguments?

Or am I going about this the wrong way, and is there a way to use spark-submit with Glue?

1 Answer

I would advise you to use the PyCharm integration if possible. There you don't get the module error, and you can pass arguments through the Parameters field of the PyCharm run configuration.

The article that you linked also explains how to integrate with PyCharm.

Edit:

When I log into the Docker container and just run:

/home/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/bin/spark-submit myScript.py test1 test2 test3

it prints out test1 test2 test3. I copied the exact content from your script. Could you please try that?
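For reference, here is a minimal sketch of how the positional arguments reach the script when it is started with spark-submit. Everything after the script name ends up in sys.argv, split on whitespace by the shell, so a comma typed after an argument becomes part of that argument. The helper name join_args and the expected count of three are hypothetical, chosen only to mirror the example script; the awsglue imports are left out so the snippet runs anywhere.

```python
def join_args(argv, expected=3):
    """Join the positional arguments with spaces.

    argv[0] is the script path; the remaining entries are the
    application arguments exactly as the shell split them.
    `expected` is an illustrative assumption matching the example script.
    """
    args = argv[1:]
    if len(args) != expected:
        usage = " ".join(f"arg{i + 1}" for i in range(expected))
        raise SystemExit(f"usage: {argv[0]} {usage}")
    return " ".join(args)


# In the real script this would be join_args(sys.argv).
print(join_args(["myScript.py", "test1", "test2", "test3"]))
# Commas are not separators: they stay attached to the arguments.
print(join_args(["myScript.py", "test1,", "test2,", "test3"]))
```

Checking the argument count up front gives a usage message instead of an IndexError when an argument is missing.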

4 Comments

Yes, if you pay for PyCharm Professional, which I don't. The problem remains the same.
Can you provide your whole script? I don't have a problem running scripts with spark-submit.
I edited the question and added the minimal myScript.py to reproduce.
I added an edit to my answer.