1

While working on nested json file using DyamicFrame for struct type of data. When i run the jobs its getting this error

Py4JJavaError: An error occurred while calling o67.getDynamicFrame. java.lang.reflect.InvocationTargetException.Let me know where i'm making mistake Any idea about this

Below is my code in GLUE JOB

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame, DynamicFrameReader,             
DynamicFrameWriter, DynamicFrameCollection
from pyspark.sql.functions import lit
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "experimentdb", table_name = "experiment",         
transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database =     
"experimentdb", table_name = "experiment", transformation_ctx = 
"datasource0")
## @type: ApplyMapping
## @args: [mapping = [("id", "string", "id", "string"), ("identifier", 
"string", "identifier", "string"), ("session_count", "long", 
"session_count", "long"), ("language", "string", "language", "string"),     
("timezone", "long", "timezone", "long"), ("game_version", "string", 
"game_version", "string"), ("device_os", "string", "device_os", "string"), 
("device_type", "long", "device_type", "long"), ("device_model", "string", 
"device_model", "string"), ("ad_id", "string", "ad_id", "string"), 
("tags.phone_number", "string", "`tags.phone_number`", "string"), 
("tags.real_name", "string", "`tags.real_name`", "string"), ("tags.email",     
"string", "`tags.email`", "string"), ("tags.onboardingStatus", "string", 
"`tags.onboardingStatus`", "string"), ("tags.dfuStatus", "string", 
"`tags.dfuStatus`", "string"), ("tags.activityStatus", "string", 
"`tags.activityStatus`", "string"), ("tags.lastOperationPerformed", 
"string", "`tags.lastOperationPerformed`", "string"), ("last_active", 
"string", "last_active", "string"), ("playtime", "long", "playtime", 
"long"), ("amount_spent", "double", "amount_spent", "double"), 
("created_at", "string", "created_at", "string"), ("invalid_identifier", 
"string", "invalid_identifier", "string"), ("badge_count", "long", 
"badge_count", "long")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource0]
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("id", 
"string", "id", "string"), ("identifier", "string", "identifier", 
"string"), ("session_count", "long", "session_count", "long"), ("language", 
"string", "language", "string"), ("timezone", "long", "timezone", "long"), 
("game_version", "string", "game_version", "string"), ("device_os", 
"string", "device_os", "string"), ("device_type", "long", "device_type", 
"long"), ("device_model", "string", "device_model", "string"), ("ad_id", 
"string", "ad_id", "string"), ("tags.phone_number", "string", 
"`tags.phone_number`", "string"), ("tags.real_name", "string", 
"`tags.real_name`", "string"), ("tags.email", "string", "`tags.email`", 
"string"), ("tags.onboardingStatus", "string", "`tags.onboardingStatus`", 
"string"), ("tags.dfuStatus", "string", "`tags.dfuStatus`", "string"),     
("tags.activityStatus", "string", "`tags.activityStatus`", "string"), 
("tags.lastOperationPerformed", "string", "`tags.lastOperationPerformed`", 
"string"), ("last_active", "string", "last_active", "string"), ("playtime", 
"long", "playtime", "long"), ("amount_spent", "double", "amount_spent", 
"double"), ("created_at", "string", "created_at", "string"), 
("invalid_identifier", "string", "invalid_identifier", "string"), 
("badge_count", "long", "badge_count", "long")], transformation_ctx = 
"applymapping1")
## @type: DataSink
## @args: [connection_type = "s3", connection_options = {"path": 
"s3://output_data"}, format = "csv", transformation_ctx 
= "datasink2"]
## @return: datasink2
## @inputs: [frame = applymapping1]
datasink2 = glueContext.write_dynamic_frame.from_options(frame = 
applymapping1, connection_type = "s3", connection_options = {"path": 
"s3://output_data"}, format = "csv", transformation_ctx 
= "datasink2")
job.commit()

error logs error logs1

10
  • Could you please provide your code and the line where the error occurs? This error message doesn't provide any information Commented May 3, 2021 at 14:52
  • i have added the code Commented May 3, 2021 at 15:15
  • And in which line does the error occur? Commented May 3, 2021 at 15:16
  • When glue job runs every time .It comes up with above error .It doesn't shows the line error check if this helps ERROR [main] glue.ProcessLauncher (Logging.scala:logError(70)): Exception in User Class: java.lang.reflect.UndeclaredThrowableException Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult Commented May 3, 2021 at 15:24
  • It always shows the line, please paste the complete stack trace. Commented May 3, 2021 at 15:27

1 Answer 1

1

It seems like you are getting a connection error. Since S3 is the only data source you are using and since you didn't create a VPC S3 endpoint, I suspect that this is the problem.

Unfortunately are Glue error logs not really informational so one can only assume. I would ask you to create a VPC S3 Endpoint and try it again.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.