I'm trying to convert a Glue DynamicFrame into a Spark DataFrame using DynamicFrame.toDF, but I'm getting this exception:

Traceback (most recent call last):
  File "/tmp/ManualJOB", line 62, in <module>
    df1 = datasource0.toDF()
  File "/opt/amazon/lib/python3.6/site-packages/awsglue/dynamicframe.py", line 147, in toDF
    return DataFrame(self._jdf.toDF(self.glue_ctx._jvm.PythonUtils.toSeq(scala_options)), self.glue_ctx)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/amazon/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o176.toDF.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 15.0 failed 4 times, most recent failure: Lost task 1.3 in stage 15.0 (TID 198, 172.31.0.175, executor 6): com.amazonaws.services.glue.util.FatalException: Unable to parse file: Manual Bound.csv

Can anyone help me with what I am missing?

Thanks in advance!

  • Can you confirm whether your file Manual Bound.csv contains any characters that are not UTF-8 encoded? Glue only supports UTF-8 encoding. To check the file for non-UTF-8 characters, run: iconv -f UTF-8 your_file -o /dev/null; echo $? Commented Sep 15, 2020 at 8:10
  • Yes. There were some non-UTF-8 characters in the file, so that was the problem. Thanks @PrabhakarReddy Commented Sep 15, 2020 at 10:54
  • I have posted the answer. Please mark it as answered if it helped. Commented Sep 15, 2020 at 11:10

1 Answer

This issue happens when the file contains characters that are not UTF-8 encoded. Glue only supports UTF-8 encoding, as per this doc:

Text-based data, such as CSVs, must be encoded in UTF-8 for AWS Glue to process it successfully. For more information, see UTF-8 in Wikipedia.

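You can verify whether your file contains invalid characters by running the command below. With GNU iconv, the conversion stops at the first byte sequence that is not valid UTF-8, reports an error, and exits with a non-zero status, which echo $? then prints (0 means the whole file is valid UTF-8). This is for Linux; use an equivalent if you are on another operating system.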

iconv -f UTF-8 your_file -o /dev/null; echo $?
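
If iconv is not available (for example on another operating system), the same check can be sketched in Python. This is a minimal sketch, not part of Glue or of the original answer: the function names first_invalid_utf8_offset and check_file are hypothetical, and it assumes the file is small enough to read into memory.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if all bytes decode."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded
        return err.start
    return None


def check_file(path: str) -> Optional[int]:
    """Read the whole file as raw bytes and report where UTF-8 decoding first fails."""
    with open(path, "rb") as fh:
        return first_invalid_utf8_offset(fh.read())
```

Calling check_file("your_file") returns None for a valid UTF-8 file, or the byte offset where decoding first fails.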

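To convert the file to UTF-8, pass the CSV to the command below. It assumes the source encoding is ISO-8859-1; replace that with the file's actual encoding if it differs.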

iconv -f ISO-8859-1 -t UTF-8 file.csv > file-utf8.csv
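
The same conversion can be sketched in Python if iconv is not at hand. This is a sketch under stated assumptions: convert_to_utf8 is a hypothetical helper name, and the ISO-8859-1 default simply mirrors the command above, so pass the file's real encoding if it is something else.

```python
def convert_to_utf8(src_path: str, dst_path: str, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode a text file from source_encoding to UTF-8, leaving line endings untouched."""
    # newline="" disables newline translation so CRLF/LF are preserved as-is
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(64 * 1024)  # copy in chunks to keep memory use bounded
            if not chunk:
                break
            dst.write(chunk)
```

For example, convert_to_utf8("file.csv", "file-utf8.csv") writes a UTF-8 copy that can then be uploaded for Glue to read.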