
How can we write user-defined functions (UDFs) in an AWS Glue script using PySpark (Python), on either a DynamicFrame or a DataFrame?

2 Answers


A DynamicFrame doesn't support UDFs the way the DataFrame API does. The closest equivalent is the Map.apply transform, which calls a Python function on every record of the DynamicFrame.
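
As a rough sketch of the Map.apply route, the record-level function below drops the first character of one field, and a thin wrapper hands it to Glue. The field name "average covered charges" is borrowed from the Medicare sample quoted in the other answer, and the helper names chop_record and chop_dynamic_frame are hypothetical. awsglue is only importable inside a Glue job, so the import sits inside the wrapper.

```python
def chop_record(rec):
    """Record-level 'UDF': drop the first character of one string field."""
    value = rec["average covered charges"]
    if value is not None:
        # Store the result in a new field; the original field is left as-is.
        rec["ACC"] = value[1:]
    return rec


def chop_dynamic_frame(dyf):
    # Imported lazily because awsglue exists only in the Glue runtime.
    from awsglue.transforms import Map

    # Map.apply calls chop_record once per record and returns a new DynamicFrame.
    return Map.apply(frame=dyf, f=chop_record)


# The record function can be exercised locally with a plain dict.
print(chop_record({"average covered charges": "$32963.07"}))
```

Keeping the per-record logic in an ordinary function means it can be unit tested without a Glue environment.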



"AWS Glue does not yet directly support Lambda functions, also known as user-defined functions. But you can always convert a DynamicFrame to and from an Apache Spark DataFrame to take advantage of Spark functionality in addition to the special features of DynamicFrames." - AWS Glue Medicare Python samples

The AWS Glue Medicare Python samples (quoted above) include a Spark UDF example:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# UDF that drops the first character of a string column.
chop_f = udf(lambda x: x[1:], StringType())

# Add one chopped copy of each charge column.
medicare_dataframe = (
    medicare_dataframe
    .withColumn("ACC", chop_f(medicare_dataframe["average covered charges"]))
    .withColumn("ATP", chop_f(medicare_dataframe["average total payments"]))
    .withColumn("AMP", chop_f(medicare_dataframe["average medicare payments"]))
)
medicare_dataframe.select(['ACC', 'ATP', 'AMP']).show()

This is just standard Spark code: once the data is in a DataFrame, UDFs work as they do in any PySpark job. If you're looking to use UDFs from Spark SQL, see this Databricks example.
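
For the DynamicFrame case in the question, the round trip the AWS quote describes might look like the following sketch. It assumes an existing DynamicFrame and GlueContext. The helper names chop and apply_chop_udf, the column arguments, and the frame name "chopped" are illustrative, and the pyspark/awsglue imports are kept inside the function so the plain Python logic can be checked without a cluster.

```python
def chop(value):
    """Plain Python logic behind the UDF: drop the first character, pass None through."""
    return value[1:] if value is not None else None


def apply_chop_udf(dyf, glue_context, source_col, target_col):
    # Lazy imports: these packages are only present in a Spark/Glue environment.
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    chop_udf = udf(chop, StringType())

    # DynamicFrame -> DataFrame, so the regular Spark UDF machinery applies.
    df = dyf.toDF()
    df = df.withColumn(target_col, chop_udf(df[source_col]))

    # DataFrame -> DynamicFrame, so the rest of the Glue job can keep using it.
    return DynamicFrame.fromDF(df, glue_context, "chopped")


# The underlying function can be checked locally without Spark.
print(chop("$32963.07"))
```

Unlike the lambda in the sample, chop passes None through, since Spark hands null column values to a Python UDF as None.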
