
I am new to AWS Glue and PySpark. I have a table in RDS that contains a varchar field id.
I want to map id to a String field in the output JSON (let's say newId) that sits inside a JSON array field:

{
 "sources" : [
  { "newId" : "1234asdf" }
 ]
}

How can I achieve this using the transforms available in the PySpark script of an AWS Glue job?

  • So for each row in the table, you want to create a key value pair "newId":"some_value", with some_value being the value in the "id" column of the row? Do you plan to add more fields to the object you are creating for each row? Any reason not to have the collection be the root, rather than sticking the collection under the "sources" collection object? Commented Jul 13, 2021 at 17:08

1 Answer


Use the AWS Glue Map transformation to map the string field into a field inside a JSON array in the target:

NewFrame = Map.apply(frame=OldFrame, f=map_fields)

and define the map_fields function like this:

def map_fields(rec):
    # Wrap the id value in an array of objects under "sources"
    rec["sources"] = [{"newId": rec["id"]}]
    # Drop the original column so it doesn't also appear in the output
    del rec["id"]
    return rec

Make sure to delete the original field, as done with del rec["id"]; otherwise the id column will also appear in each output record alongside sources.
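
For context, here is a minimal sketch of how this transform might fit into a complete Glue job script. The catalog database and table names (my_database, my_table) and the S3 output path are placeholders for illustration, not values from the question:

import sys
from awsglue.context import GlueContext
from awsglue.transforms import Map
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read the RDS table through the Glue Data Catalog (placeholder names)
OldFrame = glue_context.create_dynamic_frame.from_catalog(
    database="my_database", table_name="my_table"
)

def map_fields(rec):
    # Same logic as above: nest id under the "sources" array as newId
    rec["sources"] = [{"newId": rec["id"]}]
    del rec["id"]
    return rec

NewFrame = Map.apply(frame=OldFrame, f=map_fields)

# Write each record out as a JSON object (placeholder S3 path)
glue_context.write_dynamic_frame.from_options(
    frame=NewFrame,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/output/"},
    format="json",
)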


