30

I am trying to use AWS Step Functions to trigger operations on many S3 files via Lambda. To do this I am invoking a step function with an input that has the base S3 key of the file and the part numbers for each file (each parallel iteration would operate on a different S3 file). The input looks something like this:

    {
      "job-spec": {
        "base_file_name": "some_s3_key-",
        "part_array": [
          "part-0000.tsv",
          "part-0001.tsv",
          "part-0002.tsv", ...
        ]
      }
    }

My step function is very simple: it takes that input and maps over it, but I can't seem to get both the base file name and the array entries as input to my Lambda. Here is my step function definition:

    {
      "Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
      "StartAt": "Map",
      "States": {
        "Map": {
          "Type": "Map",
          "ItemsPath": "$.job-spec",
          "ResultPath": "$.part_array",
          "MaxConcurrency": 2,
          "Next": "Final State",
          "Iterator": {
            "StartAt": "My Stage",
            "States": {
              "My Stage": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                  "FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
                  "Payload": {
                    "Input.$": "$.part_array"
                  }
                },
                "End": true
              }
            }
          }
        },
        "Final State": {
          "Type": "Pass",
          "End": true
        }
      }
    }

As written above, it complains that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.part_array I get the array I'm looking for in my Lambda, but the base key is missing.

Essentially I want each Python Lambda to get the base file key and one entry from the array so it can stitch together the complete file name. I can't just put the complete file names in the array because of the limit on how much data can be passed around in Step Functions, and it also seems like a waste of data.
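In other words, the payload each Lambda invocation receives should end up looking something like this (the field names here are just illustrative):

    {
      "base_file_name": "some_s3_key-",
      "part_name": "part-0000.tsv"
    }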

It looks like the Parameters value can be used for this, but I can't quite get the syntax right.

1
  • Looks like there is now a "new" Map State mode overcoming the limitation described in this question. It is the Distributed processing mode of the Step Functions Map State: docs.aws.amazon.com/step-functions/latest/dg/… Commented Feb 10, 2023 at 9:54

2 Answers

53

I was finally able to get the syntax right:

"ItemsPath": "$.job-spec.part_array",
"Parameters": {
  "part_name.$": "$$.Map.Item.Value",
  "base_file_name.$": "$.job-spec.base_file_name"
},

It seems that Parameters can be used to build a custom input for each iteration. The $$ accesses the context object of the execution rather than the actual state input. ItemsPath tells the Map state which array to iterate over, and the current element is exposed through that context as $$.Map.Item.Value, which can then be combined with values from the original input.

UPDATE: the AWS documentation linked in the comments below shows this being used.
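For completeness, here is a sketch of how the fix might slot into the full definition from the question (the function ARN placeholder is unchanged, and "Payload.$": "$" simply forwards each iteration's whole input to the Lambda):

    {
      "Comment": "Map over job-spec.part_array with a max concurrency of 2, passing the base file name to every iteration.",
      "StartAt": "Map",
      "States": {
        "Map": {
          "Type": "Map",
          "ItemsPath": "$.job-spec.part_array",
          "Parameters": {
            "part_name.$": "$$.Map.Item.Value",
            "base_file_name.$": "$.job-spec.base_file_name"
          },
          "ResultPath": "$.part_array",
          "MaxConcurrency": 2,
          "Next": "Final State",
          "Iterator": {
            "StartAt": "My Stage",
            "States": {
              "My Stage": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                  "FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
                  "Payload.$": "$"
                },
                "End": true
              }
            }
          }
        },
        "Final State": {
          "Type": "Pass",
          "End": true
        }
      }
    }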


4 Comments

Awesome, thank you! In my case, I wanted to pass in a parameter (an S3 bucket name) generated in a separate resource. Using $$.Map.Item.Value and the .$ suffix was key for pairing the two when using Input, and then a normal Key: !Ref MyResource for the external parameter.
It would be helpful if you could share a blog post or documentation on this; the AWS documentation is pretty thin on the details of this rather complicated means of passing data between states. In my case, I have one SF calling another and need to pass an array of input to SF 2's Map state, so that each element of the array can be passed as a named parameter to a Glue job.
Access to $$.Map.Item.Value is demonstrated, but not very thoroughly explained, in the Step Functions documentation: docs.aws.amazon.com/step-functions/latest/dg/…
2

This was a real PITA. Here is an example with AWS CDK:

    // Assumes CDK v2: import * as sfn from "aws-cdk-lib/aws-stepfunctions";
    const mapBlock = new sfn.Map(this, "processLoop", {
      // Pick your array path
      itemsPath: sfn.JsonPath.stringAt("$.uploadedFiles"),

      // Use parameters to shape the data going into each iteration
      parameters: {
        // $$.Map.Item.Value resolves to the current array element
        item: sfn.JsonPath.stringAt("$$.Map.Item.Value"),

        // Any additional info you want from the map block's input
        collection: sfn.JsonPath.stringAt("$.collectionName"),
        bucket: sfn.JsonPath.stringAt("$.bucket"),
      },
    });
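If it helps to see the correspondence with the accepted answer, that parameters block should synthesize to roughly the following ASL shape (CDK appends .$ to keys whose values are JsonPath expressions):

    "Type": "Map",
    "ItemsPath": "$.uploadedFiles",
    "Parameters": {
      "item.$": "$$.Map.Item.Value",
      "collection.$": "$.collectionName",
      "bucket.$": "$.bucket"
    }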
