We have a data lake container with three folders a, b, c. Each folder has 3 files: a1, a2, a3, b1, b2, b3, c1, c2, c3. We need to design a pipeline that dynamically does an incremental load from these folders to blob storage, with the same file name as the source. The incremental load is implemented by me in a dataflow. We have other dataflow dependencies as well, so we can't use a Copy activity and must use the dataflow. I am unable to integrate the Get Metadata activity with the dataflow, which is where I am expecting some help.

I tried with parameters and variables, but I did not get the desired output. I used Get Metadata with the Child items argument, then a ForEach loop. Inside that ForEach I tried another ForEach to get the files, and I used an append variable to append the data. I have already implemented the upsert logic for a single table in the dataflow. If I pass the second Get Metadata activity's output (inside the ForEach) to the dataflow, it does not accept it. The main problem I am facing is integrating the dataflow with the ForEach at the dataset level, because the dataflow's dataset will depend on the Get Metadata output.


1 Answer


A nested ForEach is not possible in Azure Data Factory. The workaround is to use an Execute Pipeline activity inside the ForEach activity. To pass the output of the Get Metadata activity to the dataflow, create dataflow parameters and pass the values to those parameters. I reproduced this scenario in my environment; below is the approach.

Outer Pipeline:

  • A Get Metadata activity is added, with only the container name given in the dataset file path. In the field list, + New is selected and the Child items argument is added. This activity returns the list of all the directories present in the container.


  • A ForEach activity is added, and the output of the Get Metadata activity is given in Items: @activity('Get Metadata1').output.childItems


  • Inside the ForEach activity, an Execute Pipeline activity is added.
  • A new child pipeline is created, with a parameter called FolderName.
  • The child pipeline name is given in the Execute Pipeline activity. The value for the parameter is given as @item().name, to pass the directory names as input to the child pipeline. A sketch of the outer pipeline follows this list.
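
To make the wiring concrete, here is a minimal JSON sketch of the outer pipeline. It is illustrative only: the names OuterPipeline, ContainerDataset and ChildPipeline are placeholders I have assumed, and policy/compute properties are omitted.

    {
        "name": "OuterPipeline",
        "properties": {
            "activities": [
                {
                    "name": "Get Metadata1",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": { "referenceName": "ContainerDataset", "type": "DatasetReference" },
                        "fieldList": [ "childItems" ]
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [ { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] } ],
                    "typeProperties": {
                        "items": { "value": "@activity('Get Metadata1').output.childItems", "type": "Expression" },
                        "activities": [
                            {
                                "name": "Execute Pipeline1",
                                "type": "ExecutePipeline",
                                "typeProperties": {
                                    "pipeline": { "referenceName": "ChildPipeline", "type": "PipelineReference" },
                                    "waitOnCompletion": true,
                                    "parameters": {
                                        "FolderName": { "value": "@item().name", "type": "Expression" }
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }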


Child Pipeline:

  • In the child pipeline, another Get Metadata activity is added. In the dataset file path, the container name is given, and for the folder a dataset parameter is created; the value of the pipeline parameter FolderName is passed to it: @pipeline().parameters.FolderName

  • Child items is selected as the argument in the field list. This activity returns the list of files available in the directory.


  • Then a ForEach activity is added, and the output of this Get Metadata activity is given in Items: @activity('Get_Metadata_inner').output.childItems

  • Inside this ForEach, the dataflow is added. A sketch of the child pipeline follows.
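
Again as an illustrative sketch under my own naming assumptions (FolderDataset and ForEach_files are placeholders; the Data Flow activity is abbreviated here and expanded in a later sketch):

    {
        "name": "ChildPipeline",
        "properties": {
            "parameters": { "FolderName": { "type": "string" } },
            "activities": [
                {
                    "name": "Get_Metadata_inner",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "FolderDataset",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderName": { "value": "@pipeline().parameters.FolderName", "type": "Expression" }
                            }
                        },
                        "fieldList": [ "childItems" ]
                    }
                },
                {
                    "name": "ForEach_files",
                    "type": "ForEach",
                    "dependsOn": [ { "activity": "Get_Metadata_inner", "dependencyConditions": [ "Succeeded" ] } ],
                    "typeProperties": {
                        "items": { "value": "@activity('Get_Metadata_inner').output.childItems", "type": "Expression" },
                        "activities": [
                            { "name": "Data flow1", "type": "ExecuteDataFlow" }
                        ]
                    }
                }
            ]
        }
    }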

Dataflow

  • In the dataflow, a parameter called filename is created, as in the script snippet below.
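
In the data flow script behind the designer, that declaration would look roughly like this (the default value is an assumption for illustration):

    parameters{
        filename as string ('placeholder.csv')
    }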

  • In the source dataset, dataset parameters are created for the file name and folder name, as fileName and folderName respectively; a sketch of such a dataset follows.
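
A parameterized ADLS Gen2 source dataset could look roughly like the following. The container name mycontainer, the DelimitedText format, and the linked service name are assumptions, not taken from the original post.

    {
        "name": "SourceDataset",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": { "referenceName": "ADLSLinkedService", "type": "LinkedServiceReference" },
            "parameters": {
                "folderName": { "type": "string" },
                "fileName": { "type": "string" }
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": "mycontainer",
                    "folderPath": { "value": "@dataset().folderName", "type": "Expression" },
                    "fileName": { "value": "@dataset().fileName", "type": "Expression" }
                }
            }
        }
    }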


  • Then all other transformations are added in the data flow.

  • In the sink dataset of the sink transformation, a dataset parameter for the folder is created, and the file name is left blank in the dataset (see the sketch below).
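
Since the destination is blob storage, the sink dataset could be parameterized along these lines (sinkcontainer and BlobLinkedService are placeholder names; the file name is deliberately omitted):

    {
        "name": "SinkDataset",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": { "referenceName": "BlobLinkedService", "type": "LinkedServiceReference" },
            "parameters": { "folderName": { "type": "string" } },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "sinkcontainer",
                    "folderPath": { "value": "@dataset().folderName", "type": "Expression" }
                }
            }
        }
    }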


  • The file name is given in the sink settings instead. The value is the dataflow parameter $filename.
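
For reference, when the sink is set to output a single file with a parameterized name, the generated data flow script typically contains options along these lines. This is an assumption based on how the designer emits sink options, not the poster's actual script:

    sink(allowSchemaDrift: true,
        validateSchema: false,
        partitionFileNames:[($filename)],
        partitionBy('hash', 1)) ~> sink1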


  • In the child pipeline, the dataflow activity settings are given as follows:
        fileName: @item().name
        folderName (for both the source and sink dataset parameters): @pipeline().parameters.FolderName


  • In the Parameters tab, the filename value is given as @item().name. A JSON sketch of the complete dataflow activity follows.
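
Putting those settings together, the Execute Data Flow activity inside the child pipeline's ForEach might look roughly like this. IncrementalLoadDataflow, source1 and sink1 are assumed names; note that a string-typed dataflow parameter is passed as a quoted data flow expression.

    {
        "name": "Data flow1",
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {
                "referenceName": "IncrementalLoadDataflow",
                "type": "DataFlowReference",
                "parameters": {
                    "filename": { "value": "'@{item().name}'", "type": "Expression" }
                },
                "datasetParameters": {
                    "source1": {
                        "folderName": { "value": "@pipeline().parameters.FolderName", "type": "Expression" },
                        "fileName": { "value": "@item().name", "type": "Expression" }
                    },
                    "sink1": {
                        "folderName": { "value": "@pipeline().parameters.FolderName", "type": "Expression" }
                    }
                }
            }
        }
    }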

  • In this repro, a simple Select transformation is used. This can be extended to any transformation in the data flow. In this way, values can be passed to the dataflow.
