I've been going round in circles trying to get what I thought would be a relatively trivial pipeline working in Azure Data Factory. I have a CSV file with a schema like this:

Id, Name, Color
1, Apple, Green
2, Lemon, Yellow

I need to transform the CSV into a JSON file that looks like this:

{"fruits":[{"Id":"1","Name":"Apple","Color":"Green"},{"Id":"2","Name":"Lemon","Color":"Yellow"}]}

I can't find a simple example that helps me understand how to do this in ADF. I've tried a Copy activity and a data flow, but the furthest I've got is one JSON object per row, like this:

{"fruits":{"Id":"1","Name":"Apple","Color":"Green"}}
{"fruits":{"Id":"2","Name":"Lemon","Color":"Yellow"}}

Surely this is simple to achieve. I'd be very grateful if anyone has any suggestions. Thanks!

  • It seems simple, but in my experience it cannot be achieved. Others have posted the same question and there are still no good solutions. Commented Jul 13, 2020 at 5:48
  • Hi Simon, would you mind implementing this requirement in another service and calling it from ADF? Commented Jul 15, 2020 at 6:15
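
As a rough illustration of the last comment's suggestion (doing the transformation outside ADF), here is a minimal Python sketch. The function name `csv_to_fruits_json` and the `root_key` parameter are made up for this example, and how the code would be hosted and called from ADF is not shown. It assumes the CSV fits in memory and that every value should stay a string, as in the desired output.

```python
import csv
import io
import json


def csv_to_fruits_json(csv_text: str, root_key: str = "fruits") -> str:
    """Wrap every CSV row in a single JSON array under root_key (hypothetical helper)."""
    # skipinitialspace drops the blank that follows each comma in the sample file
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [dict(row) for row in reader]
    # Compact separators reproduce the no-whitespace layout shown in the question
    return json.dumps({root_key: rows}, separators=(",", ":"))


sample = "Id, Name, Color\n1, Apple, Green\n2, Lemon, Yellow\n"
result = csv_to_fruits_json(sample)
print(result)
```

All rows end up in one array under a single root object, which is the shape the question asks for.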

2 Answers

https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#tabularhierarchical-source-to-hierarchical-sink

"When copying data from tabular source to hierarchical sink, writing to array inside object is not supported"

However, if you set the file pattern in the Sink properties to 'Array of Objects', you can get as far as this:

    [{"Id":"1","Name":" Apple","Color":" Green"}
     ,{"Id":"2","Name":" Lemon","Color":" Yellow"}
    ]
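
If a post-processing step outside the Copy activity is acceptable, the remaining gap is only the wrapping object. The sketch below is an assumption about how that step could look in Python, not something the Copy activity does; `wrap_array` is a hypothetical helper name.

```python
import json


def wrap_array(array_json: str, root_key: str = "fruits") -> str:
    """Nest a top-level JSON array under root_key (hypothetical helper)."""
    items = json.loads(array_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array at the top level")
    return json.dumps({root_key: items}, separators=(",", ":"))


# The 'Array of Objects' output shown above, including the leading spaces in the values
copy_output = """[{"Id":"1","Name":" Apple","Color":" Green"}
 ,{"Id":"2","Name":" Lemon","Color":" Yellow"}
]"""
wrapped = wrap_array(copy_output)
print(wrapped)
```

The leading spaces in the values come from the source CSV and are carried through unchanged.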

Below are the steps to be followed to generate the desired output JSON file.

  1. In ADF, create a data flow with the following transformations:
       1. Source
       2. Derived Column
       3. Aggregate
       4. Sink

  2. In the Source transformation, select the source dataset that points to the CSV file.

  3. In the Derived Column transformation, add a column named 'fruits' with three subcolumns, Id, Name and Color, and map each one to the matching column from 'Input Schema'.

  4. In the Aggregate transformation, leave the 'Group by' tab blank. In the 'Aggregates' tab, select the column 'fruits' and set the expression to collect(fruits).

  5. In the Sink transformation, select the destination dataset.

  6. In the Sink transformation settings, set 'File name option' to 'Output to single file' and enter the output file name. In the Mapping tab, uncheck 'Auto Mapping'.

  7. Create a pipeline, drag and drop the data flow onto it, and run the pipeline. This should produce the desired output.
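
To make the logic of the Derived Column and Aggregate steps easier to follow, here is a plain-Python analogy. It is not data flow script and does not run in ADF; the variable names are illustrative, and it only mimics what building a struct column and then collecting it without a group-by would produce.

```python
import json

# Rows as they would arrive from the Source transformation
rows = [
    {"Id": "1", "Name": "Apple", "Color": "Green"},
    {"Id": "2", "Name": "Lemon", "Color": "Yellow"},
]

# Derived Column: add a structured 'fruits' column built from the three input columns
derived = [
    {**row, "fruits": {"Id": row["Id"], "Name": row["Name"], "Color": row["Color"]}}
    for row in rows
]

# Aggregate with an empty 'Group by': every row falls into one group,
# and collecting 'fruits' gathers the structs into a single array
aggregated = {"fruits": [row["fruits"] for row in derived]}

output = json.dumps(aggregated, separators=(",", ":"))
print(output)
```

Because nothing is grouped, the aggregate yields exactly one output row, which is why the sink writes a single object containing the array.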

Hope this helps.
