I have a JSON feed in the format below. I need to write this data to a NoSQL collection that has a different schema, also shown below. How can I transform the input JSON schema into the target schema using Azure Data Factory?

Because currentValue can have a different data type in each record (array, number, complex type, string, etc.), the Azure Data Factory data flow returns null for it in both the 'Derived Column' schema modifier and the 'Flatten' formatter.

Input JSON

[
    {
        "type": "UPDATE",
        "key": { "id": "112710876" },
        "doc": [
            {
                "property": "org.numberOfEmployees",
                "currentValue": [
                    {
                        "value": 2256,
                        "scope": "Consolidated"
                    },
                    {
                        "value": 516,
                        "scope": "Individual"
                    }
                ]
            }
        ]
    },
    {
        "type": "UPDATE",
        "key": { "id": "081243215" },
        "doc": [
            {
                "property": "org.startDate",
                "currentValue": "1979-09-14T06:08:51Z"                
            }
        ]
    },
    {
        "type": "UPDATE",
        "key": { "id": "081243216" },
        "doc": [
            {
                "property": "org.employeeCount",
                "currentValue": "20000"
            }
        ]
    },
    {
        "type": "UPDATE",
        "key": { "id": "081243216" },
        "doc": [
            {
                "property": "org.headOffice",
                "currentValue": {
                    "city": "NY",
                    "country": "US" 
                }
            }
        ]
    }
]

Target Schema

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "id": {
            "type": "integer"
        },
        "startDate": {
            "type": "string"
        },
        "numberOfEmployees": {
            "type": "array",
            "items": [
                {
                    "type": "object",
                    "properties": {
                        "value": {
                            "type": "integer"
                        },
                        "scope": {
                            "type": "string"
                        }
                    }
                }
            ]
        },
        "employeeCount": {
            "type": "integer"
        },
        "headOffice": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string"
                },
                "country": {
                    "type": "string"
                }

            }
        }
    }
}

If there is no direct way to transform the input data into the target schema, is there any way to stringify currentValue in a data flow?
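To make the intent concrete, here is a minimal Python sketch of what stringifying currentValue would mean. It runs outside ADF (a pre-processing step before the data flow is an assumption here, not an ADF feature), and the function name `stringify_current_value` is hypothetical. Every currentValue is serialized to a JSON string so that the column has one uniform type across records.

```python
import json

def stringify_current_value(records):
    """Return copies of the records with every currentValue serialized to a JSON string."""
    out = []
    for record in records:
        docs = [
            # json.dumps accepts arrays, objects, numbers and strings alike
            {**doc, "currentValue": json.dumps(doc["currentValue"])}
            for doc in record["doc"]
        ]
        out.append({**record, "doc": docs})
    return out

# Trimmed-down sample taken from the input JSON above
sample = [
    {"type": "UPDATE", "key": {"id": "112710876"},
     "doc": [{"property": "org.numberOfEmployees",
              "currentValue": [{"value": 2256, "scope": "Consolidated"}]}]},
    {"type": "UPDATE", "key": {"id": "081243216"},
     "doc": [{"property": "org.employeeCount", "currentValue": "20000"}]},
]

stringified = stringify_current_value(sample)
for record in stringified:
    print(record["doc"][0]["currentValue"])
```

After this step each currentValue is a string, which can be parsed back with `json.loads` once the target property is known.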

Any help would be appreciated.

1 Answer

You can stringify it in a Derived Column using toString(), or you can wait for our new Stringify transformation in October :)

4 Comments

Thank you, Mark. However, when I tried toString() in a Derived Column, I got the error scala.MatchError: AnyType (of class com.microsoft.dataflow.AnyType$). It looks like it supports only primitive types. In this case, each record can have a different type for currentValue: string, integer, complex type, array, etc.
Hi Mark, I tried the Stringify formatter, but I am getting the same error. It looks like Stringify only works if the incoming data has the same type in every record it processes, either a complex type or an array of complex types (Stringify expressions must be a complex type or an array of complex types).
Isn't currentValue in your data above an array?
Thanks for your reply, Mark. The data type of currentValue can be anything: number, text, complex type, array of strings, array of complex types, etc. For example, the currentValue of org.numberOfEmployees is an array of complex types, that of org.startDate is a date, that of org.headOffice is a complex type, and that of org.employeeCount is a number. Is there any way to handle such a transformation in ADF?
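For reference, the reshaping being asked for can be sketched in plain Python, outside a data flow. Where such code would run (for example an Azure Function or a custom activity called from the pipeline) is an assumption and is not confirmed anywhere in this thread. The names `to_target` and `INTEGER_FIELDS` are hypothetical. The sketch merges records by key.id, strips the "org." prefix from each property name, and casts the fields the target schema declares as integers. Because the target schema types id as an integer, a leading zero such as in "081243216" is lost.

```python
# Fields that the target schema declares as integers (assumption derived from that schema)
INTEGER_FIELDS = {"employeeCount"}

def to_target(records):
    """Merge UPDATE records by id into documents shaped like the target schema."""
    merged = {}
    for record in records:
        doc_id = int(record["key"]["id"])  # target schema types id as integer
        target = merged.setdefault(doc_id, {"id": doc_id})
        for doc in record["doc"]:
            # "org.startDate" -> "startDate"
            name = doc["property"].split(".", 1)[-1]
            value = doc["currentValue"]
            if name in INTEGER_FIELDS:
                value = int(value)
            target[name] = value
    return list(merged.values())

# The four records from the input JSON in the question
feed = [
    {"type": "UPDATE", "key": {"id": "112710876"},
     "doc": [{"property": "org.numberOfEmployees",
              "currentValue": [{"value": 2256, "scope": "Consolidated"},
                               {"value": 516, "scope": "Individual"}]}]},
    {"type": "UPDATE", "key": {"id": "081243215"},
     "doc": [{"property": "org.startDate", "currentValue": "1979-09-14T06:08:51Z"}]},
    {"type": "UPDATE", "key": {"id": "081243216"},
     "doc": [{"property": "org.employeeCount", "currentValue": "20000"}]},
    {"type": "UPDATE", "key": {"id": "081243216"},
     "doc": [{"property": "org.headOffice",
              "currentValue": {"city": "NY", "country": "US"}}]},
]

result = to_target(feed)
for document in result:
    print(document)
```

Since Python values carry their own types, the mixed-type currentValue needs no special handling here; only the fields that must be integers are converted explicitly.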
