
I want to:

  1. Create a new table on the sink side
  2. Remove the whitespace from the column headers
  3. Map the source columns to the sink columns
  4. Copy only some of the columns to the sink (not all of them)

(For example, if the source has 10 columns, I only want to copy 8 of them to the sink.)

  • What have you tried? I suggest you use Azure Data Factory instead of Azure Functions. Commented Aug 22, 2021 at 23:46
  • I tried the stored procedure and the pre-copy script in the copy activity. Can you please tell me how I can implement this in Data Factory? Commented Aug 23, 2021 at 4:41
  • I have itemised your requirements. For 1 and 2, where do you want the table definition to come from? Does this need to be created just once, or are you trying to auto-create a table for all kinds of different input files? 3 & 4 are pretty easy in Data Factory. Commented Aug 23, 2021 at 4:46
  • I want to auto-create the table in the sink, and when I use the stored procedure the 'auto-create' option is disabled. All four steps are connected, and I was thinking about a staging table: in the staging table I will remove the spaces from the column headers and then insert 8 columns out of 10 from the staging table into the sink table. I want this for only one table, with the stored procedure and staging table. Thanks Commented Aug 23, 2021 at 5:17

1 Answer


You can accomplish this task using Azure Data Factory Data Flow.

You will have to install a self-hosted integration runtime on your local machine.

Create a new linked service, use the IR created above, and provide your local SQL Server and database name; that should connect it.

Follow: https://www.mssqltips.com/sqlservertip/5812/connect-to-onpremises-data-in-azure-data-factory-with-the-selfhosted-integration-runtime--part-1/

Similarly, you need to create a linked service for Azure SQL DB using the default integration runtime available in ADF. Follow: https://www.tallan.com/blog/2021/03/30/how-to-create-a-sql-database-linked-service-in-azure-data-factory/

Once you have created both linked services, you need to create datasets using them. One dataset will refer to the source table on the local machine and the other will be used as the sink in the Azure SQL database.

To create a dataset in ADF, go to Author (pencil symbol on the left side) -> Dataset -> New Dataset.

Now comes the main part: the ADF Data Flow.

To create a data flow: Author (pencil symbol on the left side) -> Data Flow -> New Data Flow.

Your data flow will look like the one shown below. In the Source settings you need to select the input dataset you created in the steps above, which points to where your input data is stored.

[screenshot: data flow layout and source settings]

You can enable data flow debug, as shown in the image above, to check the data preview after each transformation step, as shown below. We will transform the FirstName column as per the requirement.

[screenshot: data preview with data flow debug enabled]

In the Derived Column transformation, we will replace spaces with an empty value using the replace(FirstName, ' ', '') function.

[screenshot: Derived Column settings]
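For reference, the expression typed into the Derived Column settings would be along these lines; this is only a sketch in the data flow expression language, with FirstName standing in for whatever column you need to clean:

    replace(FirstName, ' ', '')

If only leading and trailing spaces need to go, trim(FirstName) is an alternative.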

In the Alter Row transformation, we will delete rows based on a condition. Here we are deleting rows where PersonID > 8.

[screenshot: Alter Row settings]
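The condition entered against the Delete if rule would be something like the following (again assuming a PersonID column; any boolean expression in the data flow expression language works here):

    PersonID > 8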

Once done, you can see the resulting data in Sink Data Preview.

[screenshot: sink data preview]
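For completeness, the script behind a flow like this (visible via the Script button at the top right of the data flow canvas) would look roughly as below. Treat it as a sketch only: the stream names, the PersonID/FirstName/LastName columns and the sink options are assumptions made for illustration and will differ in your environment.

    source(output(
            PersonID as integer,
            FirstName as string,
            LastName as string
        ),
        allowSchemaDrift: true,
        validateSchema: false) ~> Source1
    Source1 derive(FirstName = replace(FirstName, ' ', '')) ~> DerivedColumn1
    DerivedColumn1 alterRow(deleteIf(PersonID > 8)) ~> AlterRow1
    AlterRow1 sink(allowSchemaDrift: true,
        validateSchema: false,
        deletable: true,
        insertable: true) ~> Sink1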

Finally, simply create a new pipeline. Drag a Data Flow activity onto the canvas, and in its Settings select the data flow we created above. Click Add trigger -> Trigger now.

[screenshot: pipeline with Data Flow activity]

Once the pipeline has finished, you can check the data in your Azure SQL DB.


3 Comments

I'm not convinced data flows are necessary here. If they are, I'm also not convinced that "you will have to install self hosted integration runtime in your local machine".
A self-hosted integration runtime is required when we connect a local SQL Server to Azure Data Factory using a linked service. The default Azure integration runtime is only capable of connecting to other Azure services like Blob Storage, Azure databases, etc. There are other possible ways as well, but I prefer data flows due to their built-in ETL capabilities.
Fair enough, you're 100% correct. (Unless there is a hybrid network setup but that's an assumption)
