
I am building a Data Flow which will be called from a control flow pipeline.

The solution I am trying to achieve is to implement fuzzy logic for two master tables to find relevant matches. As shown in the screen capture, I am building the Data Flow to connect to the underlying tables in the same database, so the connection is the same. I want to parameterise the table names so that I can use the same generic connection rather than creating a new one for each table.


My issues are:

  1. How can I send the table names from the control flow to the data flow via a parameter?
  2. How can I browse the tables' column names in the join condition, as shown in the screen capture?

I tried creating parameters on the control flow and it appears to be working; however, I can't browse the tables to pick the join columns from table 1 (source 1) and table 2 (source 2).

Please see the screen capture, where it is using Blob storage.

1 Answer


You need to use dataset parameters for this scenario.

Go to your dataset and create a string-type parameter (for example, table_name).


In the dataset, click on Edit and use the parameter for the table name, like this: @dataset().table_name.
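For reference, the resulting dataset JSON looks roughly like the sketch below. The names ds_sql_generic, ls_azure_sql and table_name are placeholders, and AzureSqlTable assumes an Azure SQL Database dataset; your typeProperties may differ slightly depending on the dataset type.

```json
{
    "name": "ds_sql_generic",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_azure_sql",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "table_name": { "type": "string" }
        },
        "type": "AzureSqlTable",
        "schema": [],
        "typeProperties": {
            "table": {
                "value": "@dataset().table_name",
                "type": "Expression"
            }
        }
    }
}
```

The important parts are the table_name parameter, the @dataset().table_name expression for the table, and the empty schema array.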


Make sure the schema in the above dataset is empty. Now, use this dataset for the two sources in the dataflow.

For the join, the incoming schema should not be empty, which means you need to import the projection in the sources.

For that, turn on Data flow debug, go to the debug settings -> Parameters, and give your table names for the dataset parameters.


Now, go to the first source of the dataflow and import the projection.


Do the same for the second source as well.

Give the columns in the join condition and set the fuzzy matching options as per your requirement.


You can see the result in the data preview of the join transformation. Then give your sink dataset. All of this is for debugging the join transformation.

To run the dataflow from the pipeline, you need to give your table names for the dataset parameters in the dataflow activity of the pipeline, as sketched below.
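As a rough sketch of how those values end up in the pipeline JSON (dataflow1, source1, source2 and the table names are placeholders for your own data flow, its source stream names and your tables):

```json
{
    "name": "Run fuzzy match dataflow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "dataflow1",
            "type": "DataFlowReference",
            "datasetParameters": {
                "source1": { "table_name": "dbo.MasterTable1" },
                "source2": { "table_name": "dbo.MasterTable2" }
            }
        }
    }
}
```

The datasetParameters block is keyed by the source stream names of the data flow, and each entry supplies the dataset parameter values for that source.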


Run the pipeline and the dataflow will succeed.


Comments

Thanks for your valuable inputs. I am getting the below error: Error code: DF-SQLDW-InvalidBlobStagingConfiguration, Activity ID: a444d4de-4127-4ee0-bae7-551a85aa32db, Details: at Source 'source2': Blob storage staging properties should be specified. Where could I find the Blob storage properties?
Why are you using Blob storage? You mentioned that your source is a SQL database. Are you using any blob storage as staging?
Yes, I am using a SQL db as the source for source 2 and source 3. Not sure where these errors are coming from.
Check if you are using any blob linked service as staging in the sources of the dataflow. Also check the staging in the dataflow activity settings (see the sketch after these comments).
Sorry, the exact step I am trying is Import projection; that is where I am getting this error.
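For the staging error discussed in the comments above, a hedged sketch of where the staging properties live in the dataflow activity settings; ls_blob_staging and the folder path are placeholders for an existing Blob/ADLS linked service and container path in your factory:

```json
{
    "name": "Run fuzzy match dataflow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "dataflow1",
            "type": "DataFlowReference"
        },
        "staging": {
            "linkedService": {
                "referenceName": "ls_blob_staging",
                "type": "LinkedServiceReference"
            },
            "folderPath": "staging/fuzzy-match"
        }
    }
}
```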