
I am building a Data Flow which will be called from a control flow pipeline.

The solution I am trying to achieve is to implement fuzzy logic for two master tables to find relevant matches. As shown in the screen capture, I am building the Data Flow to connect to the underlying tables in the same database, so the connection is the same. I want to parameterise the table names so that I can use the same generic connection rather than creating a new one for each table.


My issues are:

  1. How can I send the table names from the control flow to the data flow via a parameter?
  2. How can I browse the tables' column names in the join condition, as shown in the screen capture?

I tried creating parameters on the control flow and it appears to be working; however, I can't browse the tables to pick the join columns from table 1 (source 1) and table 2 (source 2).

Please see the screen capture, where it is using Blob storage.

1 Answer


You need to use dataset parameters for this scenario.

Go to your dataset and create a string-type parameter (for example, table_name).


In the dataset, click on Edit and use the parameter for the table name, like this: @dataset().table_name.
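For reference, the resulting dataset JSON looks roughly like the sketch below. The names ds_sql_generic, ls_azure_sql and table_name are placeholders, and AzureSqlTable assumes an Azure SQL Database dataset; your typeProperties may differ slightly depending on the dataset type.

```json
{
    "name": "ds_sql_generic",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ls_azure_sql",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "table_name": { "type": "string" }
        },
        "type": "AzureSqlTable",
        "schema": [],
        "typeProperties": {
            "table": {
                "value": "@dataset().table_name",
                "type": "Expression"
            }
        }
    }
}
```

The important parts are the table_name parameter, the @dataset().table_name expression for the table, and the empty schema array.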


Make sure the schema in the above dataset is empty. Now, use this dataset for the two sources in the dataflow.

For the join, the incoming schema should not be empty, which means you need to import the projection in the sources.

For that, turn on Data flow debug, go to the debug settings -> Parameters, and give your table names for the dataset parameters.


Now, go to the first source of the dataflow and import the projection.


Do the same for the second source as well.

Give the columns in the join condition and set the fuzzy matching options as per your requirement.


You can see the result in the data preview of the join transformation. Then give your sink dataset. All of this is for debugging the join transformation.

To run the dataflow from the pipeline, you need to give your table names for the dataset parameters in the dataflow activity of the pipeline, as sketched below.
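As a rough sketch of how those values end up in the pipeline JSON (dataflow1, source1, source2 and the table names are placeholders for your own data flow, its source stream names and your tables):

```json
{
    "name": "Run fuzzy match dataflow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "dataflow1",
            "type": "DataFlowReference",
            "datasetParameters": {
                "source1": { "table_name": "dbo.MasterTable1" },
                "source2": { "table_name": "dbo.MasterTable2" }
            }
        }
    }
}
```

The datasetParameters block is keyed by the source stream names of the data flow, and each entry supplies the dataset parameter values for that source.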


Run the pipeline and the dataflow will succeed.


Comments

Thanks for your valuable inputs. I am getting the below error: Error code: DF-SQLDW-InvalidBlobStagingConfiguration, Activity ID: a444d4de-4127-4ee0-bae7-551a85aa32db, Details: at Source 'source2': Blob storage staging properties should be specified. Where could I find the Blob storage properties?
Why are you using Blob storage? You mentioned that your source is a SQL database. Are you using any blob storage as staging?
Yes, I am using a SQL db as the source for source 2 and source 3. Not sure where these errors are coming from.
Check if you are using any blob linked service as staging in the sources of the dataflow. Also check the staging in the dataflow activity settings (see the sketch after these comments).
Sorry, the exact step I am trying is Import projection; that is where I am getting this error.
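For the staging error discussed in the comments above, a hedged sketch of where the staging properties live in the dataflow activity settings; ls_blob_staging and the folder path are placeholders for an existing Blob/ADLS linked service and container path in your factory:

```json
{
    "name": "Run fuzzy match dataflow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataFlow": {
            "referenceName": "dataflow1",
            "type": "DataFlowReference"
        },
        "staging": {
            "linkedService": {
                "referenceName": "ls_blob_staging",
                "type": "LinkedServiceReference"
            },
            "folderPath": "staging/fuzzy-match"
        }
    }
}
```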