I have a source as an Excel. How to load data from excel to SQL DB using Azure Data Factory?
-
Is modifying the excel file input up-front an option? Really need more context on this because there are a few different ways to do this, but they might not all fit into the exact way you are trying to make this work. Need to know will the excel always come in with 1.Name first, then 2.Class? More details like that.Trent Tamura– Trent Tamura2021-07-12 13:15:31 +00:00Commented Jul 12, 2021 at 13:15
-
Yes it will exactly above inputkonda masthanreddy– konda masthanreddy2021-07-13 06:03:56 +00:00Commented Jul 13, 2021 at 6:03
1 Answer
There are many ways to do this, but I will show you what I believe to be the simplest method.
- Setup your Copy Data Activity in Azure Data Factory to copy the data from excel into an Azure SQLDB staging table.
- Create a Stored Procedure in the Azure SQLDB that will insert from staging table into your final output table
- Connect a Stored Procedure activity to the copy data activity in ADF to reference the procedure you just created in the DB, that way ADF will immediately run the stored procedure after the staging table is loaded.
- You can have command/procedure to clear the staging table after the final output table is loaded, or prior to the copy activity in ADF.
Please refer to the screenshots and comments below for a walkthrough (some things like getting the sheetname aren't currently possible in ADF natively without using an Azure Function or something, but I covered as much as I could. If the excel is named the same as the sheet that could be used instead, that is an option and what I did in this example):
Overall Pipeline View
First Step Truncating the Stage Table to ensure it's empty prior to loading
2nd Step Load the Stage table with file input
Last Step is to run the stored procedure
Here is the code for the stored procedure I made below, plus a screenshot of the ADF stored Procedure activity:
create procedure stag.ttt_test (@ExcelFileName varchar(100))
as
DECLARE @ClassStartRow int
select @ClassStartRow = max(RowId)
from (
select
case when Details = '2.Class' then ROW_NUMBER() over (order by %%physloc%%) else 0 end as RowId
from Stag.ttt_test_stage
) as sub
;
INSERT INTO Stag.ttt_test_final
select
Case
when January is NOT NULL then DATEADD(day, -1, cast('2/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when February is NOT NULL then DATEADD(day, -1, cast('3/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when March is NOT NULL then DATEADD(day, -1, cast('4/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when April is NOT NULL then DATEADD(day, -1, cast('5/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when May is NOT NULL then DATEADD(day, -1, cast('6/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when June is NOT NULL then DATEADD(day, -1, cast('7/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when July is NOT NULL then DATEADD(day, -1, cast('8/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when August is NOT NULL then DATEADD(day, -1, cast('9/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when September is NOT NULL then DATEADD(day, -1, cast('10/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when October is NOT NULL then DATEADD(day, -1, cast('11/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when November is NOT NULL then DATEADD(day, -1, cast('12/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when December is NOT NULL then DATEADD(day, -1, cast('1/1/' + cast(YEAR(getdate()) + 1 as varchar(4)) as date))
end as [Date],
'name' as Category,
Details as [Type],
Coalesce(January, February, March, April, May, June, July, August, September, October, November, December) as [value],
@ExcelFileName as SheetName
from (
select
ROW_NUMBER() over (order by %%physloc%%) as RowId,
st.*
from Stag.ttt_test_stage as st
) as sub
where RowId > 1
and RowId < @ClassStartRow
;
INSERT INTO stag.ttt_test_final
select
Case
when January is NOT NULL then DATEADD(day, -1, cast('2/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when February is NOT NULL then DATEADD(day, -1, cast('3/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when March is NOT NULL then DATEADD(day, -1, cast('4/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when April is NOT NULL then DATEADD(day, -1, cast('5/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when May is NOT NULL then DATEADD(day, -1, cast('6/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when June is NOT NULL then DATEADD(day, -1, cast('7/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when July is NOT NULL then DATEADD(day, -1, cast('8/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when August is NOT NULL then DATEADD(day, -1, cast('9/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when September is NOT NULL then DATEADD(day, -1, cast('10/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when October is NOT NULL then DATEADD(day, -1, cast('11/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when November is NOT NULL then DATEADD(day, -1, cast('12/1/' + cast(YEAR(getdate()) as varchar(4)) as date))
when December is NOT NULL then DATEADD(day, -1, cast('1/1/' + cast(YEAR(getdate()) + 1 as varchar(4)) as date))
end as [Date],
'class' as Category,
Details as [Type],
Coalesce(January, February, March, April, May, June, July, August, September, October, November, December) as [value],
@ExcelFileName as SheetName
from (
select
ROW_NUMBER() over (order by %%physloc%%) as RowId,
st.*
from Stag.ttt_test_stage as st
) as sub
where RowId > @ClassStartRow
Please accept as answer if this works for you and let me know if you have any other comments or issues. Thanks

