I'm fairly new to SSIS and have hit a roadblock on my first major project. I'm trying to load data from multiple Excel workbooks, each with multiple sheets, into a single table in a database.

Each sheet has a block of "Parent Data" at the top that applies to every row of base data in that sheet. I would like to add this data to each row in the database, but I can't figure out how to attach the value of a fixed cell to every row.

Keep in mind that each sheet will have different "Parent Data" when the data is ingested.

Any help or direction greatly appreciated

  • What version of SQL Server and/or SSDT/BIDS/VS are you using? Commented Feb 14, 2018 at 0:12
  • Is the header data in tabular format, or is it in arbitrary cells? It's difficult to pluck individual cells out of Excel in SSIS; you need to write a script to get them out. Alternatively, you can treat the header block as tabular data and try to pull it out that way. Are you able to import the header data using a data flow? Commented Feb 14, 2018 at 1:34
  • @digital.aaron SQL Server 2012 and SSDT 15.5.6 Commented Feb 14, 2018 at 4:08
  • @Nick.McDermaid Arbitrary cells that relate to each row in the same sheet. The column header row isn't needed. For context: the arbitrary cells are the TV show title and the overall show description in various languages, and each row of real data is an episode. Commented Feb 14, 2018 at 4:12
  • See if you can create a data flow that gets just the headings out. If you can do that, you have more options for extracting them. If you can't, the only option is to write a script. Commented Feb 14, 2018 at 5:01

1 Answer

There are no native transformations within an SSIS Data Flow that allow you to inspect or modify the current row based on data from previous rows.

In your case, you'd like to carry information from line 1 to lines 2 to N.

As @Nick.McDermaid points out, you have roughly three options.

The first is to load everything into a table "as is" and then modify the data in a post-load step (most likely an Execute SQL Task).
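To make the post-load idea concrete, here is a minimal sketch of the kind of set-based fix-up an Execute SQL Task could run. It uses Python's built-in sqlite3 purely so the example is runnable; the real statement would be T-SQL against SQL Server. The `staging` table, its column names, and especially the `row_num` column are assumptions for illustration: the staging load would need to capture the sheet name and row order in some way for this to work.

```python
import sqlite3

# Hypothetical staging table: every worksheet row is loaded "as is",
# tagged with the sheet it came from and its position in that sheet.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE staging (
        sheet_name TEXT,
        row_num    INTEGER,
        col_a      TEXT,
        col_b      TEXT,
        show_title TEXT  -- empty after the load, filled in afterwards
    )
    """
)
raw_rows = [
    ("Sheet1", 1, "Title", "Example Show"),  # parent data
    ("Sheet1", 2, "Episode", "Name"),        # column headings
    ("Sheet1", 3, "1", "Pilot"),
    ("Sheet1", 4, "2", "Second"),
    ("Sheet2", 1, "Title", "Other Show"),
    ("Sheet2", 2, "Episode", "Name"),
    ("Sheet2", 3, "1", "Opening"),
]
conn.executemany(
    "INSERT INTO staging (sheet_name, row_num, col_a, col_b) VALUES (?, ?, ?, ?)",
    raw_rows,
)

# Post-load step: copy each sheet's parent value onto all of its rows,
# then discard the parent and heading rows so only episodes remain.
conn.execute(
    """
    UPDATE staging
    SET show_title = (
        SELECT p.col_b
        FROM staging AS p
        WHERE p.sheet_name = staging.sheet_name
          AND p.row_num = 1
    )
    """
)
conn.execute("DELETE FROM staging WHERE row_num <= 2")

episodes = conn.execute(
    "SELECT sheet_name, col_b, show_title FROM staging "
    "ORDER BY sheet_name, row_num"
).fetchall()
```

Because the update is correlated on `sheet_name`, each sheet picks up its own parent value, which matters here since every sheet carries different parent data.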

The second option is to read the file/worksheet twice. The first read pulls the header data out and assigns it to SSIS Variables (a Script Task is easiest; a Recordset Destination plus a Foreach enumerator would be the code-free way to do it). The second read then uses a Derived Column Transformation to inject your variables as columns. For future readers: this approach only works if the parent block remains constant. If you're parsing complex record types (mainframe, mixed header/detail records, EDI, EMRs, etc.), this is likely the wrong tool for you.

The third option is to dive deep into the .NET language of your choice and write an overly complex Script Component Source to assemble the data as needed. I covered that a bit in this post on SSIS Excel Source via Script. The general concept in the ExcelReader class is to use the JET/ACE OLE DB provider to read an Excel worksheet into a DataTable.

Once you have the contents in a DataTable, you write your parsing logic (ExcelParser). Pull whatever data you need from the first N rows, and then, for each remaining row in your DataTable, add a row to the output buffer, copy the existing values over, and augment them with your header data.

Pretty it isn't, but it's the only way to solve the problem when the data isn't properly tabular.
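The parsing step described above can be sketched as follows. This is a language-neutral illustration in Python, not the ExcelParser code from the linked post: a list of lists stands in for the DataTable, and the function name `flatten_sheet`, the cell positions, and the sample values are all hypothetical. In an actual Script Component you would write the same loop in C# or VB.NET and add rows to the output buffer instead of a list.

```python
from typing import Any, Dict, List, Tuple


def flatten_sheet(
    rows: List[List[Any]],
    parent_cells: Dict[str, Tuple[int, int]],
    header_row: int,
) -> List[Dict[str, Any]]:
    """Return one record per detail row, each carrying the parent values."""
    # Pull the parent values from their known (row, column) positions.
    parent = {name: rows[r][c] for name, (r, c) in parent_cells.items()}
    columns = rows[header_row]

    output = []
    for row in rows[header_row + 1:]:
        # Skip padding rows that contain no data at all.
        if all(cell is None or cell == "" for cell in row):
            continue
        record = dict(zip(columns, row))  # existing detail columns
        record.update(parent)             # augment with the parent data
        output.append(record)
    return output


# Hypothetical worksheet: two parent cells, a blank line, column
# headings, then the episode rows.
sheet = [
    ["Title", "Example Show", None],
    ["Description", "A made-up series", None],
    [None, None, None],
    ["Episode", "Name", "Runtime"],
    [1, "Pilot", 42],
    [2, "Second", 41],
]
result = flatten_sheet(
    sheet,
    parent_cells={"ShowTitle": (0, 1), "ShowDescription": (1, 1)},
    header_row=3,
)
```

Running this per worksheet keeps each sheet's parent values separate, since they are re-read from the top of every sheet before its detail rows are emitted.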

1 Comment

Hi, thanks for the very descriptive answer. I'll give it a hot go.
