I'm an accountant working in the financial controlling department. Currently, all our data for month-end closing and financial statements are in Excel spreadsheets, many of which have hundreds of thousands of rows.
The Problem: This setup causes two main issues:
The manual consolidation process in Excel is slow and error-prone.
When connecting Power BI directly to these spreadsheets to generate reports, the performance is extremely poor and refreshes take too long.
My Goal: I'm learning SQL and want to use a database to solve this bottleneck. My idea is to create a central database where this Excel data would be imported, and then connect Power BI to that database.
My Technical Questions:
1. Import (ETL): What is the recommended approach to routinely move data from multiple Excel spreadsheets into a SQL database? Would it be better to use SSIS, Python scripts (with pandas), or Power BI's own Dataflows?
2. Modeling: Should I simply replicate my spreadsheet structure in SQL (one "huge table" per spreadsheet), or is it essential to normalize the data (e.g., create dimension tables for Chart of Accounts, Cost Centers, etc., and a fact table for the Entries)?
3. Connection (Power BI): Once the data is in SQL, which Power BI connection mode would give the best performance for financial controlling reports: Import (importing the data into PBI) or DirectQuery (querying the database in real time)?
4. DBMS Choice (Database): Considering my profile (SQL beginner), our environment (corporate, currently Windows/Excel-based), and the goal (Power BI performance), what are the key technical factors when choosing between PostgreSQL (free) and MySQL? Are there clear Power BI integration advantages of one over the other?
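For context on the ETL question, the Python route I'm imagining looks roughly like the sketch below. The file name, sheet name, and table name are placeholders, and I'm using an in-memory SQLite database just so the sketch runs on its own; the real target would be PostgreSQL or MySQL via a SQLAlchemy engine.

```python
import pandas as pd
import sqlite3

# In practice this would be something like:
#   df = pd.read_excel("entries_2024_01.xlsx", sheet_name="Entries")
# (placeholder names). Here I build a tiny frame inline so the sketch
# is self-contained.
df = pd.DataFrame({
    "account": ["4000", "5000"],
    "cost_center": ["CC10", "CC20"],
    "amount": [1250.00, -430.50],
})

# Light cleanup before loading: normalize column names, drop fully empty rows.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.dropna(how="all")

# Load into the database. SQLite here for the sketch; for PostgreSQL/MySQL
# I'd pass a SQLAlchemy engine, e.g. create_engine("postgresql://...").
conn = sqlite3.connect(":memory:")
df.to_sql("staging_entries", conn, if_exists="append", index=False)

# Verify the load.
loaded = pd.read_sql("SELECT COUNT(*) AS n FROM staging_entries", conn)
print(int(loaded["n"].iloc[0]))
```

Is something along these lines a reasonable pattern for a monthly load, or is this exactly where SSIS or Dataflows would serve me better?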
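And on the modeling question, the normalized structure I have in mind is a minimal star schema along these lines. The table and column names are just my own guesses, not an established standard, and I'm executing the DDL against in-memory SQLite only to keep the sketch self-contained (the type comments note what I'd use in a real database):

```python
import sqlite3

# Minimal star schema sketch: two dimension tables plus one fact table.
ddl = """
CREATE TABLE dim_account (
    account_id   INTEGER PRIMARY KEY,
    account_code TEXT NOT NULL UNIQUE,   -- from the Chart of Accounts
    account_name TEXT NOT NULL
);
CREATE TABLE dim_cost_center (
    cost_center_id   INTEGER PRIMARY KEY,
    cost_center_code TEXT NOT NULL UNIQUE,
    cost_center_name TEXT NOT NULL
);
CREATE TABLE fact_entries (
    entry_id       INTEGER PRIMARY KEY,
    entry_date     TEXT NOT NULL,        -- DATE in PostgreSQL/MySQL
    account_id     INTEGER NOT NULL REFERENCES dim_account(account_id),
    cost_center_id INTEGER NOT NULL REFERENCES dim_cost_center(cost_center_id),
    amount         REAL NOT NULL         -- NUMERIC(18,2) in a real DB
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)

# The tables Power BI would see:
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

Is this roughly the shape people mean by "normalize for Power BI", or is it overkill compared to loading the spreadsheets as-is?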