
I know this question seems a bit broad, so let me give some context. Where I work, our engineers use huge Excel files full of data (bill of material numbers, costs, etc.). I am looking for a good way to use these files as a data source. I know importing an .xlsx file takes longer than importing a .csv file. Would it be beneficial to have a program running that periodically transforms the data into a CSV file, and then use that file as the data source? Or is there a way to keep Excel data in sync with a database?

I am not sure what my "options" are, or whether there is a better common practice for this. I have only used databases when creating my programs, but making the engineers maintain the Excel files AND a database is not an option.

What I have looked at already:

  • Using LinqToSQL
  • Importing Excel Into Server

Any pointers would be appreciated.

  • One suggestion is to make sure that this is hidden behind an abstraction. For example, an interface like IProductRepository that contains just the methods you need to interact with. That way your choice won't "leak" into other parts of your code, and you'll be able to change your mind without modifying everything. Also, when you create a different implementation, the interface will clearly indicate what it needs to do. That interface should not in any way hint at what the underlying implementation is. Commented May 1, 2019 at 17:42
  • @ScottHannen Thank you for the reply. Yes, my plan was to hide it behind an abstraction, but for the actual implementation of my IProductRepository, what methods are available? Is there a "preferred" method for gathering data from Excel, or should I just run with it? My biggest concern is that someone will have the .xlsx file open when I try to copy the data to CSV. Commented May 1, 2019 at 17:53
  • I added an answer to provide some more detail. While some methods might be preferred over others (I have reasons why I wouldn't read from Excel), the biggest win is writing as much of the application as possible in such a way that details like this become less critical. Commented May 1, 2019 at 17:56
  • @ScottHannen Yes, I absolutely agree with your reasons as well. I believe I will go the route where I ensure the structure of the data from Excel does not change, have a periodic task copy the data into new CSV files (to keep the data live), and then access those files directly through my abstraction's implementation. Thank you again for the suggestion. Commented May 1, 2019 at 18:03
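The periodic-copy plan in the last comment has a related timing risk: a reader could open the CSV while the task is still writing it. A minimal sketch of one common mitigation, assuming the rows have already been read from the workbook (for example with a library such as openpyxl; that step is left out here), is to write a temporary file next to the target and swap it into place. `write_csv_snapshot` is a hypothetical helper name.

```python
import csv
import os
import tempfile
from typing import Iterable, Sequence


def write_csv_snapshot(rows: Iterable[Sequence[object]], target_path: str) -> None:
    """Write rows to target_path so readers never see a half-written file.

    The rows go to a temporary file in the same directory first, then
    os.replace swaps it over the target. On POSIX this rename is atomic;
    on Windows it can fail if another process has the target open, so a
    real task may need to retry.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, temp_path = tempfile.mkstemp(suffix=".csv.tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        os.replace(temp_path, target_path)
    except BaseException:
        # Remove the partial temp file so failed runs leave no debris.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


# Hypothetical rows, e.g. produced by reading one worksheet.
# write_csv_snapshot([("PartNumber", "Cost"), ("BOM-001", "12.50")], "products.csv")
```

This does not address the .xlsx itself being open in Excel; it only guarantees that whatever reads the CSV sees either the previous snapshot or the complete new one.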

1 Answer


A few suggestions:

Make sure that your data source is hidden behind an abstraction. For example, an interface like IProductRepository that contains just the methods you need to interact with. That interface should not in any way hint at what the underlying implementation is.

That way your choice won't "leak" into other parts of your code, and you'll be able to change your mind without modifying everything. Also, when you create a different implementation, you'll see up front which methods it needs to support.
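Translated into a short sketch, the abstraction might look like the following. The original discussion is about a .NET IProductRepository; this Python version uses an abstract base class instead, and the names `ProductRepository`, `CsvProductRepository`, `Product`, and the column names `PartNumber`, `Description`, `Cost` are all hypothetical.

```python
import csv
from abc import ABC, abstractmethod
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, List, Optional, TextIO


@dataclass(frozen=True)
class Product:
    part_number: str
    description: str
    cost: Decimal


class ProductRepository(ABC):
    """What the application needs; nothing here mentions Excel, CSV or SQL."""

    @abstractmethod
    def get_by_part_number(self, part_number: str) -> Optional[Product]: ...

    @abstractmethod
    def list_all(self) -> List[Product]: ...


class CsvProductRepository(ProductRepository):
    """One possible implementation, reading a CSV export with assumed columns."""

    def __init__(self, open_source: Callable[[], TextIO]) -> None:
        # open_source returns a fresh text stream on each call,
        # e.g. lambda: open("products.csv", newline="", encoding="utf-8")
        self._open_source = open_source

    def _load(self) -> Dict[str, Product]:
        # Re-read on every call so the latest snapshot is always used.
        with self._open_source() as handle:
            return {
                row["PartNumber"]: Product(
                    row["PartNumber"], row["Description"], Decimal(row["Cost"])
                )
                for row in csv.DictReader(handle)
            }

    def get_by_part_number(self, part_number: str) -> Optional[Product]:
        return self._load().get(part_number)

    def list_all(self) -> List[Product]:
        return list(self._load().values())
```

Callers would accept a `ProductRepository`, so replacing the CSV-backed class with a database-backed one later would not change them.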

Here are a few reasons not to directly use Excel as your data source, but to define your own database instead:

  • If the Excel files weren't meant for that purpose then they might change unpredictably and then your application won't work. (Or worse, it "works" but it's using the wrong columns or sheets and the data looks similar and no one notices for a while.) If you're importing the Excel files into a database and the file layout changes it will still create work, but that work will be at the point where you import the data, not in the rest of your application. You're keeping "volatile" things that could change away from other parts of your code.
  • A plausible, common scenario is that later you also need to include data from another source. If your application is reading Excel, then you have to figure out how to get data from somewhere else into the Excel workbook. That's doable but not fun at all. If you're reading from a database and Excel is just one way that data gets into it, including another source only means adding another import.
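Both points put the "volatile" work at the import step. A minimal sketch of such an import, assuming the periodic task produces a CSV export and using SQLite as a stand-in for whatever database you choose, checks the layout first and only then replaces the table contents in one transaction. `import_products`, `LayoutChangedError`, the table `product`, and `EXPECTED_COLUMNS` are hypothetical.

```python
import csv
import sqlite3
from decimal import Decimal, InvalidOperation
from typing import TextIO

# Hypothetical layout the importer expects from the engineers' export.
EXPECTED_COLUMNS = ["PartNumber", "Description", "Cost"]


class LayoutChangedError(ValueError):
    """Raised when the source no longer matches the expected layout."""


def import_products(source: TextIO, conn: sqlite3.Connection) -> int:
    reader = csv.reader(source)
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        raise LayoutChangedError(f"expected columns {EXPECTED_COLUMNS}, got {header}")

    rows = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            raise LayoutChangedError(f"line {line_no}: expected 3 fields, got {len(row)}")
        part_number, description, cost = row
        try:
            Decimal(cost)  # catches "similar-looking" data in the wrong column
        except InvalidOperation:
            raise LayoutChangedError(f"line {line_no}: cost {cost!r} is not a number") from None
        rows.append((part_number, description, cost))

    # Validation passed: swap the table contents in a single transaction.
    # Cost is stored as TEXT to keep the exact decimal value.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product "
        "(part_number TEXT PRIMARY KEY, description TEXT, cost TEXT)"
    )
    with conn:
        conn.execute("DELETE FROM product")
        conn.executemany("INSERT INTO product VALUES (?, ?, ?)", rows)
    return len(rows)
```

If the workbook layout changes, the import fails loudly and the previously imported data stays in place, so the rest of the application keeps working from the last good copy.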

Those both reflect the idea that our application should have more control over the things it depends on. When you define a repository interface, your application says that whatever the implementation is, it must conform to this. Creating your own database is an extension of that principle. You control the data store, and whatever puts data in it must conform to it. You can't control what you can't control, but you push it farther away from the parts you do control to limit the work required if it changes.
