I have multiple CSV files that together form a star schema. To perform analytics in Python, is it better to combine all of them into one CSV file, or to read each file separately and run the analytics across them? Most examples I have found online combine everything into one file first. However, combining the files would eliminate my star schema. Each CSV file currently has approximately 25,000 rows and 10 columns and is around 7 MB. Thank you in advance for your help.

  • @RoadRunner Should I combine all files into one big file, or read multiple files and then do analytics from the multiple files? Commented Jul 4, 2018 at 4:13
  • How many CSV files are there? Is it a big issue if you remove your star schema? I'm assuming you have 6 CSV files, from your previous question. If so, the combined file will be around 42 MB, which shouldn't be a problem, and you only have to read one file. Otherwise, just read the files separately. Commented Jul 4, 2018 at 4:19
  • @RoadRunner Thank you for your help! I'll combine all files into one and proceed. Commented Jul 4, 2018 at 4:37
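
For reference, the combine-into-one-file route discussed in these comments could look roughly like the sketch below. It assumes pandas is available. The table and column names (`order_id`, `customer_id`, `region`) are hypothetical, and the inline strings stand in for the real CSV files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two of the CSV files; with real data,
# pass the file paths to pd.read_csv instead of StringIO objects.
fact = pd.read_csv(io.StringIO("order_id,customer_id,quantity\n1,7,2\n2,8,1\n3,7,4\n"))
dim_customer = pd.read_csv(io.StringIO("customer_id,region\n7,North\n8,South\n"))

# Flatten: attach the dimension columns to every fact row.
flat = fact.merge(dim_customer, on="customer_id", how="left")

# Write the combined table out as a single CSV (here into a string buffer).
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
combined_csv = buffer.getvalue()
```

The flattened table repeats each dimension value on every matching fact row, which is the price of having only one file to read.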

1 Answer

I think you can leave the fact tables as they are and combine the rest of the data. That reduces the amount of data you're dealing with and keeps the star schema intact too.
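
One way to read this suggestion, as a minimal sketch: keep the fact table and the dimension tables as separate DataFrames and join only the dimension a given analysis needs. It assumes pandas is available. The table names, column names, and the `revenue_by` helper are hypothetical, and the inline strings stand in for the real CSV files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files; with real data,
# pass the file paths to pd.read_csv instead of StringIO objects.
FACT_CSV = """sale_id,product_id,store_id,amount
1,10,100,5.0
2,11,100,7.5
3,10,101,2.5
"""
PRODUCT_CSV = """product_id,product_name
10,widget
11,gadget
"""
STORE_CSV = """store_id,city
100,Austin
101,Boston
"""

fact = pd.read_csv(io.StringIO(FACT_CSV))
dim_product = pd.read_csv(io.StringIO(PRODUCT_CSV))
dim_store = pd.read_csv(io.StringIO(STORE_CSV))


def revenue_by(fact_df, dim_df, key, label):
    """Join the fact table to one dimension and total `amount` per label."""
    # validate="many_to_one" raises if the dimension key is not unique,
    # which would otherwise silently duplicate fact rows.
    joined = fact_df.merge(dim_df, on=key, how="left", validate="many_to_one")
    return joined.groupby(label)["amount"].sum()


revenue_by_product = revenue_by(fact, dim_product, "product_id", "product_name")
revenue_by_city = revenue_by(fact, dim_store, "store_id", "city")
```

At roughly 25,000 rows per file, joins like these are cheap, so the tables can stay separate on disk and be merged per question.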

Thanks, Ram
