I'm the data guy at a small startup in Austin. All of the analysis that I do (thus far) is stored as a set of ad hoc scripts that I run from my laptop. This is a bad idea.
I'm going to sketch my plans here to deploy my analysis, and I'd like to know if there are any glaring holes that I've missed, or anything else that I should be considering. I think the outline I have keeps things atomic enough so that I can plug things in when and where I need to, but also allows me to run one script pretty easily. A secondary (longer-term) goal would be to put a simple web front-end up that allows a user (i.e., employee at my company) to invoke scripts one at a time, see here: Very simple web service: take inputs, email results .
I want to deploy the scripts to a server, and I'm thinking of organizing my scripts into a set of python modules. Then, I want my batch script to look like:
import analysis
batch_dict = analysis.build_batch_dict()
assert sorted(batch_dict.keys()) = ['Hourly', 'Monthly', 'Nightly', 'Weekly']
scripts_to_run = analysis.what_batch() # get from command line?
results_directory = analysis.make_results_directory()
failures = {}
for script in scripts_to_run:
try:
script.analyze()
script.export_results(results_directory)
except Exception as e:
failures.update(script.failed(e))
analysis.completed(failures)
I'll rewrite my analyses so that they are handled by a class.
class AnalysisHandler(object):
...
def analyze():
pass
def export_results(some_directory):
pass
def failed(exception):
pass
def run_with_non_default_args(*args, **kwargs):
pass
def something_else_im_missing_now():
pass
All of the scripts will be handled by something that inherits from AnalysisHandler.
The directory structure on the server will look like:
datalytics/
results/
02-14-2013/
script1/
/log
/error
/data
script2/
/log
/error
/data
.
.
.
<etc>
scripts/
script1/
bin/
data/
doc/
script_1/
tests/
setup.py
.
.
.
analysis/
__init__.py
analysis.py
batch.py (see above)
new_script.py
run_all_tests.py
run_some_tests.py
run_this_script.py
run_everything.py