Server A has a process that exports n database tables as flat files. Server B contains a utility that loads the flat files into a DW appliance database.
A process runs on server A that exports and compresses about 50-75 tables. Each time a table is exported and a file produced, a .flag file is also generated.
Server B has a bash process that repeatedly checks for each .flag file produced by server A. It does this by connecting to A and checking for the existence of a file. If the flag file exists, Server B will scp the file from Server A, uncompress it, and load it into an analytics database. If the file doesn't yet exist, it will sleep for n seconds and try again. This process is repeated for each table/file that Server B expects to be found on Server A. The process executes serially, processing a single file at a time.
Additionally: The process that runs on Server A cannot 'push' the file to Server B. Because of file-size and geographic concerns, Server A cannot load the flat file into the DW Appliance.
I find this process to be cumbersome and just so happens to be up for a rewrite/revamp. I'm proposing a messaging-based solution. I initially thought this would be a good candidate for RabbitMQ (or the like) where
Server A would write a file, compress it and then produce a message for a queue.
Server B would subscribe to the queue and would process files named in the message body.
I feel that a messaging-based approach would not only save time as it would eliminate the check-wait-repeat cycle for each table, but also permit us to run processes in parallel (as there are no dependencies).
I showed my team a proof-of-concept using RabbitMQ and they were all receptive to using messaging. A number of them quickly identified other opportunities where we would benefit from message-based processing. One such area that we would benefit from implementing messaging would be to populate our DW dimensions in real-time rather then through batch.
It then occurred to me that a MQ-based solution might be overkill given the low volume (50-75 tasks). This might be overkill given our operations team would have to install RabbitMQ (and its dependencies, including Erlang), and it would introduce new administration headaches.
I then realized this could be made more simple with a REST-based solution. Server A could produce a file and then make a HTTP call to a simple (web.py) web service on Server B. Server B could then initiate the transfer-and-load process based on the URL that is called. Given the time that it takes to transfer, uncompress, and load each file, I would likely use Python's multiprocessing to create a subprocess that loads each file.
I'm thinking that the REST-based solution is idea given the fact that it's simpler. In my opinion, using an MQ would be more appropriate for higher-volume tasks but we're only talking (for now) 50-75 operations with potentially more to come.
Would REST-based be a good solution given my requirements and volume? Are there other frameworks or OSS products that already do this? I'm looking to add messaging without creating other administration and development headaches.