

AngeliaScheduler™ - Grid Processing



Grid Processing (or Grid Computing) is a distributed processing technique that consists of breaking a large task into a series of smaller tasks and spreading the processing of those "sub-tasks" across multiple participating CPUs.  AngeliaScheduler™ can help make implementing a successful processing grid strategy much easier.

Several years ago (see the section titled "2004 - Automated Monthly National Database Update Process" on our Development Projects page) we needed to update a very large database (around 300,000,000 records) every month.  The database was the product of a fairly complex merge process involving several source databases (totaling about 750,000,000 records).  We also had to get it done using existing resources, since it had been made clear to us that no capital expenditure requests for the project would be approved.  And for the output to be useful, each monthly run had to finish in no more than a couple of days.

Our solution was to break up each of the source databases into segments by the SCF code (the first 3 digits of the ZIP code) of each record.  This was a valid approach because every source database included a mailing address.  It left us with about 900 individual merges to run, whose results could then be combined into a single final database.  The individual "granules" varied somewhat in size, since population densities vary, but all were of manageable size given the power of a typical high-end workstation or server.
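The segmentation step can be sketched in a few lines of Python.  This is only an illustration of the idea, not the code we used in 2004: the record layout (a dictionary with a "zip" field) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def scf_code(zip_code: str) -> str:
    """Return the SCF code: the first three digits of a ZIP code."""
    digits = zip_code.strip()
    if len(digits) < 3 or not digits[:3].isdigit():
        raise ValueError(f"cannot derive an SCF code from {zip_code!r}")
    return digits[:3]


def partition_by_scf(records):
    """Group records by SCF code.

    Each record is assumed to be a dict with a 'zip' field (hypothetical
    layout, chosen for this sketch).
    """
    segments = defaultdict(list)
    for record in records:
        segments[scf_code(record["zip"])].append(record)
    return dict(segments)


sample = [
    {"name": "A", "zip": "90210"},
    {"name": "B", "zip": "90211-1234"},
    {"name": "C", "zip": "10001"},
]
segments = partition_by_scf(sample)
```

Each key in `segments` then corresponds to one independent merge job, and segments from different source databases that share a key can be merged together without reference to any other segment.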

With the first hurdle crossed, we had defined 900 small jobs (each taking from a few minutes to an hour or more) that could be distributed across any reasonable number of CPUs, plus a final assembly job that ran on the database server.  Unfortunately, this brought us to the next hurdle: how do you spread these jobs across multiple systems?  If you simply open 900 command windows spread across however many systems, all 900 jobs start at once, which will almost certainly saturate some critical resource.  We chose to create a quick and dirty queuing mechanism for each participating system, but the only way to control it was to work with each system in turn, either in person or using Remote Desktop.
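To make the queuing idea concrete, here is a minimal Python sketch of a per-system job queue that never runs more than a fixed number of jobs at once.  It is not our original mechanism and not AngeliaScheduler's implementation; the names `run_queue` and `max_running` are hypothetical, and the jobs are plain Python callables standing in for the real merge commands.

```python
import queue
import threading


def run_queue(jobs, max_running=2):
    """Run callables from a FIFO queue, at most `max_running` at a time.

    Returns the list of job results and the peak number of jobs that
    were running simultaneously.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = []
    state = {"running": 0, "peak": 0}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                outcome = job()
            finally:
                with lock:
                    state["running"] -= 1
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(max_running)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, state["peak"]


# Ten stand-in jobs; a real job would launch one segment merge.
jobs = [lambda n=n: f"segment-{n:03d} merged" for n in range(10)]
results, peak = run_queue(jobs, max_running=3)
```

Because only `max_running` workers exist, the remaining jobs simply wait in the queue, which is what keeps a shared resource such as the database server from being saturated.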

This is where AngeliaScheduler would have been a great help if it had existed at the time.  Between its ability to serialize a stream of jobs, its remote management capability, its ability to repeat tasks indefinitely, and its ability to start a task based on the presence of a particular file, all of the ingredients are in place to configure a group of systems (the "processing grid") to run a series of related jobs that, in combination, perform a task much too large to be accomplished on any one system in a reasonable time period.
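As an illustration of the file-presence trigger mentioned above, the following Python sketch polls for a file and runs a job once it appears.  It shows the general technique only and does not describe how AngeliaScheduler implements it; the function names, the polling interval, and the ".ready" file name are all assumptions made for the example.

```python
import os
import tempfile
import time


def wait_for_file(path, poll_seconds=1.0, timeout=None):
    """Block until `path` exists.

    Returns True when the file is present, or False if `timeout`
    seconds pass first (no timeout means wait indefinitely).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def run_when_ready(trigger_path, job, **wait_options):
    """Run `job` once the trigger file appears; return None on timeout."""
    if wait_for_file(trigger_path, **wait_options):
        return job()
    return None


with tempfile.TemporaryDirectory() as workdir:
    trigger = os.path.join(workdir, "segment-902.ready")

    # The trigger file does not exist yet, so this attempt times out.
    missing = run_when_ready(trigger, lambda: "merged",
                             poll_seconds=0.01, timeout=0.05)

    # Once an upstream step drops the file, the job runs.
    open(trigger, "w").close()
    ready = run_when_ready(trigger, lambda: "merged",
                           poll_seconds=0.01, timeout=1.0)
```

In a grid setting, a trigger like this lets the final assembly job wait until the last segment merge has written its output.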

We are currently in the process of developing a version of AngeliaScheduler specifically to support grid processing.  This version supports a "shared queue" concept that allows a pool of execution clients housed on participating host systems to process tasks from a single queue.  It also provides a single point of management via the Remote Queue Manager for the systems that belong to the pool.  If you are interested in implementing this type of process, please contact us (and mention "grid processing" in your message) so that we can discuss pricing and delivery options.