Science Tools Corporation
Copyright © 1997 - 2010 Science Tools Corporation All rights reserved
 

The Distributed Processing System

Introduction

Our Distributed Processing Model uses a Science-Tools Database (STDB) to schedule, coordinate, and control processing. Entries in a process-queue table serve as notification points for daemons, or "Engines," which look for work to be performed. When an engine finds work to be done, it withdraws sufficient information from the database to carry out that work.

Overview | Demand Engine | Eager Engine | DaemonMaster


See also our diagram on Tracking Lineage, which is available in a printable (*.tiff) version as well.


Overview

The Distributed Processing System permits Scientific Processes (functions) to be executed on any available system that can be accessed via a network. A daemon is started on each target system before any processes need to run there. When the time comes, the daemon reads the archetypal descriptions of processes stored in the STDB installation and dispatches running instances of these processes to its host system's operating system. Each running process then connects to the STDB and reads its arguments (parameters) from it.

Processes may be written in any language a customer wishes, and there are many details the customer may consider in making that choice. Because The BigSur System must be capable of 'encapsulating' existing processing methods, DPS offers a "wrapper" strategy, which we strongly encourage customers to use, in which database interaction is put into a script that orchestrates the running of the process. The script is generally broken into three phases: Prologue, Main, and Epilogue. During the Prologue phase, the environment is prepared: arguments are fetched, disk space is checked, the process is given an opportunity to abort if it is already running or has already run, and so on. When the process is ready, Main is executed; it can either initiate the running of "external" processing tools (older processing methods, perhaps) or execute new functions written directly in the script, as desired. When Main has completed, Epilogue observes any new scientific data-products (objects) that were created, loads the appropriate meta-data into the database, and performs any necessary archiving and clean-up work.
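The three phases can be illustrated with a minimal sketch. It is not one of the DPS templates (those are in Java) and does not use any DPS API: the function names, the `already_run` flag, the plain dictionary standing in for arguments read from the STDB, and the list standing in for the meta-data tables are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def prologue(args_source, work_dir, min_free_bytes):
    """Prepare the environment: fetch arguments, allow an early abort, check disk space."""
    args = dict(args_source)  # stand-in for reading arguments from the STDB
    if args.get("already_run"):
        return None  # abort: this process has already run
    if shutil.disk_usage(work_dir).free < min_free_bytes:
        raise RuntimeError("not enough free disk space")
    return args


def main_phase(args, work_dir):
    """Do the work; here, a new function written directly in the script."""
    out = Path(work_dir) / "product.txt"
    out.write_text(f"scaled={args['value'] * args['scale']}\n")
    return [out]


def epilogue(products, metadata_store):
    """Record meta-data for each new data-product (stand-in for loading it into the database)."""
    for product in products:
        metadata_store.append({"name": product.name, "bytes": product.stat().st_size})


def run_wrapper(args_source, work_dir, metadata_store, min_free_bytes=1):
    """Run Prologue, Main, and Epilogue in order; report whether work was done."""
    args = prologue(args_source, work_dir, min_free_bytes)
    if args is None:
        return "skipped"
    products = main_phase(args, work_dir)
    epilogue(products, metadata_store)
    return "done"


if __name__ == "__main__":
    store = []
    with tempfile.TemporaryDirectory() as work_dir:
        print(run_wrapper({"value": 3, "scale": 2}, work_dir, store))
    print(store)
```

Keeping the database interaction in `prologue` and `epilogue` means `main_phase` could just as well launch an existing external tool without that tool knowing anything about the database.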

DPS provides process templates to help customers get started, illustrate the proper actions to be taken, and provide some level of standardization. Templates are presently available only in Java and require the Development Pack tool-kit (also in Java). We highly recommend using the Java tool-kit and the available templates, because writing process 'wrappers' (also called scripts) is non-trivial for first-time authors. (Consulting is also available to assist in this effort.)



There are two types of DPS daemons: the Demand Engine (DPS-DE) and the Eager Engine (DPS-EE).

DPS-DE — The Demand Engine
The Demand Engine dispatches a process for execution only when someone requests that the process be run. This approach is referred to in research literature as 'lazy' processing. Many criteria are evaluated prior to actual process dispatch, but such checking is done only when someone has explicitly asked for the process to be initiated. This type of processing makes the most sense when the results of a process may not be needed very often and the costs of running the process are relatively high.

DPS-EE — The Eager Engine
The Eager Engine is very similar to the Demand Engine, except that it evaluates whether processes can be run whenever new data-products become available. Such processing is well suited to generating "canned" data-products where a known demand exists for the results and automation is desired.




Demand Engine

The Demand Engine's job is to run jobs (processes, functions, etc.) on demand, when a user asks for them to be run. Demand Engines are started on whichever systems processing is desired, and they connect directly, or through a network, to their master database, where they look for work.
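As a rough illustration of an engine looking for work in a queue table, the sketch below uses an in-memory SQLite database as a stand-in for the master database. The table name `process_queue`, its columns, and the status values are assumptions for illustration only and are not the STDB schema.

```python
import sqlite3


def open_queue():
    """Create an in-memory stand-in for a process-queue table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE process_queue ("
        " id INTEGER PRIMARY KEY, process TEXT, status TEXT, engine TEXT)"
    )
    return db


def request_run(db, process):
    """A user asks for a process to be run: add a 'pending' entry."""
    cur = db.execute(
        "INSERT INTO process_queue (process, status) VALUES (?, 'pending')",
        (process,),
    )
    db.commit()
    return cur.lastrowid


def claim_next(db, engine):
    """Claim the oldest pending entry for this engine, or return None."""
    row = db.execute(
        "SELECT id, process FROM process_queue"
        " WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Conditional update: the claim succeeds only if the entry is still pending.
    cur = db.execute(
        "UPDATE process_queue SET status = 'running', engine = ?"
        " WHERE id = ? AND status = 'pending'",
        (engine, row[0]),
    )
    db.commit()
    return row if cur.rowcount == 1 else None
```

An engine would call `claim_next` periodically; nothing is dispatched until some user has called something like `request_run`, which is the 'lazy' behavior described for the Demand Engine.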

A DaemonMaster is available to manage DPS engines. If a DaemonMaster is started on a system, individual Demand Engines can be started and configured at will, remotely if desired. Multiple DPS engines on a given system are useful for implementing all manner of sophisticated configurations. Each may be configured "on the fly" via digitally signed messages from any authorized user on any authorized system.
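The idea of accepting configuration changes only from authenticated messages can be sketched as follows. This example substitutes an HMAC over a shared secret for a true digital signature, purely to stay within the Python standard library; the message layout, the `max_jobs` setting, and the function names are hypothetical and do not describe the actual DPS message format.

```python
import hashlib
import hmac
import json


def sign_message(key, payload):
    """Serialize a configuration change and attach an authentication tag."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "tag": tag}


def apply_if_authentic(key, message, config):
    """Update the engine configuration only if the tag matches the body."""
    expected = hmac.new(
        key, message["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        return False  # reject: the message was altered or signed with another key
    config.update(json.loads(message["body"]))
    return True
```

A public-key signature scheme would let an engine verify messages without holding a secret capable of creating them, which fits the "any authorized user on any authorized system" wording more closely than a shared key does.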



Eager Engine

The Eager Engine's job is to anticipate the desire to run a process (job, function, etc.) and to run it as soon as it becomes possible to do so. It does this in two ways: through a processing engine much like a Demand Engine, and by "stealing cycles" from the closing stages of already running processes. As processing results are saved, it notes which processes are looking for those results, enqueues those processes (if necessary), and passes the results of the ending process as inputs to the newly scheduled processes.

Of course, all of this activity is driven by meta-data configured in advance by staff who define the processing flow.
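The enqueue-on-save behavior can be sketched in a few lines. The `FLOW` dictionary stands in for the processing-flow meta-data that staff configure in advance; its process and data-product names are invented for illustration, and the real engine works against the STDB rather than in-memory sets and queues.

```python
from collections import deque

# Hypothetical flow meta-data: process name -> data-products it needs as inputs.
FLOW = {
    "calibrate": {"raw_scan"},
    "mosaic": {"calibrated_scan", "pointing_table"},
}


def on_product_saved(product, available, queue, flow=FLOW):
    """Record a newly saved product and enqueue every process it makes runnable."""
    available.add(product)
    enqueued = []
    for process, needs in sorted(flow.items()):
        # Enqueue only processes that consume this product, have all of their
        # inputs available, and are not already waiting in the queue.
        if product in needs and needs <= available and process not in queue:
            queue.append(process)
            enqueued.append(process)
    return enqueued


if __name__ == "__main__":
    available, queue = set(), deque()
    on_product_saved("calibrated_scan", available, queue)
    on_product_saved("pointing_table", available, queue)
    print(list(queue))
```

In this sketch a process with several inputs is enqueued only when the last of them arrives, and the "not already waiting" check corresponds to enqueuing "if necessary".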


DaemonMaster

The DaemonMaster's job is to provide a means of starting up and controlling DPS Daemons. While a DaemonMaster is useful at startup and for managing groups of Daemons, its most important feature is the ability to start and shut down DPS Daemons remotely. So long as a DaemonMaster is up and running, any authorized system administrator may start up or shut down DPS Daemons on any node in their installation from any node in their installation.
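A toy model of this control role is shown below. The class, its method, the command names, and the reply strings are all hypothetical; a real DaemonMaster would launch and stop operating-system processes and authenticate administrators rather than consult an in-memory set.

```python
class DaemonMaster:
    """Toy per-node controller that starts and stops named daemons on request."""

    def __init__(self, node, authorized):
        self.node = node
        self.authorized = set(authorized)
        self.daemons = {}  # daemon name -> engine kind, e.g. "DPS-DE" or "DPS-EE"

    def handle(self, admin, action, name, kind="DPS-DE"):
        """Apply a start or shutdown request if the administrator is authorized."""
        if admin not in self.authorized:
            return "denied"
        if action == "start":
            if name in self.daemons:
                return "already running"
            self.daemons[name] = kind
            return "started"
        if action == "shutdown":
            return "stopped" if self.daemons.pop(name, None) else "not running"
        return "unknown action"
```

Because every request passes through `handle`, the authorization check sits in one place regardless of which node the request came from.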


 