|Copyright © 1997 - 2023 Science Tools Corporation All rights reserved
The Distributed Processing System
The BigSur System's Distributed Processing System is a form of compute engine known variously as either "grid" or "cloud" computing because it fits the formal descriptions of these computing terms. In fact, our DPS was the world's first grid computing engine and was developed in 1995, before the term "grid computing" was coined.
However, DPS is more than just a grid or cloud computing platform because it posesses a lot of semantic information which is normally not available to grid or cloud computing systems and because of this, it has capabilities beyond the ordinary. Here we explore some of its capabilities.
When we look at DPS in this discussion, we're looking at it primarily from an administrator's point of view.
Our Distributed Processing System utilizes an instance of a "Server Installation" of The BigSur System (with its STDB meta-data repository) as the means by which processing is scheduled, coordinated, and controlled. Entries in the Server Installation's process-queue serve as notification points to daemons, or "Engines," which are looking for work to be performed. When an engine finds work to be done, it withdraws sufficient information from the meta-data and "dispatches" the work.
Note that the node that performs the computational process work is usually not the same system that runs the Server Installation, but it can be. The Server installation provides information to the DPS client system's daemon(s) about what work is needed, possibly including compile-on-the-fly information, source code, or whatever else is required to actually execute the desired program(s). It is the job of the daemon to ensure that the required program(s) is(are) either available, fetched, or compiled "on the fly", and then to "dispatch" the appropriate "process."
The Distributed Processing System permits Processes (functions) to be executed on any available system which can be accessed via a network. Archetypal descriptions of processes stored in a BigSur installation's server are accessed by a daemon which is started on the target system in advance of interest in running processes there. When the time comes, daemons fetch the archetypal descriptions of processes from its Server Installation's STDB, and then perform all the steps necessary to dispatch (launch) running instances of these processes into their host system's operating system. The running process itself then connects to the BigSur Server and reads from it its arguments (parameters).
While a "Process" can be most anything that can be run on a computer, and while there are a number of guidelines, most all of which can successfully be ignored or done in alternative ways, as a practical matter, very nearly all BigSur users take advantage of the ability of BigSur to "encapsulate" their existing Processes using Java-based "Process Wrappers". There are many reasons to do so, but most compelling is because doing so is so darned convenient and flexible, and gives the user a lot of power to do what they want quickly. These "wrappers" are used as convenient places to take advantage of all the power and sophistication the BigSur Java API provides, primarily in the starting phase - the "Prologue" - of a Process, and after the main work is done - the "Epilogue" - as newly created data products are recognized and evaluated. Process Wrappers - whether from our templates or written from scratch - give access to BigSur's cache management, file transport mechanisms, and a large list of other features and ensure that BigSur is used properly and provides all the features it's capable of. The middle part - "Main" - typically launches other programs, sometimes using BigSur's API to gain management features over multi-threading multipe program launches for computing parallelism in a single process.The API contains at time of this writing over 1643 public methods, so the user doesn't have to reinvent the wheel but can focus their time on getting on with their own research interests.
As mentioned above, Processes are generally divided into three phases: Prologue, Main, and Epilogue. During the Prolog phase, the environment is prepared: Arguments are fetched, disk space is checked, an opportunity to abort if the process is already running or has already run is available, and so on. When the process is ready - that is, when the prologue has run successfully, Main is executed which can either initiate the running of "external" processing tools (such as scientific algorithms implemented as an executable program), or can execute new functions written directly in the "wrapper", as desired. When Main has completed, Epilogue observes the new data-products (objects) that were created - if any - and loads the appropriate meta-data into the metadata repository. It then performs any necessary archiving and clean-up work, and performs any appropriate notifications of success or failure.
Process Templates help users get started, illustrate the proper actions to be taken, and provide some level of standardization. Templates are presently only available in Java, and require the Development Pack tool-kit (also in Java). Of course, consulting is available to assist in the effort of writing Processes.
Let us suppose that we have two research organizations that regularly collaborate. Site 1 collects up some kind of data and processes it into some form that's interesting to researchers on Site 2. Unlike Site 1, however, Site 2 is small and doesn't have a lot of compute resources itself. It's lacking in network bandwidth and CPU power. So, Site 2 has a relationship with a third organization, Site 3, which is a super-computing center that does heavy crunching for its clients. Site 3, however, has a strict rule - no in-bound connections are permitted for security reasons. In this scenario, our example illustrates how the researchers collaborate seamlessly.
It is important to note that while we here articulate a lot of details about how the collaboration works, this is all done big BigSur and there really isn't all that much work for the collaborators to do to work with one another as The BigSur System handles most everything. Beyond whatever research collaboration they have in mind, they only need to agree on some basics with their counterparts regarding a little security information and then install The BigSur System on the involved computers. There's a little bit of configuration required to tell BigSur how to authenticate itself - using standard, easy to use tools - and then they can set about their own research without worrying about the "mechanics" of their collaboration.
This example proceeds along steps 1 through 6 in order. To follow along using the visualization below, move your mouse over the appropriate black dot with a numeral indicating the step number that corresponds to the numbered bullets that follow.
1) Site 1, wich happens to use an Informix RDBMS as the meta-data repository, runs F(X) ("F at X") - that is, it executes a process (or function) named Fwith argument X- to produce data object W.
2) The Publisher on Site 1 recognizes that it is supposed to "publish" the existence of object W, but is only instructed to inform Site 2 that it exists - the data itself that "is W" is not sent because the Publisher isn't configured to do so.
3) Someone on Site 2, which happens to be using an Oracle RDBMS as its Server's meta-data repository, notices the existence of W and decides to run process Y on W - they want to run Y(W) - to generate Z. So, they make the request.
4) Because Site 2 doesn't have sufficient compute resources to perform Y at W, they have an arrangement to use the supercomputers at Site 3. And so a BigSur DPS Daemon notices the work request (the entry in the work queue), and dispatches Y(W).
5) Process W (or the wrapper for process W) recognizes that the data for object W actually resides on Site 1, so it fetches that data using the meta-data describing how to do so from the BigSur Server at Site 2.
6) As process W is in the final stages of completion, it recognizes that the data that is Object Z needs to be returned to Site 2, so it transports the data back to its home repository there. At this point, the process is done and it exits and in the act of exiting, if the user who requested W had asked for notification, they are notified.
A Number For Its Description
are two types of DPS daemons:
The Demand Engine
The Eager Engine
The Demand Engine's job is to run jobs (processes, functions, etc) on-demand, when a user asks it to be run. Demand Engines are started on systems where processing is desired to be executed in advance of the need. Demand Engines connect directly, or through a network, to an instance fo a BigSur Server installation where they look for work. A daemon can be configured to connect to any BigSur Server for which it has license and privileges. In this way, one system can provide processing services for many BigSur Server installations.
A DaemonMaster is available which can manage DPS engines - if a DaemonMaster is started on a system, then individual Demand Engines can be started and configured at will, usually remotely via ScienceMaster. Multiple DPS engines on a given system are useful for implementing all manner of sophisticated configurations. Each may be configured "on the fly" via digitally signed messages from any authorized user on any authorized system.
The Eager Engine's job is to implement the desire to run a process (job, function, etc) on particular data whenever it becomes possible to do so. It does this through a two-stage process involving a processing engine much like a Demand Engine and also by "stealing cycles" from the closing stages of already running processes. That is, as a process is going through its epilogue, it would typically notify its BigSur Server of new objects created. As it does so, BigSur will take note of any processes that may be waiting for the indicated data object to appear and it will then, in effect, notify them. Put another way, as a process completes, it notes as processing results are saved what processes are looking for those results and then enqueues those processes (if necessary), and passes the results of the ending process as inputs to newly scheduled processes.
Of course, all of this activity is driven by meta-data configured in advance by staff who define the processing flow.
The DaemonMaster's job is to provide a means of starting up and controlling DPS Daemons. While a Master may provide utility at startup and managing groups of Daemons, the most important feature DaemonMaster's provide is to provide an ability to remotely start and shutdown DPS Daemons. So long as a DaemonMaster is up and running, any authorized system administrator may startup or shut down DPS Daemons on any node in their installation from any Client node in their installation.
website contact: Webmistress