First Published in June, 1998, delivered at Goddard SFC, Greenbelt,
Methods in Science Computing
provide a set of tools, loosely known as "BigSur" for historical
reasons. Among these tools are STDB (the Science-Tools Database) and
DPS-DE and DPS-EE (the Demand Engine and Eager Engine of the Demand
Processing System, respectively). The following outlines the overall
architecture and benefits of this system.
Methods in Science Computing
represents a new perspective for performing science. For example, from
the reception of satellite sensor data to the presentation of carefully
analyzed information on a person's desktop, modern earth science is an
exercise in managing data. The BigSur System addresses the end-to-end
challenge, literally from data collection, through processing, to
delivery of results.
The system learns to handle each researcher's scientific abstractions and
harness their existing tool-sets, and imposes as small a burden upon the
researcher as possible. Automating the mechanisms of scientific processing
relieves the researcher of many tasks and leaves more time to focus on
research instead of data management. Modern database technology ties the
pieces together and provides a straightforward approach to interoperability
between research groups, since the system can be browsed by any application
that wishes access. The meta-data presented in the database conforms to the
FGDC Standard and borrows from the Canadian SAIF Standard, and provides a
perspective well suited to managing data which have geospatial and temporal
attributes. The meta-data provides a rich collection of attributes that can
be associated with data. This permits all users to find what they are
interested in: from pointers to white-papers to actual code that manipulates
a researcher's objects, it's in there! Geospatial searches are greatly
enhanced through use of special R-Tree indexing, making browsing of large
collections practical. And researchers are free to use as much or as little
of the system as they wish; the system does not impose great obligations in
order to be useful.
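To make the geospatial and temporal search idea concrete, here is a minimal sketch in Python. It is not BigSur code: the `BBox` and `MetaRecord` types, the `search` function, and the catalog entries are all hypothetical, invented for illustration. The search is a plain linear scan over bounding boxes; an R-Tree index answers the same overlap question without visiting every record, which is what makes large collections practical to browse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees (hypothetical type)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (self.east < other.west or other.east < self.west
                    or self.north < other.south or other.north < self.south)


@dataclass(frozen=True)
class MetaRecord:
    """One meta-data entry with geospatial and temporal attributes."""
    name: str
    footprint: BBox
    start_year: int
    end_year: int


def search(catalog, region, year):
    """Names of records whose footprint overlaps `region` and whose
    time span covers `year`. Linear scan standing in for an R-Tree."""
    return [r.name for r in catalog
            if r.footprint.intersects(region)
            and r.start_year <= year <= r.end_year]


# Made-up catalog entries, for illustration only.
catalog = [
    MetaRecord("sst_pacific", BBox(-180, -30, -100, 30), 1990, 1998),
    MetaRecord("ndvi_california", BBox(-125, 32, -114, 42), 1995, 1998),
    MetaRecord("ice_arctic", BBox(-180, 66, 180, 90), 1980, 1998),
]
monterey_bay = BBox(-122.5, 36.5, -121.5, 37.0)
hits = search(catalog, monterey_bay, 1996)
print(hits)
```

Only the record whose footprint and time span both cover the query is returned; the other two are rejected on latitude alone.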
Science is all about managing data…
provide the natural solution;
work-flow management, and distributed processes and objects are built
into the core design.
Notebook - A repository for data about scientific data
System - Parent-child associations of scientific functions are managed
of Processing - The system can learn how to run your processes
Objects - Fully distributed meta-data, universal naming and special
Processing - Any system can host your scientific processes (and objects)
Discovery - Robust metadata and special indexes assist your applications
Objects - From cropping to Multi-Variate Point Operations...
These features promote Scientific Defensibility!
Objects and Processes
of existing tools
may be real or conceptual!
& Spatial domains
definitions kept in the database...
definitions may be merely notebook entries, or may be capable of being
dispatched by the Distributed Processing System.
source may be stored
arguments are known
One of BigSur's strongest features is its powerful and
very flexible management of relationships, including:
to other objects
to other Processes
tracks how objects are created
definition, and source
the complete lineage is known.
If questions arise, or problems are discovered, not only can the details be
reviewed, but processing can be repeated with corrected parents.
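The lineage idea above can be sketched as a small parent-child graph. This is an illustration under stated assumptions, not the STDB schema: the `Lineage` class, its method names, and the object and process names are all hypothetical. It records which process and parents produced each object, answers "where did this come from?", and lists what must be re-derived, parents before children, when a parent is corrected.

```python
from collections import defaultdict, deque


class Lineage:
    """Hypothetical in-memory lineage store (not the STDB schema)."""

    def __init__(self):
        self.parents = {}                  # object -> (process, [parents])
        self.children = defaultdict(list)  # object -> [direct children]

    def record(self, obj, process, parents=()):
        """Note that `process` created `obj` from `parents`."""
        self.parents[obj] = (process, list(parents))
        for p in parents:
            self.children[p].append(obj)

    def ancestry(self, obj):
        """All ancestors of `obj`, nearest first (breadth-first)."""
        seen, order, queue = set(), [], deque([obj])
        while queue:
            current = queue.popleft()
            for p in self.parents.get(current, (None, []))[1]:
                if p not in seen:
                    seen.add(p)
                    order.append(p)
                    queue.append(p)
        return order

    def affected_by(self, obj):
        """Descendants to re-derive if `obj` is corrected, ordered so
        that every object comes after its affected parents."""
        affected, stack = set(), [obj]
        while stack:
            for child in self.children[stack.pop()]:
                if child not in affected:
                    affected.add(child)
                    stack.append(child)
        ordered, placed = [], set()

        def place(o):
            if o in placed:
                return
            for p in self.parents[o][1]:
                if p in affected:
                    place(p)
            placed.add(o)
            ordered.append(o)

        for o in sorted(affected):
            place(o)
        return ordered


lin = Lineage()
lin.record("raw_scan", "ingest")
lin.record("calibrated", "calibrate", ["raw_scan"])
lin.record("sst_map", "derive_sst", ["calibrated"])
lin.record("cloud_mask", "mask_clouds", ["calibrated"])
lin.record("composite", "combine", ["sst_map", "cloud_mask"])

history = lin.ancestry("composite")
redo = lin.affected_by("raw_scan")
print(history)
print(redo)
```

Because the process name is stored alongside the parents, repeating the processing amounts to walking `redo` in order and re-running each recorded process.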
may be distributed throughout a network
permits you to manage meta-data on any site desired
naming convention of choice:
- Distributed Large Object Handle
The system provides for “eager, lazy, and push” processing:
Push - inbound data is ingested from the outside
Lazy - Processing only done on request
Eager - Processing done when Parent data objects are ready
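The three modes above can be sketched in a few lines of Python. This is a toy model, not the DPS engines: the `Workflow` class, its method names, and the example processes are hypothetical. Pushed data arrives from outside, eager objects are built as soon as all of their parents exist, and lazy objects are built only when someone requests them.

```python
class Workflow:
    """Toy model of push, eager, and lazy processing (hypothetical)."""

    def __init__(self):
        self.objects = {}  # name -> value, for objects that exist
        self.rules = {}    # name -> (mode, func, parents)
        self.log = []      # what happened, in order

    def define(self, name, mode, func, parents):
        if mode not in ("eager", "lazy"):
            raise ValueError("mode must be 'eager' or 'lazy'")
        self.rules[name] = (mode, func, tuple(parents))

    def push(self, name, value):
        """Push: inbound data is ingested from the outside."""
        self.objects[name] = value
        self.log.append(f"push {name}")
        self._run_eager()

    def request(self, name):
        """Lazy: processing is done only when the object is requested."""
        if name not in self.objects:
            mode, func, parents = self.rules[name]
            args = [self.request(p) for p in parents]
            self._build(name, func, args, mode)
            self._run_eager()
        return self.objects[name]

    def _run_eager(self):
        """Eager: build objects as soon as all parents are ready."""
        progress = True
        while progress:
            progress = False
            for name, (mode, func, parents) in self.rules.items():
                ready = all(p in self.objects for p in parents)
                if mode == "eager" and ready and name not in self.objects:
                    args = [self.objects[p] for p in parents]
                    self._build(name, func, args, mode)
                    progress = True

    def _build(self, name, func, args, mode):
        self.objects[name] = func(*args)
        self.log.append(f"{mode} {name}")


wf = Workflow()
wf.define("calibrated", "eager", lambda raw: [x * 0.5 for x in raw], ["raw"])
wf.define("mean", "lazy", lambda cal: sum(cal) / len(cal), ["calibrated"])

wf.push("raw", [2, 4, 6])            # triggers the eager step immediately
mean_before = "mean" in wf.objects   # lazy object not built yet
mean = wf.request("mean")            # built on demand
print(wf.log, mean)
```

The log shows the ordering: the eager product appears right after the push, while the lazy product appears only after the request.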
Processes are dispatched from a queue when they are ready by a dispatching daemon.
daemons may exist
any system in the network
be used to control load and work locations
may compile and run source code, if desired, or merely execute scripts.
can be very helpful to a Process.
arguments from the database
up the environment
new database entries for objects created by the process
objects to archives for safety
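The dispatching steps listed above (arguments from the database, setting up the environment, new database entries for created objects, copies to an archive) can be sketched as a single pass of a dispatching daemon. Everything here is an assumption for illustration: the `database` dictionary layout, the `dispatch` and `run_process` functions, the host name, and the paths are hypothetical, and `run_process` merely stands in for compiling and running source or executing a script.

```python
from collections import deque

# Hypothetical stand-in for the database: known objects and process definitions.
database = {
    "objects": {"raw_scan": "/data/raw_scan.dat"},
    "processes": {
        "calibrate": {"args": ["raw_scan"], "output": "calibrated", "host": "any"},
        "derive_sst": {"args": ["calibrated"], "output": "sst_map", "host": "any"},
    },
    "archive": [],
}


def run_process(name, arg_locations, env):
    """Placeholder for real work; returns the location of the new object."""
    return f"{env['WORKDIR']}/{name}.out"


def dispatch(queue, db, host):
    """Run every queued process that is ready and allowed on `host`.
    Returns the process names in the order they ran."""
    ran = []
    skipped = 0  # consecutive processes that could not run
    while queue and skipped < len(queue):
        name = queue.popleft()
        definition = db["processes"][name]
        ready = all(a in db["objects"] for a in definition["args"])
        if not ready or definition["host"] not in ("any", host):
            queue.append(name)  # leave it for later, or for another daemon
            skipped += 1
            continue
        # Arguments come from the database.
        args = [db["objects"][a] for a in definition["args"]]
        # Set up the environment for the process.
        env = {"WORKDIR": f"/scratch/{host}/{name}"}
        result = run_process(name, args, env)
        # New database entry for the object the process created.
        db["objects"][definition["output"]] = result
        # Copy to the archive for safety (recorded here, not actually copied).
        db["archive"].append(result)
        ran.append(name)
        skipped = 0
    return ran


queue = deque(["derive_sst", "calibrate"])
ran = dispatch(queue, database, "node1")
print(ran)
```

Even though `derive_sst` was queued first, it runs second: it stays in the queue until its parent object has a database entry. The `host` field is one simple way a definition could control load and work locations.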
The motivation for creating Multi-Dimension Arrays (MDAs) was two-fold:
capability for performing “The Query From Hell” where disparate
data-types are joined.
good performance for distributed large object management by providing
a “remote snip” capability.
are used to join MDAs
built-in manipulation of multi-dimension objects
based upon “shape” of arrays
ability to define “cell” as a known data-type
permits use of advanced object-relational features where existing
functions (methods) may be easily applied.
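As a rough illustration of the "snip" and shape-based ideas above, the following Python sketch models a multi-dimension array as a shape plus a flat list of cells. It is not the BigSur MDA implementation: the `MDA` class, `snip`, and `point_op` are hypothetical names, and everything runs in memory. In a distributed setting the point of a remote snip would be that only the requested cells travel across the network rather than the whole large object.

```python
from itertools import product


class MDA:
    """Toy multi-dimension array: a shape plus row-major cells."""

    def __init__(self, shape, cells):
        size = 1
        for n in shape:
            size *= n
        if len(cells) != size:
            raise ValueError("cell count does not match shape")
        self.shape = tuple(shape)
        self.cells = list(cells)

    def _offset(self, index):
        # Row-major position of a cell in the flat list.
        off = 0
        for i, n in zip(index, self.shape):
            off = off * n + i
        return off

    def snip(self, bounds):
        """Sub-array covering [lo, hi) along each dimension."""
        if len(bounds) != len(self.shape):
            raise ValueError("one (lo, hi) pair is needed per dimension")
        for (lo, hi), n in zip(bounds, self.shape):
            if not 0 <= lo <= hi <= n:
                raise ValueError("bounds fall outside the array")
        ranges = [range(lo, hi) for lo, hi in bounds]
        cells = [self.cells[self._offset(idx)] for idx in product(*ranges)]
        return MDA([len(r) for r in ranges], cells)


def point_op(func, *arrays):
    """Multi-variate point operation: combine arrays of the same
    shape cell by cell."""
    shape = arrays[0].shape
    if any(a.shape != shape for a in arrays):
        raise ValueError("arrays must share a shape")
    columns = zip(*(a.cells for a in arrays))
    return MDA(shape, [func(*values) for values in columns])


# A 3 x 4 grid holding 0..11, and a 2 x 2 mask.
grid = MDA((3, 4), range(12))
window = grid.snip([(1, 3), (1, 3)])   # rows 1-2, columns 1-2
mask = MDA((2, 2), [1, 0, 0, 1])
masked = point_op(lambda value, keep: value * keep, window, mask)
print(window.shape, window.cells, masked.cells)
```

The shape check in `point_op` is the simplest form of joining on "shape": arrays are combined only when their dimensions agree.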
[end of document]