HISTORICAL DOCUMENT
First published in June 1998; delivered at Goddard SFC, Greenbelt, MD
New Methods in Science Computing
Primary Benefits | Features | Objects | Processes | Relationships | Scientific Defensibility | Distributed Objects | Distributed Workflow System | Process Dispatching | Process Scripts | Multi-Dimension Arrays | Multi-Variate Point Operations
We provide a set of tools, loosely known as "BigSur" for historical reasons. These tools include STDB, the Science-Tools Database, and DPS-DE and DPS-EE, the Demand Processing System's Demand Engine and Eager Engine, respectively. The following outlines the overall architecture and benefits of this system.
New Methods in Science Computing
Our system represents a new perspective on performing science. From the reception of satellite sensor data to the presentation of carefully analyzed information on a person's desktop, modern earth science is an exercise in managing data. The BigSur System addresses this end-to-end challenge: from data collection, through processing, to the delivery of results.
Our system learns to handle each researcher's scientific abstractions and to harness their existing tool-sets, while imposing as small a burden on the researcher as possible. Automating the mechanics of scientific processing relieves the researcher of many tasks and leaves more time for research instead of data management. Modern database technology ties the pieces together. It also provides a straightforward approach to interoperability between research groups, since any application that wants access can browse the system. The meta-data presented in the database conforms to the FGDC Standard, borrows from the Canadian SAIF Standard, and provides just the right perspective on managing data that have geospatial and temporal attributes. The meta-data offers a rich collection of attributes that can be associated with data. This lets all users find what they are interested in: from pointers to white papers to actual code that manipulates a researcher's objects, it's in there! Special R-Tree indexing greatly speeds up geospatial searches and makes browsing large collections practical. And researchers are free to use as much or as little of the system as they wish; the system does not demand a large commitment before it becomes useful.
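To illustrate why R-Tree indexing makes geospatial browsing practical, here is a minimal sketch of the R-Tree search path in Python. The `Box` and `Node` types, the `covering` and `search` helpers, and the scene names are hypothetical. Insertion and node splitting are omitted. Each internal node stores the bounding box of everything beneath it, so a query skips any subtree whose box does not overlap the search region.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical lon/lat layout)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "Box") -> bool:
        # Boxes overlap unless one lies entirely beside or above the other.
        return not (self.max_x < other.min_x or other.max_x < self.min_x or
                    self.max_y < other.min_y or other.max_y < self.min_y)


@dataclass
class Node:
    box: Box                                  # bounding box of everything below
    children: List["Node"] = field(default_factory=list)
    object_id: Optional[str] = None           # set only on leaf entries


def covering(children: List[Node]) -> Box:
    """Smallest box that encloses all of the children's boxes."""
    return Box(min(c.box.min_x for c in children), min(c.box.min_y for c in children),
               max(c.box.max_x for c in children), max(c.box.max_y for c in children))


def search(node: Node, query: Box, hits: List[str]) -> List[str]:
    """Collect ids of leaf entries whose boxes overlap `query`."""
    if not node.box.intersects(query):
        return hits                           # prune the whole subtree
    if node.object_id is not None:
        hits.append(node.object_id)
    for child in node.children:
        search(child, query, hits)
    return hits


# Hypothetical catalog: three scenes grouped under two internal nodes.
west = [Node(Box(-125, 30, -115, 40), object_id="scene-A"),
        Node(Box(-120, 35, -110, 45), object_id="scene-B")]
east = [Node(Box(-80, 35, -70, 45), object_id="scene-C")]
west_node = Node(covering(west), west)
east_node = Node(covering(east), east)
root = Node(covering([west_node, east_node]), [west_node, east_node])

print(search(root, Box(-118, 36, -116, 38), []))  # ['scene-A', 'scene-B']
```

For the query above, the eastern subtree is never descended into. That pruning is what keeps searches over large collections cheap.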
The BigSur Perspective...
Earth science is all about managing data…
…we provide the natural solution:
a database-centric model for science data management and processing.
Object management, workflow management, and distributed processes and objects are built into the core design.
Primary Benefits
- Scientific Notebook - A repository for data about scientific data
- Workflow System - Parent-child associations of scientific functions are managed
- Automation of Processing - The system can learn how to run your processes
- Distributed Objects - Fully distributed meta-data, universal naming, and special "snipping" functions
- Distributed Processing - Any system can host your scientific processes (and objects)
- Resource Discovery - Robust metadata and special indexes assist your applications
- Multiple-Dimension Objects - From cropping to Multi-Variate Point Operations...
These features promote Scientific Defensibility!
Features
Complete end-to-end solution:
- Object management, including:
  - Multi-Dimensional Array Support
  - Distributed Objects
- Process management
  - Workflow system
  - Distributed Objects and Processes
- Utilization of existing tools
  - Processing tools
  - Visualization tools
- General-purpose browser
Objects
Database meta-data includes:
- Temporal & Spatial domains
- Parent Object references
- Parent Processes
  - Process definition
  - Process instance
Objects may be real or conceptual!
Processes
Process definitions are kept in the database:
- Process source may be stored
- Process arguments are known
Process definitions may be merely notebook entries, or may be capable of being dispatched by the Distributed Processing System.
Relationships
One of BigSur's strongest features is its powerful and very flexible management of relationships, including:
- Objects to other objects
- Objects to Processes
  - parent process definition
  - parent process instance
- Processes to other Processes
Scientific Defensibility
BigSur tracks how objects are created:
- Process definition and source
- Arguments used
- Parent data-sets
Therefore, the complete lineage is known. If questions arise or problems are discovered, not only can the details be traced, but processing can be repeated with corrected parents.
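Each object records its process definition, arguments, and parent data-sets, so lineage can be recovered by walking parent links. The same walk can also find the objects that would have to be reprocessed after a parent is corrected. Here is a minimal sketch of that idea. The `ObjectRecord` fields, the `lineage` and `affected_by` helpers, and the in-memory `catalog` dictionary standing in for the database are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ObjectRecord:
    process: str                 # name of the process definition that created the object
    arguments: Dict[str, str]    # arguments used in that run
    parents: List[str] = field(default_factory=list)  # parent data-sets


def lineage(catalog: Dict[str, ObjectRecord], name: str) -> List[str]:
    """All ancestors of `name`, each listed after its own parents."""
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(n: str) -> None:
        if n in seen:
            return
        seen.add(n)
        for parent in catalog[n].parents:
            visit(parent)
        ordered.append(n)

    visit(name)
    return ordered[:-1]          # the last entry is `name` itself


def affected_by(catalog: Dict[str, ObjectRecord], bad: str) -> List[str]:
    """Objects that would need reprocessing if `bad` turns out to be faulty."""
    return sorted(n for n in catalog if n != bad and bad in lineage(catalog, n))


# Hypothetical catalog standing in for the database.
catalog = {
    "raw": ObjectRecord("ingest", {"source": "downlink"}),
    "calibrated": ObjectRecord("calibrate", {"table": "v2"}, ["raw"]),
    "mask": ObjectRecord("cloud-mask", {"threshold": "0.3"}, ["raw"]),
    "composite": ObjectRecord("composite", {"days": "10"}, ["calibrated", "mask"]),
}

for ancestor in lineage(catalog, "composite"):
    rec = catalog[ancestor]
    print(ancestor, rec.process, rec.arguments)
print(affected_by(catalog, "raw"))  # ['calibrated', 'composite', 'mask']
```

Because ancestors come out parents-first, the same ordering could drive a re-run after a parent is corrected.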
Distributed Objects
Objects may be distributed throughout a network. BigSur permits you to manage meta-data on any site desired, using the naming convention of your choice:
- URL - Uniform Resource Locator
- Kahn-Wilensky Handle
- DLOBH - Distributed Large Object Handle
Distributed Workflow System
The workflow system provides for “eager”, “lazy”, and “push” processing:
- Push - inbound data is ingested from the outside
- Lazy - processing is done only on request
- Eager - processing is done when the parent data objects are ready
Process Dispatching
Processes are dispatched from a queue by a dispatching daemon when they are ready. Multiple daemons may:
- exist on any system in the network
- be used to control load and work locations
Daemons may compile and run source code, if desired, or merely execute scripts.
Process Scripts
Scripts can be very helpful to a Process. They may:
- Fetch arguments from the database
- Prepare the environment
- Clean up the environment
- Create new database entries for objects created by the process
- Move objects to archives for safety
Multi-Dimension Arrays
The motivation for creating Multi-Dimension Arrays (MDAs) was two-fold:
- Provide the capability for performing “The Query From Hell”, in which disparate data-types are joined.
- Provide good performance for distributed large-object management through a “remote snip” capability.
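The point of a remote snip is that the holder of a large array cuts out only the requested sub-array (a hyperslab), so the client never transfers the whole object. The pure-Python sketch below shows the index arithmetic behind such a cut. It assumes a hypothetical row-major flat storage layout and a hypothetical `snip(data, shape, start, count)` signature. It illustrates the idea and is not BigSur's own interface.

```python
from itertools import product
from math import prod
from typing import List, Sequence


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Stride (in elements) of each dimension of a row-major array."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def snip(data: Sequence[float], shape: Sequence[int],
         start: Sequence[int], count: Sequence[int]) -> List[float]:
    """Return data[start : start + count] along every dimension, flattened row-major."""
    if not (len(shape) == len(start) == len(count)):
        raise ValueError("shape, start and count must have the same rank")
    if len(data) != prod(shape):
        raise ValueError("data length does not match shape")
    for s, c, n in zip(start, count, shape):
        if s < 0 or c < 0 or s + c > n:
            raise ValueError("requested slab lies outside the array")
    strides = row_major_strides(shape)
    # itertools.product walks the slab's index tuples in row-major order.
    return [data[sum(i * st for i, st in zip(idx, strides))]
            for idx in product(*(range(s, s + c) for s, c in zip(start, count)))]


# A 3 x 4 array holding 0..11; cut the 2 x 2 block starting at row 1, column 1.
print(snip(list(range(12)), (3, 4), (1, 1), (2, 2)))  # [5, 6, 9, 10]
```

Only the four requested values leave the function, which is the saving a remote snip is meant to deliver over a network.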
Multi-Variate Point Operations
MVPOs are used to join MDAs. They:
- Provide built-in manipulation of multi-dimension objects
- Join arrays based upon their “shape”
- Provide the ability to define a “cell” as a known data-type
This permits the use of advanced object-relational features, where existing functions (methods) may be easily applied.
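A multi-variate point operation can be pictured as a cell-by-cell join of arrays that share a shape. Each joined cell has a known type, so ordinary functions can be applied to it. The sketch below assumes a hypothetical two-variable `Cell` type and hypothetical `point_join` and `apply_cells` helpers. It uses a normalized-difference index purely as an example of a per-cell function. It illustrates the shape-based join and is not BigSur's object-relational implementation.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell type: one value from each joined variable at a grid point."""
    red: float
    nir: float


def point_join(shape_a: Tuple[int, ...], a: Sequence[float],
               shape_b: Tuple[int, ...], b: Sequence[float]) -> List[Cell]:
    """Join two flattened arrays cell by cell; only identically shaped arrays join."""
    if tuple(shape_a) != tuple(shape_b):
        raise ValueError(f"shape mismatch: {shape_a} vs {shape_b}")
    if len(a) != prod(shape_a) or len(b) != prod(shape_b):
        raise ValueError("data length does not match shape")
    return [Cell(r, n) for r, n in zip(a, b)]


def apply_cells(cells: Sequence[Cell], fn: Callable[[Cell], T]) -> List[T]:
    """Apply an existing function (a 'method' of the cell type) to every cell."""
    return [fn(c) for c in cells]


def normalized_difference(c: Cell) -> float:
    # (nir - red) / (nir + red); returns 0.0 where both values are zero.
    total = c.nir + c.red
    return 0.0 if total == 0 else (c.nir - c.red) / total


cells = point_join((2, 2), [1.0, 2.0, 3.0, 4.0], (2, 2), [3.0, 2.0, 5.0, 4.0])
print(apply_cells(cells, normalized_difference))  # [0.5, 0.0, 0.25, 0.0]
```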
[end of document]