Copyright © 1997 - 2014 Science Tools Corporation. All rights reserved.
The extent to which the benefits of science can be fully realized depends critically upon the quality of the connection between researchers themselves and between researchers and members of the public. We believe that it is now possible to improve these connections on a community-wide and even world-wide basis through the use of an appropriate information management system. In this paper we explore the concepts and challenges, and propose an architecture for the implementation of such a system.
While a simple, visibly published table of research topics, researchers, and contact information could be considered a solution, we posit that such a solution, implemented in this new millennium, misses an overwhelmingly important opportunity and falls dramatically short of the existing potential. We instead propose that what is needed is a robust system to interconnect researchers at a much deeper level, utilizing the information management capabilities of computer systems to good effect. This system's core responsibility is to manage the collection and dissemination of the meta-data that provides the enabling infrastructure necessary to connect research systems using the latest technologies available, and to facilitate collaborative interaction and access by the lay public. For browsing purposes, the system need not distinguish between researchers and the public at large. Special considerations of the system regarding researchers are focused on the means by which the system is populated with information.
One of the greatest challenges in assembling a useful means of connecting established researchers is overcoming the cacophony of voices and the individualism which inherently exist. With the ubiquity of computing technology in research work today, it would seem natural to use computers as a means to bring harmony, but in scientific computing we find a plethora of solutions to what appear to be common problems. Each researcher has their own favorite data types and data-type hierarchies, data manipulation and visualization tools, system architectures and research paradigms. Given tight budgets, it is hard for individual researchers to see how they would benefit from creating systems which not only do their work but also connect them to others. And, often enough, researchers disagree with the technical positions of their peers, leading to endless arguments and time-consuming battles over who is right whenever they endeavor to create computing systems for their disciplines. In sum, researchers are well-educated, intelligent people who have, with few exceptions, chosen to devote their lives to a discipline other than computer science. The information systems they create are tailored to specific problems and are not designed to be adapted to solve the problems of others, much less the more general problem of unifying a whole community. A system which can unify scientific research as a whole is left hardly imaginable. Yet increasingly it is becoming clear that this is precisely the need.
To address this challenge and create a computer-based information system which binds disparate researchers into a cohesive, unifying whole, the system must embrace the fundamental diversity as a feature. As an example, a user of such a system who is interested in some particular type of scientific data may well wish to find:
The answers to these questions, and many others, are stored in meta-data, without which the entire enterprise is not possible. Yet, most systems cannot answer these questions because they were not designed to unify a community and too many details are left as assumptions and presumptions. Discipline-specific, tailored solutions tend to fall short because of lack of consensus and cases where individual researchers may not conform to the ideas of peers in their field; a competent system must handle their data and meta-data with the same aplomb. The key is to have the right meta-data, organized using the right abstractions and using the best data storage technologies presently available.
An existing system: The critical need to provide a community-oriented approach was envisaged by the UC Berkeley-led Sequoia 2000 project team, under Professor Michael Stonebraker, as an alternative to the Hughes EOS-DIS strategy. Emphasis was given to creating a high-performance earth-science system, fostering collaborative efforts with distributed processing, sophisticated searching, and strong scientific-defensibility features, with an initial focus on a single system to handle the needs of geology, hydrology, oceanography, and climate modeling. A prototype was built by the BigSur project team, and additional diverse data-sets were included to address end-user needs such as those of the State of California's Resources Agency. The technology was first demonstrated in the spring of '95. It was commercialized in 1997, has since undergone further development by Science Tools Corporation, and is performing production work at various research centers today. We call this The BigSur System (or just BigSur). What we are proposing here is at least one core installation of BigSur devoted to the purposes discussed in this white paper, working in conjunction with other BigSur installations or other scientific information systems which serve other specific needs whenever appropriate.
The most recent evolution of BigSur pushes the collaborative paradigm even further, adding the concepts of research sites (not just individual computers) and the "publishing" of materials between sites. The BigSur System has a number of important attributes critical to the success of this work, among them:
Progressive-Utilization: Progressive-Utilization permits system implementers to pick and choose features as desired. It extends flexibility by focusing on what the researcher finds appropriate, while minimizing requisite burdens. While The BigSur System was designed to manage Earth-system data from satellite to end-user desktop, Progressive-Utilization permits it to act as simply an electronic notebook, or just a distributed processing system, if that is all that is desired. In fact, the system does not even have to be installed at any given research facility in order to track and provide access to work performed there.
From an implementation perspective, more information is available for sharing and community-unifying purposes when scientific work includes elements which populate the information system automatically as the work is conducted. There are many ways in which the system may be populated with data and meta-data, starting with the automation of processing from within the system itself, and including technologies such as XML parsers of web-sites, email notification systems, FTP repositories, and so forth. Indeed, BigSur has been populated in precisely these ways in existing implementations. A very significant contribution may also be made "by hand", with a human being entering information into the system on an as-needed basis. Some simple tools for this purpose may be made available so researchers may describe their work themselves, eliminating a human bottleneck. As the installation grows, the distributed features may be used to divide workload and reduce the load on resources such as network bandwidth. Specific disciplines may find the system of more interest than others, and they may wish to have their own installation into which the community puts more than just superficial descriptions of their work: they may store actual utility programs, sample data-sets, and other introductory elements on the site. Some may decide that they like the system and wish to use it to help enable their own research, for example by using the Distributed Processing System as a means to automate workflow processing.
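As a rough illustration of two of the population paths described above (an XML parser and entry "by hand"), the following Python sketch builds the same kind of meta-data record from either source. It is not BigSur code: the WorkDescription record, the parse_description function, and the XML layout are all hypothetical names and formats chosen only for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkDescription:
    """Hypothetical meta-data record describing one piece of research work."""
    researcher: str
    topic: str
    source: str  # how the record entered the system, e.g. "xml" or "manual"
    datasets: List[str] = field(default_factory=list)


def parse_description(xml_text: str, source: str = "xml") -> WorkDescription:
    """Build a record from an XML fragment in an assumed, illustrative layout."""
    root = ET.fromstring(xml_text.strip())
    return WorkDescription(
        researcher=root.findtext("researcher", default="").strip(),
        topic=root.findtext("topic", default="").strip(),
        source=source,
        datasets=[d.text.strip() for d in root.findall("dataset") if d.text],
    )


# Assumed sample input; a real site would publish its own format.
SAMPLE = """
<work>
  <researcher>A. Researcher</researcher>
  <topic>hydrology</topic>
  <dataset>stream-gauge-readings</dataset>
  <dataset>snowpack-survey</dataset>
</work>
"""

# One record arrives automatically, one is entered by a person.
record = parse_description(SAMPLE)
by_hand = WorkDescription("B. Researcher", "oceanography", source="manual")
catalog = [record, by_hand]
```

Both records end up in the same catalog with the same shape, so browsing need not care how an entry arrived; the source field merely keeps that provenance available.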
One of the greatest visions for science is a computational unification in which every researcher can interact with all other researchers through use of their own research system. The system we propose has all the right elements to do that: the capacity to describe the work of every researcher in their own terms, the ability to manage every type of data object and the functional processes that operate against them, and the ability to automate this processing. With some type conversions, automated by the system, to transform data from one form to another, the system can join together the work of researchers from disparate disciplines. The system has full information about the associative relationships between scientific elements and so it "knows" the paradigms of each researcher. It never forgets, so as people change, knowledge of how things are done is not lost. These features not only enable research collaboration on a scale never previously envisaged, they also enable sharing and dissemination of scientific knowledge to the public at large with a sophistication unparalleled in history.
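The idea of automated type conversions joining data from different sources can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how BigSur performs conversions: the CONVERSIONS registry, the type names, and the find_chain and convert functions are hypothetical. The sketch registers single-step conversions and uses a breadth-first search to chain them when no direct conversion exists.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Registry of single-step conversions: (from_type, to_type) -> function.
# The type names and functions here are illustrative assumptions.
CONVERSIONS: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("fahrenheit", "celsius"): lambda f: (f - 32.0) * 5.0 / 9.0,
    ("celsius", "kelvin"): lambda c: c + 273.15,
}


def find_chain(src: str, dst: str) -> Optional[List[Tuple[str, str]]]:
    """Breadth-first search for the shortest sequence of registered steps."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        current, path = queue.popleft()
        if current == dst:
            return path
        for (a, b) in CONVERSIONS:
            if a == current and b not in seen:
                seen.add(b)
                queue.append((b, path + [(a, b)]))
    return None  # no registered path connects the two types


def convert(value: float, src: str, dst: str) -> float:
    """Apply each step of the discovered chain in order."""
    chain = find_chain(src, dst)
    if chain is None:
        raise ValueError(f"no conversion path from {src} to {dst}")
    for step in chain:
        value = CONVERSIONS[step](value)
    return value


# No direct fahrenheit-to-kelvin entry exists; the chain goes via celsius.
kelvin = convert(212.0, "fahrenheit", "kelvin")
```

Because conversions are registered one step at a time, a researcher only needs to describe how their own type relates to one neighboring type; longer paths between disciplines are then found by the search rather than written by hand.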
Contacts: Richard Troy: RTroy@ScienceTools.com, Olga Kingrey: Olga@ScienceTools.com