Copyright © 1997 - 2025 Science Tools Corporation. All rights reserved.

The Angus Files
by Dr. Angus MacDonald

Early in Science Tools' history, some people had difficulty understanding what we do, so Dr. MacDonald, a research scientist, physicist, and engineer, and definitely not a 'computer person', decided to share his insights into scientific computing with us. The materials provided here were written between 1997 and 2000. In them, he writes with an old-world focus on fundamentals, free of clutter and complexity, to relate simple and basic understandings that are also profound and accurate. He provides his own introduction to these writings. Please be aware that his use of language does not take into account the overlap with computerese, and "computer people" would do well to remember the older definitions of these words in the English language.

Introduction
About these writings

The author is a physicist and engineer who is not a database expert. These writings are in simple terms to facilitate understanding and represent the author's picture after a few months of association with Richard Troy of Science Tools Corporation. If they seem too simple, please do not be offended. Perhaps they will help explain things to other associates who make decisions.

Objects, Databases and Catalogs

Objects

An object can be anything. The file cabinet does not need to know what is inside it. Certainly it is designed to hold files but the drawers can contain anything, like sandwiches for lunch, for example. It is a mistake to create a file cabinet and be completely dogmatic that it contain only pieces of paper, properly cataloged in hanging folders in alphabetical order. It is just not going to happen, particularly if the owners of the cabinet are a bunch of research scientists.

Databases

A database is basically a computerized file cabinet. It contains objects. The objects can be anything. The only thing which the objects have in common is that they are binary data. That is all the database needs to know. It stores computer files. If the database is properly designed, it does not assign any characteristics to the file. The file can contain anything, and so long as it can be found upon request, the basic job of the database is done.
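The file-cabinet idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Science Tools design: the class name ObjectStore and its put and get methods are hypothetical, and a real system would keep its objects on disk rather than in memory. The point it shows is that the store never inspects what it is given.

```python
import uuid


class ObjectStore:
    """Hypothetical 'computerized file cabinet': it holds opaque bytes."""

    def __init__(self):
        # Maps an internal identifier to the raw bytes of one object.
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store the bytes without examining them; return a unique identifier."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = bytes(data)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return exactly the bytes that were stored under this identifier."""
        return self._objects[object_id]


store = ObjectStore()
# A data table and something that is not a table at all are stored the same way.
table_id = store.put(b"lat,lon,temp_c\n10.0,-140.0,27.3\n")
lunch_id = store.put(b"\x00\x01 a sandwich, as far as the store cares")
```

Nothing in the store records what kind of thing each object is. Finding an object again by a meaningful name is left to the catalog.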

A storage system which assigns categories to the objects contains the seeds of endless upkeep work, as objects get put in the wrong place or assigned the wrong category, to say nothing of objects which do not fit properly into any existing category. Since the assignment of category is not a necessary part of a database and carries a lot of baggage, it should not be done a priori.

Catalogs

The catalog allows objects to be found in the database. The catalog needs to be in English and the naming of the objects needs to be in the hands of the creator of the object. Internally the database assigns a unique identifier to each object. The researcher storing the object names it. The object is cataloged under the name. If two objects are given the same name, say, "Ocean Surface Temperature Table", and one of the objects is created from satellite data and the other from floating buoys, this is of no consequence, as each object in the database knows its parents and has a unique internal identifier. Objects must know their parents. Objects must know the objects from which they were created: the data and the calculating process. This is a requirement for scientific defensibility. The "Ocean Surface Temperature Table" needs to know its family tree all the way back to the satellite which flew over the ocean taking infra-red photographs.
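A small Python sketch may make the naming and parentage rules concrete. It is a sketch under assumptions, not the author's implementation: the names Record, Catalog, register, find and lineage are hypothetical, and the "process" is reduced to a line of text. It shows two objects sharing one author-given name while keeping distinct identifiers and distinct family trees.

```python
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    object_id: int   # unique identifier assigned by the system
    name: str        # name chosen by the object's author
    process: str     # how the object was produced (free text in this sketch)
    parents: tuple   # identifiers of the objects it was created from


class Catalog:
    def __init__(self):
        self._ids = itertools.count(1)
        self._records = {}
        self._by_name = {}

    def register(self, name, process, parents=()):
        """Catalog an object under the author's name; the system picks the id."""
        object_id = next(self._ids)
        self._records[object_id] = Record(object_id, name, process, tuple(parents))
        self._by_name.setdefault(name, []).append(object_id)
        return object_id

    def find(self, name):
        """Return every object id cataloged under this name."""
        return list(self._by_name.get(name, []))

    def lineage(self, object_id):
        """Return all ancestor ids, nearest generation first."""
        ancestors = []
        queue = deque(self._records[object_id].parents)
        while queue:
            parent = queue.popleft()
            if parent not in ancestors:
                ancestors.append(parent)
                queue.extend(self._records[parent].parents)
        return ancestors


catalog = Catalog()
photos = catalog.register("Infra-red photographs", "satellite overpass")
sat_table = catalog.register(
    "Ocean Surface Temperature Table", "retrieval from photographs", [photos])
readings = catalog.register("Buoy readings", "floating buoys")
buoy_table = catalog.register(
    "Ocean Surface Temperature Table", "averaging of readings", [readings])
```

Here catalog.find("Ocean Surface Temperature Table") returns both tables, and catalog.lineage tells them apart by tracing each back to its own source.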

The Catalog, then, is an index to the objects in the database as named by their author. For the purpose of access, the catalog is not a tree structure but is divided into sets, like the Yellow Pages. The same company can be listed under various headings. The headings in the Yellow Pages are assigned by the Telephone Company, which spends considerable effort on this. Such assignment is not a requirement. If the author thinks his data should be under a new type, he creates it. The Catalog, thus, allows useful objects to be found by looking for them by name or category, but the structure of the database does not require the assignment of name or category by the database system. The system knows about names and categories, yes, but it does not get itself into silly binding situations by trying to define names and categories before they are needed. Any such attempt is bound to fail with the passage of time and will, therefore, require endless maintenance work like the listings in the Yellow Pages. If no attempt is made in advance to define or categorize objects in the database, then the categories will grow by themselves without assistance. Researchers put their own names in the index wherever it seems right, just like firms putting listings into the Yellow Pages. Most scientific disciplines have conventions for describing objects, and so the basic catalog categories are often provided by database suppliers.
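The Yellow Pages arrangement described above can be sketched in Python as headings that are simply sets of object identifiers. This is an illustration only: the class name Headings, its methods, and identifiers such as "obj-17" are hypothetical. A heading comes into existence the first time an author uses it, and one object may sit under several headings.

```python
from collections import defaultdict


class Headings:
    """Hypothetical category index: each heading is a set, not a tree branch."""

    def __init__(self):
        # heading -> set of object identifiers listed under it
        self._members = defaultdict(set)

    def list_under(self, heading, object_id):
        """List an object under a heading, creating the heading if it is new."""
        self._members[heading].add(object_id)

    def lookup(self, heading):
        """Return the objects under a heading; unknown headings are just empty."""
        return sorted(self._members.get(heading, set()))

    def headings_for(self, object_id):
        """Return every heading under which the object has been listed."""
        return sorted(h for h, ids in self._members.items() if object_id in ids)


index = Headings()
index.list_under("Oceanography", "obj-17")
index.list_under("Satellite Products", "obj-17")  # same object, second heading
index.list_under("Oceanography", "obj-42")
```

No heading was defined before it was needed, and asking for a heading nobody has used yet returns an empty list and nothing more.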

Conclusion

The database which renders accessible all the world's data must not impose restrictions upon the form of the data or the names of the tables. It must allow the data to be found in a catalog which is written in English and in which the categories are allowed to grow and proliferate without restriction from the database structure.

Earth Science Data Processing, Storage and Retrieval

In January 1998, a new database-centric system began production processing of Tropical Rainfall Measuring Mission (TRMM) data. This new system offers its users a scientifically defensible, general solution for Earth Science data management and processing. The system is exceptionally flexible and is best suited to data which has geospatial and temporal features and for which processing or processing history is important. It offers many promises, among them automation, easy and sophisticated retrieval mechanisms, a real opportunity to foster inter- and intra-discipline cooperation, and the end of 're-inventing the wheel' whenever an Earth Science project is undertaken.

The author will discuss how meta-data regarding processing is structured to constitute a work-flow system, and how this leads to scientific defensibility through known lineage of all data products. Illustration of how scientific processes are encapsulated will illuminate how the system may dispatch them to be executed when desired, how this may be automated, and how previously written processes and functions are integrated into the new system.

Meta-data basics will illustrate how intricate relationships may easily be represented and used to good advantage. Retrieval techniques will be discussed including trade-offs of using meta-data versus embedded data, how the two may be integrated, and how simplifying assumptions may or may not help.

This system is based upon the experience of the Sequoia 2000 and BigSur research projects at the University of California, Berkeley, whose goals were to find an alternative to the Hughes EOS-DIS system. In keeping with the Climate and Global Change theme of this conference, the system is being deployed at UCLA under Roberto Mechoso for use with his Earth System Model.
