SAM

Scientists everywhere routinely generate large volumes of data from both computational and laboratory experiments. Such data, which are often irreproducible or expensive to regenerate, must be safely archived for future reference and research. The form in which data are archived and the point at which users archive them are matters of individual preference, and scientists usually store data on multiple platforms. Further, scientists expect their data to stay in the archive despite personnel changes, and they expect those responsible for the archive to handle changes in storage technology without affecting either the scientists or their work.

Essentially, we require a data-intensive computing environment that works seamlessly across scientific disciplines. Ideally, that environment should provide all the features of a file system. Research indicates that supporting this type of massive data management requires some form of metadata to catalog and organize the data.

Problems Identified

The National Science Digital Library has previously implemented metadata and has found it necessary to restrict metadata to a specific format. The Scientific Archive Management System (SAM), a metadata-based archive for scientific data, provides flexible archival storage for very large databases. SAM uses metadata to organize and manage the data without imposing predefined metadata formats on scientists. SAM's ability to handle different data and metadata types is a key difference between it and many other archives.

Restrictions imposed by SAM:

SAM can readily accommodate any type of data file regardless of format, content, or domain. The system makes no assumptions about the data format, the platform on which the user generated the file, the file's content, or even the metadata's content. SAM requires only that the user have data files to store, and it allows the user to store some metadata about each data file.
Working at the metadata level also avoids unnecessary data retrieval from the archive, which can be time-consuming depending on the file's size, network connectivity, or archive storage medium. SAM software hides system complexity while making it easy to add functionality and augment storage capacity as demand increases.
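
The idea of working at the metadata level can be illustrated with a minimal Python sketch. It is not SAM's actual implementation or interface: the names ArchiveEntry, MetadataCatalog, register, and find, as well as the sample file names and storage locations, are hypothetical and chosen only for illustration. The sketch assumes that metadata are free-form key/value pairs and shows that a search can be answered from the catalog alone, without opening or retrieving any archived file.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ArchiveEntry:
    """One archived file: an opaque storage location plus free-form metadata."""
    location: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class MetadataCatalog:
    """Hypothetical catalog; it never reads the archived files themselves."""

    def __init__(self) -> None:
        self._entries: Dict[str, ArchiveEntry] = {}

    def register(self, name: str, location: str, metadata: Dict[str, Any]) -> None:
        # No schema is enforced: any keys and values are accepted.
        self._entries[name] = ArchiveEntry(location, dict(metadata))

    def find(self, criteria: Dict[str, Any]) -> List[str]:
        # Match on metadata only; no file is fetched from storage.
        return sorted(
            name
            for name, entry in self._entries.items()
            if all(k in entry.metadata and entry.metadata[k] == v
                   for k, v in criteria.items())
        )


catalog = MetadataCatalog()
catalog.register("run42.h5", "vol3/0001", {"instrument": "NMR", "year": 2001})
catalog.register("sim.out", "vol1/0007", {"code": "md-sim", "year": 2001})
catalog.register("notes.txt", "vol2/0100", {})  # a file with no metadata at all

same_year = catalog.find({"year": 2001})
nmr_only = catalog.find({"instrument": "NMR"})
```

Because each entry carries its own dictionary, two files in the same catalog can describe themselves with entirely different keys, which mirrors the absence of a predefined metadata format described above.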

About SAM

SAM was created in 1995 by the Environmental Molecular Sciences Laboratory (EMSL). In 2002, EMSL migrated the original two-server hierarchical storage management system to an incrementally extensible collection of Linux-based disk farms. The metadata-centric architecture and the original decision to present the archive to users as a single large file system made the hardware migration a relatively painless process.
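
One way to see why a single user-visible file system eases hardware migration is the following Python sketch. It is an assumption-laden illustration, not a description of how EMSL built SAM: the class LogicalNamespace, its methods, and the node names and paths are all hypothetical. The sketch assumes that the catalog maps each user-visible path to a physical storage location, so that moving data to new hardware changes only the mapping.

```python
from typing import Dict, List, Tuple


class LogicalNamespace:
    """Hypothetical map from user-visible paths to physical storage locations."""

    def __init__(self) -> None:
        # logical path -> (storage node, physical path on that node)
        self._where: Dict[str, Tuple[str, str]] = {}

    def store(self, logical_path: str, node: str, physical_path: str) -> None:
        self._where[logical_path] = (node, physical_path)

    def resolve(self, logical_path: str) -> Tuple[str, str]:
        return self._where[logical_path]

    def paths(self) -> List[str]:
        return sorted(self._where)

    def migrate(self, old_node: str, new_node: str) -> int:
        """Repoint every file on old_node to new_node; logical paths are untouched."""
        moved = 0
        for path, (node, physical_path) in list(self._where.items()):
            if node == old_node:
                self._where[path] = (new_node, physical_path)
                moved += 1
        return moved


ns = LogicalNamespace()
ns.store("/archive/projA/run1.dat", "hsm-server-1", "/store/0042")
ns.store("/archive/projB/spec.raw", "hsm-server-2", "/store/0107")

visible_before = ns.paths()
moved_count = ns.migrate("hsm-server-1", "disk-farm-01")
visible_after = ns.paths()
```

In this sketch the list of paths a user sees is identical before and after the migration; only the result of resolve changes, which is the property the paragraph above attributes to presenting the archive as a single large file system.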