Preservation
An important part of RUcore's mission is to preserve information for future generations of researchers and students.

Preservation of information involves several important concepts:

  1. Fidelity to the source information

    Information stored in a repository is frequently derived from source information. The source may be a photograph or book that is digitized for sharing on the web. Information that is "born digital" may be created in a proprietary digital format, such as a dissertation composed with Microsoft Word®, which will be viewable only as long as backward-compatible versions of the software are created by Microsoft.

    RUcore ensures fidelity to the source information in the following ways:

    • Creating digital files at high resolution, with lossless compression, to ensure fidelity to the source information and to increase the likelihood that a corrupted file can be recovered. Standards have been established for still images, including text and graphic resources. These standards are available at New Jersey Digital Highway, the repository that the Rutgers University Libraries are developing for the state of New Jersey.
    • Establishing a "canonical" version for archiving complex information resources.

      A "canonical" representation presents the "truest" version of the source information in a manner that researchers can easily re-use. The "truest" representation may not be the most complete one; rather, it is the version that experts in the field agree is most representative of the source information. With current computer graphics applications, it is quite easy to add arms to the famous "Venus de Milo" sculpture that are faithful to the artistry of the original sculptor. However, art historians and art lovers consider the statue in its current "armless" state to be the "canonical" representation. Fixing "flaws" in information may produce a nonstandard version of the source information.

      In digital scholarship, canonical representations of source information are often created to ensure that the digital information can be displayed and used even as technology changes.

      For example, data sets are often produced within database management systems, such as Microsoft Access® or MySQL, whose files may not be interpretable in the future. Searching across multiple data sets is also useful but difficult to accomplish, because every database is a self-contained information resource.

      Canonical representations for data will be established by the Rutgers University Libraries for storage and access within RUcore in several ways:

      • The libraries plan to develop canonical XML representations of data (longitudinal data, GIS, etc.) that will enable searching across common data elements within multiple data sets.
      • Data sets will be archived with the database management system application. RUcore repository staff will monitor and implement upgrades and changes to the database management system to ensure that each data set remains usable.
      • Data sets will be stored in comma-separated values (CSV) format, as well as in plain text, to ensure that they can be migrated and interpreted without the database management system application.

      Canonical representations for text created in commercial off-the-shelf (COTS) applications will be created in a similar fashion, using the forthcoming electronic theses and dissertations collection as a test bed. Multiple format-independent representations will be created, including:

      • High-resolution TIFF images, which give a faithful graphical representation of each page. PDF will be the display format, but because it is a vendor-proprietary format, PDF is not suitable for a digital master file.
      • Rich Text Format (RTF) versions will be created for most documents produced with commercial word processors. RTF is a publicly documented format, readable by most word processors, that maintains most formatting features.
      • Ultimately, an XML representation for electronic theses and dissertations will be created. The international ETD community is developing ETD-ML, an XML application for electronic theses and dissertations. ETD-ML will provide an application-independent format that preserves the creator's formatting and supports other XML applications, such as MathML, for representing mathematical and statistical symbols.
  2. Preservation also depends on a well-designed storage management system that makes use of online, nearline, and offline storage, including climate-controlled off-site storage. RUcore's tiered storage system enables the libraries to maintain, manage and synchronize multiple copies of each master file. This redundancy ensures that files are safeguarded against corruption or loss and are readily available whenever a researcher needs access.
  3. RUcore's data model and metadata system are also critical for monitoring and maintaining files. RUcore employs a simple, event-based model as the underlying data model for all information sources. This data model is context-independent, so that it meets the needs of current and future researchers and students and so that researchers in different fields can locate and use the same data for different purposes. Context can be layered on top of the core data model to meet the needs of the owning researchers and their research community.
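The data-set practices described under item 1 (a CSV copy, plus a canonical XML copy whose shared element names can be searched across data sets) can be sketched as follows. This is an illustrative Python example, not RUcore's actual tooling: SQLite stands in for the source database management system, the helper name `export_table` is hypothetical, and the XML layout (one `<record>` per row, one child element per column) is an assumption rather than the Libraries' canonical schema. It also assumes that column names are valid XML element names.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn: sqlite3.Connection, table: str) -> tuple[str, str]:
    """Return (csv_text, xml_text) holding every row of `table`.

    Hypothetical helper. `table` must be a trusted name: SQLite cannot bind
    identifiers as query parameters, so it is quoted into the SQL directly.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()

    # CSV copy: plain text that any spreadsheet or editor can read without the DBMS.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)

    # XML copy (assumed layout): one <record> per row and one child element per
    # column, so shared element names such as <year> can be searched across data sets.
    root = ET.Element("dataset", name=table)
    for row in rows:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = "" if value is None else str(value)

    return buf.getvalue(), ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative data only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE survey (county TEXT, year INTEGER, population INTEGER)")
    conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", [("A", 2000, 10), ("B", 2000, 20)])
    csv_text, xml_text = export_table(conn, "survey")
    print(csv_text)
    print(xml_text)
```

Both copies are produced from a single query, so the CSV and XML versions describe exactly the same rows.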

The RUcore Data Model

[Figure: RUcore data model]


The object entity in the data model represents each information resource stored in RUcore. The entire lifecycle of the resource, including the interactions with other entities, is accommodated in RUcore's metadata architecture:

[Figure: RUcore's metadata architecture]

The structure map records both the logical and the physical components of a complex information object. A complex information object consists either of multiple concatenated images (such as a digitized multi-page book) or of consecutive images or sounds (such as digital video and audio files).
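As an illustration only, a structure map for a digitized multi-page book can be modeled as a tree: logical units (book, chapters) form the branches, and the leaves point at physical files (page images). The `Div` class, its fields, and the file names below are hypothetical; the tree shape is loosely in the spirit of structural maps in standards such as METS, but it is not RUcore's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Div:
    """One node of a structure map (hypothetical layout).

    Logical units (book, chapter) hold children; leaves point at the
    physical files, such as page images, that make up the object.
    """
    label: str
    kind: str                                            # e.g. "book", "chapter", "page"
    files: List[str] = field(default_factory=list)       # physical files at this node
    children: List["Div"] = field(default_factory=list)  # logical sub-units

    def physical_order(self) -> List[str]:
        """Depth-first walk returning the physical files in reading order."""
        ordered = list(self.files)
        for child in self.children:
            ordered.extend(child.physical_order())
        return ordered

    def locate(self, filename: str, path: tuple = ()) -> tuple:
        """Return the chain of logical labels that contains `filename`, or ()."""
        path = path + (self.label,)
        if filename in self.files:
            return path
        for child in self.children:
            found = child.locate(filename, path)
            if found:
                return found
        return ()


# Hypothetical three-page book split into two chapters.
book = Div("Sample book", "book", children=[
    Div("Chapter 1", "chapter", children=[
        Div("Page 1", "page", files=["page-0001.tif"]),
        Div("Page 2", "page", files=["page-0002.tif"]),
    ]),
    Div("Chapter 2", "chapter", children=[
        Div("Page 3", "page", files=["page-0003.tif"]),
    ]),
])
```

`book.physical_order()` gives the page images in reading order (the physical view), while `book.locate("page-0003.tif")` returns the logical path `("Sample book", "Chapter 2", "Page 3")`.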

Rights metadata identifies the rights holder(s) for each information resource and identifies the permissions for use that the rights holder has granted, including any restrictions on those permissions.

Rights metadata will be linked to the agent entity to provide a directory of rights holders with enough information to contact a rights holder for further permissions, such as the right to republish an information resource.
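The link between rights metadata and the agent entity can be sketched as rights records that refer to rights holders by identifier, resolved through a separate agent directory. All class names, fields, and sample values here are hypothetical and are meant only to show the relationship, not RUcore's actual metadata fields.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Agent:
    """Entry in a hypothetical rights-holder directory."""
    agent_id: str
    name: str
    contact: str  # e.g. an e-mail or postal address


@dataclass(frozen=True)
class Rights:
    """Rights metadata for one resource; holders are referenced by agent ID."""
    resource_id: str
    holder_ids: Tuple[str, ...]
    permissions: FrozenSet[str]         # uses the rights holders have granted
    restrictions: Tuple[str, ...] = ()  # limits on those permissions, as free text


def is_permitted(rights: Rights, use: str) -> bool:
    """True if the requested use has been granted."""
    return use in rights.permissions


def holder_contacts(rights: Rights, directory: Dict[str, Agent]) -> List[str]:
    """Resolve each rights holder to contact details, e.g. to ask for a use
    (such as republication) that has not already been granted."""
    return [directory[agent_id].contact for agent_id in rights.holder_ids]
```

Storing only the agent identifier in each rights record means a holder's contact details are kept in one place and can be updated without touching every resource they own.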

Descriptive metadata provides sufficient information for users to discover and obtain access to information resources.

Source metadata describes the provenance, condition and conservation of analog source materials, such as photographs, books, and maps, that have been digitized for inclusion in RUcore.

Technical metadata provides information about the digital master files that RUcore will maintain for long-term preservation and access.
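A minimal sketch of technical metadata for a digital master file follows, assuming a record of file size, a SHA-256 checksum, and a guessed MIME type; RUcore's actual technical metadata elements are not listed here. A stored checksum also shows how the redundant copies described under storage management can be checked: any copy whose checksum no longer matches the recorded value is a candidate for replacement from a good copy. The function names are hypothetical.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Dict, List


def technical_metadata(path: Path) -> Dict[str, object]:
    """Record basic technical facts about a master file (hypothetical fields)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large TIFF masters need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
        "mime_type": mime or "application/octet-stream",
    }


def damaged_copies(record: Dict[str, object], copies: List[Path]) -> List[Path]:
    """Return the copies whose checksum differs from the recorded one."""
    return [p for p in copies if technical_metadata(p)["sha256"] != record["sha256"]]
```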

Digital provenance (Digi Prov) metadata provides a digital "audit trail" of any changes to the digital master file.
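In keeping with the event-based data model described earlier, a digital provenance audit trail can be sketched as an append-only list of events, each recording who acted, what was done, and the master file's checksum afterward. The `AuditTrail` class, its fields, and the action names are hypothetical illustrations, not RUcore's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    timestamp: str     # ISO 8601, UTC
    agent: str         # person or software that performed the action
    action: str        # e.g. "ingest", "fixity-check", "migrate" (hypothetical names)
    sha256_after: str  # checksum of the master file after the event
    note: str = ""


class AuditTrail:
    """Append-only event list for one digital master file."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, agent: str, action: str, sha256_after: str, note: str = "") -> ProvenanceEvent:
        event = ProvenanceEvent(datetime.now(timezone.utc).isoformat(),
                                agent, action, sha256_after, note)
        self._events.append(event)
        return event

    def events(self) -> Tuple[ProvenanceEvent, ...]:
        # Read-only view: the class offers no update or delete methods.
        return tuple(self._events)

    def current_checksum(self) -> Optional[str]:
        return self._events[-1].sha256_after if self._events else None

    def changes(self) -> List[ProvenanceEvent]:
        """Events that altered the file, i.e. changed its checksum."""
        altered, previous = [], None
        for event in self._events:
            if event.sha256_after != previous:
                altered.append(event)
            previous = event.sha256_after
        return altered
```

Because events are only ever appended, a routine fixity check that finds the file unchanged is still logged, but `changes()` reports only the events that actually modified the master file.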

RUcore's Workflow Management System (WMS) is a web application for uploading digital information resources and metadata to RUcore. A "core" metadata profile has been designed to support the minimum amount of metadata needed to describe and manage information. Collection owners can use additional data elements, create new data elements, add controlled vocabularies and design input templates to customize metadata beyond the core, to support the needs of their collection and users.
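The relationship between the core metadata profile and collection-specific customizations can be sketched as a validation step: every record must supply the core elements, a collection may require further elements, and some elements may be restricted to a controlled vocabulary. The core element names below are placeholders, since RUcore's actual core profile is not listed here, and `validate` is a hypothetical function rather than part of the WMS.

```python
from typing import List, Mapping, Optional, Sequence, Set

# Hypothetical core profile; RUcore's actual core element set is not listed here.
CORE_ELEMENTS = ("title", "identifier", "type", "rights")


def validate(record: Mapping[str, str],
             collection_required: Sequence[str] = (),
             vocabularies: Optional[Mapping[str, Set[str]]] = None) -> List[str]:
    """Check a metadata record against the core profile plus a collection's
    own required elements and controlled vocabularies. Returns problems found."""
    problems = []
    # dict.fromkeys keeps order and drops duplicates between core and collection lists.
    for element in dict.fromkeys((*CORE_ELEMENTS, *collection_required)):
        if not record.get(element, "").strip():
            problems.append(f"missing required element: {element}")
    for element, allowed in (vocabularies or {}).items():
        value = record.get(element)
        if value is not None and value not in allowed:
            problems.append(f"{element}: {value!r} is not in the controlled vocabulary")
    return problems
```

An empty list means the record satisfies both the core profile and the collection's additions.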

Rutgers University Libraries cataloging faculty can assist you in designing and applying metadata for your digital collection. Contact Mary Beth Weber, Catalog Department Manager, for assistance.

Further Resources:
Presentation on Digital Preservation (PDF)

Version 7.3.1
Rutgers University Libraries - Copyright ©2014