Metadata is all the information needed to manage, describe, preserve and make information available to users.
A metadata schema is generally defined as an XML (eXtensible Markup Language) schema. XML is a markup language that provides semantic structure for expressing information on the web: its tags describe the meaning of the information they enclose. Each XML tag is enclosed in angle brackets, and every element must be opened with a start tag (<tag>) and closed with a matching end tag (</tag>). For example, a sentence can be meaningfully structured in XML in several ways:
<subject>The cat in</subject>
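The open/close rule can be checked mechanically. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree, which refuses to parse a fragment whose tags are not properly closed and nested. The <sentence>, <subject> and <verb> tags are hypothetical illustrations, not elements of any metadata standard.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (every tag opened and closed, properly nested)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Hypothetical tags, chosen only to illustrate semantic markup.
good = "<sentence><subject>The cat</subject> <verb>sat</verb></sentence>"
bad = "<sentence><subject>The cat</sentence>"  # <subject> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```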
Metadata is generally stored in a database management system, such as MySQL, Oracle®, or Microsoft Access®, from which it can be displayed and exported in XML syntax. The standard components of metadata are:
As an example, let's examine the data element "creator."
The data element would be creator. A data element may have subelements; in this case, the data element might be further divided into the subelements name and role.
The value of each subelement would be a unique instance of information enclosed within that subelement's tags.
For example, the creator of this article could be expressed in XML in the following way:
As you can see in this example, the data element <creator> has two subelements, <name> and <role>. A constraint on the expression of the subelement <name> is that the <name> value must be expressed as Last name, First name. The <role> subelement might be used as a dynamic label for creator, so that a metadata record for this document would be expressed in XML in this manner:
The record display would include the following labels:
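The creator example can be sketched in Python with xml.etree.ElementTree. This is a minimal illustration under stated assumptions: the name "Doe, Jane" and the role "Author" are hypothetical placeholders, and the regular expression is only one possible way to enforce the Last name, First name constraint. The sketch shows the three ideas together: a data element with subelements, a constraint on a value, and a subelement value used as a dynamic display label.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record: the name and role are placeholders, not this article's actual creator.
RECORD = """<record>
  <creator>
    <name>Doe, Jane</name>
    <role>Author</role>
  </creator>
</record>"""

# One possible encoding of the constraint "Last name, First name".
NAME_FORMAT = re.compile(r"^[^,]+, [^,]+$")

def check_name(name: str) -> bool:
    """True if the value follows the Last name, First name constraint."""
    return NAME_FORMAT.match(name) is not None

def display_labels(xml_text: str) -> list[str]:
    """Use each <role> value as the display label for its <creator>."""
    root = ET.fromstring(xml_text)
    lines = []
    for creator in root.iter("creator"):
        name = creator.findtext("name", default="").strip()
        # Fall back to the data element's own name if no role is given.
        role = creator.findtext("role", default="Creator").strip()
        if not check_name(name):
            raise ValueError(f"<name> violates the 'Last name, First name' constraint: {name!r}")
        lines.append(f"{role}: {name}")
    return lines

print(display_labels(RECORD))  # ['Author: Doe, Jane']
```

Because the label comes from the data rather than from the display code, a record whose <role> is "Editor" or "Photographer" is labeled accordingly without any change to the software.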
A data model provides the basic relationship between the information you are describing and managing and the information community that represents the predominant users of the data.
Metadata should be designed to support the information needs of the user community, today and far into the future. A data model should underlie any decisions about metadata design and creation. Your data model will reflect the nature of the information that you manage and its use by your information community. The data model provides for the basic description of information so that it can be effectively discovered and used. A good data model should be:
Begin by answering some basic questions:
Next, expand the exploration to evaluate the information strategies that organizations similar in mission and purpose are using. In your discussions with users, you asked what other sources of information they predominantly use. What metadata strategy is employed for those other sources of information? Which organizations organize and make these ancillary sources of information available? It is important to situate your metadata within the larger information landscape for two reasons: to integrate your metadata collaboratively with that of other organizations, creating a "one stop shop" for information, and to participate in a larger community, so that the expense and expertise required to develop and maintain a metadata schema do not fall exclusively on your organization. An example of a very effective community collaboration is the PBCore metadata standard, developed by members of the public broadcasting community under the sponsorship of the Corporation for Public Broadcasting.
Now that you have identified the information and users specific to your organization, you can create a basic model that demonstrates the nature of the information you are managing and the user needs that it serves. This data model can be a simple diagram showing the relationships among different entities, including the creators, managers and users of the data. This completes a critical first step in developing a metadata strategy: understanding your information, your users and how they interrelate is essential but often neglected. The next step involves selecting among the many metadata standards available to meet the requirements of your data model.
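A simple data model diagram can also be written down as data, which makes it easy to review with stakeholders and to check for gaps. The following minimal Python sketch records entities and the relationships among them; the entity names, roles and relation verbs ("creates", "manages", "uses") are hypothetical examples, not a prescribed vocabulary.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A node in the data model: an information object or an agent related to it."""
    name: str
    role: str  # hypothetical roles: "information", "creator", "manager", "user community"

@dataclass
class DataModel:
    entities: dict[str, Entity] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, verb, object)

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, subject: str, verb: str, obj: str) -> None:
        # Both ends of a relationship must be declared entities.
        if subject not in self.entities or obj not in self.entities:
            raise KeyError("both ends of a relation must be declared entities")
        self.relations.append((subject, verb, obj))

    def who(self, verb: str, obj: str) -> list[str]:
        """Entities that stand in relation `verb` to `obj`, e.g. who uses a collection."""
        return [s for s, v, o in self.relations if v == verb and o == obj]

model = DataModel()
for e in (Entity("Broadcast archive", "information"),
          Entity("Production staff", "creator"),
          Entity("Archivists", "manager"),
          Entity("Educators", "user community")):
    model.add(e)
model.relate("Production staff", "creates", "Broadcast archive")
model.relate("Archivists", "manages", "Broadcast archive")
model.relate("Educators", "uses", "Broadcast archive")
print(model.who("uses", "Broadcast archive"))  # ['Educators']
```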
The life cycle of information is the interaction of the information object with place (place of creation, place of storage, etc.) and agent (the person(s) or organization(s) responsible for creating, describing, managing or using the information). When the object interacts with place or agent at a specific point in time, an event in the life cycle of the information is said to occur. Events provide context for information use.
Consider an artist who creates an impressionistic painting in her twenties, which is purchased and displayed by a museum in Montclair. Two events in the life cycle of the work would be the creation of the painting and its permanent exhibition in the Montclair museum. Both events need to be documented and discoverable by different users with different needs.
A researcher looking for works that demonstrate the influence of impressionism on the artist would discover this work. The researcher might be happy with the digital representation, which provides sufficient detail to demonstrate the influence of Monet upon the artist.
A high school art teacher planning a field trip to view the artist's work in situ would also discover this work, and sufficient information about its permanent exhibition to plan the field trip for his students.
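The event model can be sketched as a small Python data structure: each event records the object's interaction with an agent and a place at a point in time, and each user reaches the object through the event that matters to them. This is a minimal illustration, not RUcore's implementation; the event type names are assumptions, and because the text gives no names or dates, the agent, place and time values are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """A life-cycle event: the object meets a place and an agent at a point in time."""
    event_type: str
    agent: str
    place: str
    when: str  # free text in this sketch; a real record would use a standard date format

# Hypothetical record for the painting described above; values are placeholders.
painting_events = [
    Event("creation", "The artist", "(place of creation not given)", "In the artist's twenties"),
    Event("permanent exhibition", "Montclair museum", "Montclair", "After purchase by the museum"),
]

def discover(events: list[Event], event_type: str) -> list[Event]:
    """Return the events of one type: the entry point a particular user is looking for."""
    return [e for e in events if e.event_type == event_type]

# The researcher's entry point is the creation event...
print(discover(painting_events, "creation")[0].agent)  # The artist
# ...while the teacher planning a field trip needs the exhibition event.
print(discover(painting_events, "permanent exhibition")[0].place)  # Montclair
```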
Each user finds the information he or she needs by discovering a different "event" in the life cycle of the information. RUcore can meet these core information needs without overloading the object with context specific to one type of scholar or one field of study. Once you have developed the data model, the underlying principle behind the organization of your data, you will be ready to select a baseline metadata standard that conforms to your data model.
The preceding sections covered the prerequisite knowledge for selecting a metadata standard: an understanding of the data that you need to describe and manage, an understanding of the information use requirements of your target audience, and an ability to read and understand metadata documentation. Now you are ready to evaluate metadata standards against the emerging data model for your organization. This section describes important criteria for evaluating and selecting a metadata standard:
It is important to understand the information needs and practices of your target audience. What legacy metadata systems or standards do they currently use? At what point in their typical workflow are they seeking the kind of information that you will be describing? What data elements will fit their workflow context and frame of reference? How can you meet their needs to find, identify, select and obtain information in an efficient and understandable manner? While you want to meet the urgent and immediate needs of your target users, you also want to be aware that these needs will change over time, and that digital information inevitably attracts a wider audience than initially envisioned. You want to be sure that your information model is "context-independent": that it can accommodate the needs of your current audience but also support additional users with different needs. You want to incorporate the current needs of the user while remaining extensible and flexible beyond those needs. My recommended approach is to keep the underlying metadata and the information model independent of any particular context or use, and then to add user context in the creation, search, retrieval and display tools. Unfortunately, many metadata standards are highly context-dependent. It will be interesting to see whether they can survive the rapid evolution of user needs and interests.
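The recommended split between context-independent metadata and context-aware tools can be sketched in a few lines of Python. In this minimal illustration, one neutral record is shared by every audience, and each audience's labels and field selection live in the display layer. The field names, values and audience profiles are hypothetical.

```python
# One context-independent record (hypothetical values)...
core_record = {
    "title": "Untitled landscape",
    "creator": "Doe, Jane",
    "location": "Montclair museum",
}

# ...and audience-specific context kept in the display tool, not in the record itself.
DISPLAY_PROFILES = {
    "researcher": [("creator", "Artist"), ("title", "Title")],
    "teacher": [("title", "Work"), ("location", "Where to see it")],
}

def render(record: dict, audience: str) -> str:
    """Apply one audience's labels to the shared record; fields it doesn't need are skipped."""
    profile = DISPLAY_PROFILES[audience]
    return "\n".join(f"{label}: {record[key]}" for key, label in profile if key in record)

print(render(core_record, "teacher"))
```

Serving a new audience then means adding a display profile; the underlying record does not change.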
It is important to evaluate metadata standards in the context of their development and use. What community developed the standard, and what needs did they intend to address through the standard? MPEG-7, the "Multimedia Content Description Interface," emerged predominantly from the commercial media market. Its data model, structure and design focus on the delivery of digital media assets to end users. While it can be used for other purposes, such as the archiving and management of digital assets, there are limitations in the design that must be worked around to support different uses.
A good way to identify potential metadata standards to employ is to look at the metadata standards used by complementary organizations, by communities that produce information that is also heavily used by your target audience, and by organizations that you might want to collaborate with. Did they develop a metadata standard or an application profile for an existing standard? What metadata standard(s) did they choose, and why? Also, a flexible, context-independent metadata standard may offer the best opportunities for meeting current and future information needs. This is the metadata strategy employed within RUcore.
The metadata standard selected needs to integrate smoothly with the organization's existing information environment. If you are already describing information using data elements or fields in a database, will they map readily to the metadata standard or schema you have selected?
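One practical way to answer the mapping question is to write the crosswalk down and let a script report which legacy fields have no destination. The following minimal sketch uses Python's built-in sqlite3 module as a stand-in for whatever database you actually use; the table, column names and target element names are hypothetical and do not belong to any particular schema.

```python
import sqlite3

# Hypothetical crosswalk from legacy database columns to metadata elements.
# The target names ("title", "creator", "format") are illustrative, not a specific schema.
CROSSWALK = {
    "item_title": "title",
    "author": "creator",
    "file_format": "format",
}

def unmapped_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """List legacy columns with no target element: the gaps to resolve before migrating."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is interpolated directly, so it must come from trusted code.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return [c for c in columns if c not in CROSSWALK]

def mapped_records(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Re-key each row using the crosswalk, dropping unmapped columns."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cursor.description]
    return [
        {CROSSWALK[n]: value for n, value in zip(names, row) if n in CROSSWALK}
        for row in cursor
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_title TEXT, author TEXT, file_format TEXT, shelf_note TEXT)")
conn.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
             ("Campus photograph", "Doe, Jane", "image/jpeg", "Box 3"))
print(unmapped_columns(conn, "items"))  # ['shelf_note']
print(mapped_records(conn, "items"))
```

A column that shows up as unmapped is a signal either to find a home for it in the chosen standard or to decide explicitly that it will not be carried forward.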
The existing information environment also consists of the staffing and technology available to implement the metadata. If your organization plans to utilize the services of student labor or volunteers, it will be important to consider a simpler, more streamlined metadata schema.
It is also critical to consider the controlled vocabularies available to populate the data elements within a metadata standard. Many standards specify the use of controlled vocabularies within data elements, such as the requirement to use Internet MIME types within a "format" data element. The use of controlled vocabularies, which provide authoritative terms to choose among, can increase ease of cataloging among the metadata creators as well as increase searching precision for end users.
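A controlled vocabulary can be enforced at the point of cataloging. The following minimal Python sketch accepts a "format" value only if it is an authorized term, and uses the standard mimetypes module to suggest a term from a file name so that catalogers choose rather than type. The three-term vocabulary is a hypothetical local subset; a real list would be drawn from the IANA media types registry.

```python
import mimetypes
from typing import Optional

# A hypothetical controlled vocabulary: the MIME types this repository accepts
# in its "format" element.
ALLOWED_FORMATS = {"image/jpeg", "image/tiff", "application/pdf"}

def validate_format(value: str) -> str:
    """Accept a cataloger-entered format only if it is an authorized term."""
    term = value.strip().lower()  # MIME type names are case-insensitive
    if term not in ALLOWED_FORMATS:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return term

def suggest_format(filename: str) -> Optional[str]:
    """Propose a term from the file extension, or None if it is not an authorized term."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime if mime in ALLOWED_FORMATS else None

print(validate_format("Image/JPEG"))       # image/jpeg
print(suggest_format("thesis.pdf"))        # application/pdf
print(suggest_format("notes.unknownext"))  # None
```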
Metadata standards are generally designed either to respond to the needs of a specific community of users or to serve a broad range of communities and information needs. In either case, the metadata standard as written is unlikely to meet all of your requirements. Over time, your requirements will change as your users' information needs evolve and as the information you are collecting and managing changes. It is therefore critical to evaluate a metadata standard for its ability to accommodate customization and evolution. Does the metadata standard provide flexibility in adding controlled vocabularies that represent the subject domain and information context of your users? Does the metadata standard provide a means to add community extensions? MODS, for example, allows you to extend the schema through its extension element, which enables you to add metadata elements or entire schemas, creating a hybrid metadata schema tailored to your needs. XrML, the eXtensible rights Markup Language, provides an extension schema for developing community extensions to the core language.
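The following minimal Python sketch builds a MODS record that carries a locally defined element inside MODS's <extension> element, using the real MODS namespace. The local namespace URI and the gradeLevel element are hypothetical, invented here to show how a community might add, for example, a field that helps teachers; they are not part of any published schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Hypothetical namespace for a local community extension; not a published schema.
LOCAL_NS = "http://example.org/ns/classroom"

ET.register_namespace("mods", MODS_NS)
ET.register_namespace("local", LOCAL_NS)

def mods_record(title: str, grade_level: str) -> ET.Element:
    """Core MODS description plus a locally defined element carried in <mods:extension>."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    # Content inside <extension> belongs to the local schema; tools that don't understand it can skip it.
    extension = ET.SubElement(mods, f"{{{MODS_NS}}}extension")
    ET.SubElement(extension, f"{{{LOCAL_NS}}}gradeLevel").text = grade_level
    return mods

record = mods_record("Harbor at dawn", "Grades 6-8")
print(ET.tostring(record, encoding="unicode"))
```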
Different metadata standards serve different purposes and are employed by different communities. In the highly interoperable world of the Web, it is important to enable your organization to collaborate with others. This collaboration often requires a shared information context, which is provided largely via metadata. For example, an organization that provides streaming video assets for broadcast use may diversify into the educational marketplace and may want to package its assets for that new audience. RUcore can support any schema: MARC, for example, for sharing records with the Rutgers Library Catalog, and Dublin Core for sharing with other digital library initiatives, such as the Networked Digital Library of Theses and Dissertations, an international open access portal to theses and dissertations. One final but critical test for your selected metadata standard is to ensure that it maps readily to other metadata standards, to support many different contexts of use for your resources.
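Mapping to a simpler standard for sharing can be sketched as a crosswalk plus a serializer. The following minimal Python sketch writes a local record as simple Dublin Core inside the oai_dc wrapper commonly used for harvesting, using the real Dublin Core and oai_dc namespaces. The local field names and the crosswalk table (including mapping a "genre" field to dc:type) are hypothetical choices, and a production record would normally also declare a schema location.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# Hypothetical crosswalk from local field names to simple Dublin Core elements.
TO_DC = {"title": "title", "creator": "creator", "format": "format", "genre": "type"}

def to_oai_dc(record: dict) -> ET.Element:
    """Serialize a local record as oai_dc, repeating elements for multi-valued fields."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field_name, value in record.items():
        dc_name = TO_DC.get(field_name)
        if dc_name is None:
            continue  # no Dublin Core equivalent: this detail is lost in the simpler format
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = item
    return root

thesis = {
    "title": "Sample thesis title",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "genre": "thesis",
    "committee_note": "Internal only",  # hypothetical local field with no DC mapping
}
print(ET.tostring(to_oai_dc(thesis), encoding="unicode"))
```

Running a sample of real records through such a crosswalk is a quick way to see what each target standard can and cannot carry.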
But what if you feel that the metadata standard meets some user needs, but not all? Perhaps it describes museum objects really well for curation, management and general discovery, but doesn't really support teachers in finding objects that they can integrate into their curricula? Your best solution is to select a flexible "core" schema that is developed and maintained by an organization or standards body with a mission similar to yours. Either the schema should give you the opportunity to extend it, as MODS does, or you should extend it with your own data elements, documenting and standardizing your contributions. This way, you can benefit from the efforts of others as they create mappings to other standards, develop applications, etc. This also enables you to share data seamlessly with other organizations that use the standard.
A standard has emerged in the library community for packaging these different types of metadata together: the Metadata Encoding and Transmission Standard (METS). METS provides a standardized framework, in the form of an XML wrapper, for durably linking the different types of metadata with the digital object. The digital object may have several physical formats, for example a high-resolution digital preservation master and a low-resolution web display version. METS links all the metadata and all the physical instances of a single object together into an XML package that can be transported between METS-compliant systems. RUcore is METS-based. A Workflow Management System supports the uploading of objects and the creation of every type of metadata (descriptive, technical, provenance), which is packaged into a METS XML document and sent to the repository for storage and access.
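The following minimal Python sketch shows the shape of such a package, using the real METS, MODS and XLink namespaces. It builds a descriptive section (dmdSec), an administrative section with technical and provenance subsections (amdSec), a file section with one group per physical version (fileSec), and a structural map that ties them to a single object (structMap). It is not RUcore's Workflow Management System output. The IDs, title and URLs are placeholders, and the technical and provenance payloads are left empty; a real package would fill them, for example with PREMIS, and would usually include a header.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS_NS), ("mods", MODS_NS), ("xlink", XLINK_NS)):
    ET.register_namespace(prefix, uri)

def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

def build_mets(title: str, master_url: str, web_url: str) -> ET.Element:
    mets = ET.Element(m("mets"))

    # Descriptive metadata: a minimal MODS title wrapped inline.
    dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="MODS")
    mods = ET.SubElement(ET.SubElement(wrap, m("xmlData")), f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # Administrative metadata: technical and provenance sections (payloads omitted here).
    amd = ET.SubElement(mets, m("amdSec"))
    for section, section_id in (("techMD", "TECH1"), ("digiprovMD", "PROV1")):
        sec = ET.SubElement(amd, m(section), ID=section_id)
        ET.SubElement(ET.SubElement(sec, m("mdWrap"), MDTYPE="OTHER"), m("xmlData"))

    # One file group per physical version of the same object.
    file_sec = ET.SubElement(mets, m("fileSec"))
    versions = (("master", "FILE1", "image/tiff", master_url),
                ("web", "FILE2", "image/jpeg", web_url))
    for use, file_id, mime, url in versions:
        grp = ET.SubElement(file_sec, m("fileGrp"), USE=use)
        file_el = ET.SubElement(grp, m("file"), ID=file_id, MIMETYPE=mime, ADMID="TECH1 PROV1")
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})

    # The structural map ties the descriptive metadata and both files to one object.
    struct = ET.SubElement(mets, m("structMap"))
    div = ET.SubElement(struct, m("div"), DMDID="DMD1", LABEL=title)
    for _use, file_id, _mime, _url in versions:
        ET.SubElement(div, m("fptr"), FILEID=file_id)
    return mets

package = build_mets("Harbor at dawn",
                     "https://example.org/master/harbor.tif",
                     "https://example.org/web/harbor.jpg")
print(ET.tostring(package, encoding="unicode"))
```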
As you have seen, the development of a metadata strategy is a complex and lengthy undertaking. The wealth of metadata standards available has only made metadata implementation more complex and time-consuming, not less. When all the pieces are pulled together, the basic process looks like this: