RUcore - Understanding Metadata

RUcore: Rutgers University Community Repository

Understanding Metadata

Metadata is defined simply as "data about data."

Metadata is all the information needed to manage, describe, preserve and make information available to users.

Mechanics of Metadata

Metadata consists of data elements: structured fields that are populated with information, known as "values," according to rules that give the metadata a standardized format. These rules are collectively known as the metadata schema. An example of a metadata schema is Dublin Core.

A metadata schema is generally defined as an XML (eXtensible Markup Language) schema. XML is a markup language that provides semantic structure for expressing information on the web through meaningful tags. Each XML tag is enclosed in angle brackets, and each element must be opened with a start tag (<tag>) and closed with an end tag (</tag>). For example, a sentence can be meaningfully structured in XML in several ways:

	<article>The</article>
	<noun>cat</noun>
	<preposition>in</preposition>
	<article>the</article>
	<noun>hat</noun>

	<subject>The cat in</subject>
	<predicate>the hat</predicate>

Metadata is generally stored in a database management system, such as MySQL, Oracle®, or Microsoft Access®, from which it can be displayed and exported in XML syntax.
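
As a minimal sketch of this export step, the following Python example stores one record in a database and writes it out as XML. It uses SQLite (from the Python standard library) as a stand-in for the database systems named above, and the table name, column names and element names are assumptions chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table and column names; SQLite stands in for MySQL, Oracle, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT, creator TEXT)")
conn.execute(
    "INSERT INTO records VALUES (?, ?)",
    ("Understanding metadata", "Agnew, Grace"),
)

def row_to_record(title, creator):
    """Build a <record> element from one database row."""
    record = ET.Element("record")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "creator").text = creator
    return record

rows = conn.execute("SELECT title, creator FROM records").fetchall()
conn.close()

# Serialize each row as an XML string.
xml_records = [ET.tostring(row_to_record(*row), encoding="unicode") for row in rows]
print(xml_records[0])
```

Each database column becomes one data element, and each cell becomes that element's value.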

The standard components of metadata are:

Data Element - atomic unit of meaning (as defined by the user community);
Attribute - refines, extends and interprets the data element;
Value - information unique to each data element instance;
Constraint - order imposed on the data element's expression for consistency and semantic viability; and
Label - contextual instance of the data element name, that is, how the data element displays on the web for the end user.

As an example, let's examine the data element "creator."

The data element would be creator. A data element may have subelements. In this case, the data element might be further divided into the subelements name and role.

The value for each subelement would be a unique instance of information enclosed within each data element.

For example, the creator of this article could be expressed in XML in the following way:

	<creator>
		<name>Agnew, Grace</name>
		<role>author</role>
	</creator>

As you can see in this example, the data element <creator> has two subelements, <name> and <role>.

A constraint on the expression of the data element <name> is that the <name> value must be expressed as Last name, First name.
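
A constraint like this can be checked mechanically. The sketch below, in Python, tests the <name> value against a simple pattern; the pattern itself is an assumption (text, a comma, a space, more text), and a real implementation would likely need richer rules for suffixes, single names and corporate creators.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern for "Last name, First name": text, a comma, a space, more text.
NAME_PATTERN = re.compile(r"^[^,]+, [^,]+$")

def name_is_valid(creator_xml):
    """Return True if the <name> subelement satisfies the assumed constraint."""
    creator = ET.fromstring(creator_xml)
    name = creator.findtext("name", default="")
    return bool(NAME_PATTERN.match(name.strip()))

good = "<creator><name>Agnew, Grace</name><role>author</role></creator>"
bad = "<creator><name>Grace Agnew</name><role>author</role></creator>"
print(name_is_valid(good), name_is_valid(bad))
```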

The <role> subelement might be used as a dynamic label for creator, so that a metadata record for this document would be expressed in XML in this manner:

	<record>
		<title>Understanding metadata</title>
		<creator>
			<name>Agnew, Grace</name>
			<role>author</role>
		</creator>
	</record>

The record display would include the following labels:

Title: Understanding metadata
Author: Grace Agnew
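
One way to produce such a display is sketched below in Python. It is an illustration only, not a description of how RUcore renders records: the function name is hypothetical, and it assumes every <name> value follows the Last name, First name constraint.

```python
import xml.etree.ElementTree as ET

RECORD_XML = """
<record>
    <title>Understanding metadata</title>
    <creator>
        <name>Agnew, Grace</name>
        <role>author</role>
    </creator>
</record>
"""

def display_lines(record_xml):
    """Turn a record into labeled display lines, using <role> as the creator label."""
    record = ET.fromstring(record_xml.strip())
    lines = ["Title: " + record.findtext("title")]
    for creator in record.findall("creator"):
        # Assumes the "Last name, First name" constraint holds for <name>.
        last, first = [part.strip() for part in creator.findtext("name").split(",", 1)]
        label = creator.findtext("role").capitalize()
        lines.append(f"{label}: {first} {last}")
    return lines

for line in display_lines(RECORD_XML):
    print(line)
```

Because the label comes from the <role> value, a creator with the role "editor" would display as "Editor" without any change to the display code.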

Data Model

Now that you understand the basic mechanics of metadata (what it looks like and how it is created and displayed), you need to step back and consider the information "ecosystem" in which the metadata operates.

A data model provides the basic relationship between the information you are describing and managing and the information community that represents the predominant users of the data.

Metadata should be designed to support the information needs of the user community, both today and far into the future. A data model should underlie any decisions about metadata design and creation. Your data model will reflect the nature of the information that you manage, and its use by your information community.

The data model provides for the basic description of information so that it can be effectively discovered and used. A good data model should be:

Understandable by the information manager and the user. A repository can hold terabytes or petabytes of data. The data model shows everyone how the information is organized and interrelated for use.
Context independent. A good data model will support all contexts: those that information creators bring to the repository, which can be layered on top of the data model, and the information contexts of future generations of scholars, which can't even be envisioned. The rapid proliferation of digital information is already revolutionizing research. A good data model should be flexible enough to accommodate major transformations to the research and publication process.
Representative of "living data." Digital data lives in relationship to other data and may be repurposed many times over its lifecycle. For example, a data set may be created in a physical experiment, published as the appendix to a journal article, and reused in a computational simulation, all within the space of a few months. A good data model will track the events and interrelationships of data, to expose not only the data for reuse but also the lifecycle and ecology of the data.
Supportive of the management and preservation of data. A good data model will support not only the discovery and reuse of information but also its preservation for long-term access.

A data model should be designed to meet the user's core information needs, as identified by the International Federation of Library Associations and Institutions (IFLA) [1]:

Find. Can the user discover the information object?
Identify. Does the description provide enough information so that the user knows what he or she has found?
Select. Can the user select among two or more competing resources that were retrieved with a search?
Obtain. Can the user obtain the resource?

The data model underlying the metadata standard you select, such as an event-based data model, must enable users to meet their core information needs. In addition, it is helpful to think about users according to the roles they play: creators (those who create information and metadata), viewers (those who view the information), evaluators (those who evaluate or critique the information) and repurposers (those who modify the information to create a derivative or revised information object).

Understanding the Organization's Information Use

To develop a data model, it is most critical to understand how your intended audience wants to find, evaluate and use the information you are describing. It is also critical to understand the needs of those who will create and manage both the information and the metadata, since metadata does not exist in a vacuum but is part of an overall strategy to manage the organization's information, generally within a repository or other information management system.

Begin by answering some basic questions:

Who are your primary users? What are their information needs? How do they discover, share and use information? What other information resources are they most likely to use?
Who will create and manage the information and the metadata? What are their workflow needs?
What is the nature of the information to be described? What formats does it include? What subjects or professional domains are included? For what purpose(s) was the information created?
When will users most often need the information? How often must the information be updated?
How will the information be used by the primary users, the information managers, and any other users, including other organizations?

The first strategy for exploring the answers to these questions is to study the organization itself: its mission and goals, the information products it produces, and its information needs. This involves studying the organization's mission statement, website(s) and promotional materials; conducting user surveys; and talking to users in focus groups and interviews. The collections that prompted the need for a metadata strategy should also be surveyed, including any existing metadata. In fact, it is quite useful to survey users about their use of the current metadata or other discovery tools, which may be a formal or informal discovery system. What works for them in the current discovery system? What doesn't work? Once they obtain the information, how do they use it? How does their information use support the organization's mission? It is important to understand the role that information plays in the organization's work to fulfill its mission, both to support that mission and to develop an assessment strategy for the metadata that evaluates the impact the metadata has on user satisfaction and on the organization's work generally.

Next, expand the exploration to evaluate the information strategies organizations similar in mission and purpose are using. In your discussions with users, you asked what other sources of information they predominantly use. What is the metadata strategy employed for those other sources of information? What organizations organize and make available these ancillary sources of information? It is important to co-locate your metadata within the larger information landscape for two reasons: to integrate your metadata collaboratively with other organizations for a "one stop shop" for information and to participate in a larger community so that the expense and expertise required to develop and maintain a metadata schema do not fall exclusively to your organization. An example of a very effective community collaboration is the PBCore metadata standard developed by members of the public broadcasting community, under the sponsorship of the Corporation for Public Broadcasting. [2]

Now that you have identified the information and users specific to your organization, you can create a basic model that demonstrates the nature of the information you are managing and the user needs that it serves. This data model can be a simple diagram showing the relationship between different entities, including creators, managers and users of data. This completes a critical first step in developing a metadata strategy. Understanding your information, your users and how they interrelate is a critical but often neglected first step. The next step involves selecting among the many metadata standards available to meet the requirements of your data model.

The RUcore Data Model

[Figure: RUcore data model]

RUcore tracks the lifecycle and ecology of the data.

The lifecycle of information is the interaction of the information object with place (place of creation, place of storage, etc.) and agent (the person(s) or organization(s) responsible for creating, describing, managing or using the information). When the object interacts with place or agent at a specific point in time, an event in the lifecycle of the information is said to occur. Events provide context for information use.

Consider an artist who creates an impressionistic painting in her twenties, which is purchased and displayed by a museum in Montclair. Two events in the lifecycle of the work would be the creation of the painting and its permanent exhibition in the Montclair museum. Both events need to be documented and discoverable by different users with different needs.

A researcher looking for works that demonstrate the influence of impressionism on the artist would discover this work. The researcher might be happy with the digital representation, which provides sufficient detail to demonstrate the influence of Monet upon the artist.

A high school art teacher planning a field trip to view the artist's work in situ would also discover this work, and sufficient information about its permanent exhibition to plan the field trip for his students.

Each user finds the information he or she needs by discovering a different "event" in the lifecycle of the data. RUcore can meet these core information needs without overloading the object with context specific to one type of scholar or one field of study.
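
The painting example can be sketched as a small event-based structure in Python. The class and field names below are hypothetical and are not RUcore's actual schema; they only illustrate how an object, its agents and its places might be tied together through events.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Event:
    """One interaction of the object with an agent and/or place at a point in time."""
    event_type: str
    agent: str
    place: Optional[str] = None

@dataclass
class InformationObject:
    title: str
    events: List[Event] = field(default_factory=list)

    def find_events(self, event_type):
        """Return the lifecycle events of a given type."""
        return [e for e in self.events if e.event_type == event_type]

# Hypothetical values based on the painting example above.
painting = InformationObject(
    title="Impressionistic painting",
    events=[
        Event("creation", agent="artist"),
        Event("exhibition", agent="museum", place="Montclair"),
    ],
)

# The researcher is served by the creation event; the teacher by the exhibition event.
print(painting.find_events("exhibition")[0].place)
```

The object itself carries no user-specific context; each user reaches it through the event that matters to them.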

Once you have developed the data model, the underlying principle behind the organization of your data, you will be ready to select a baseline metadata standard that conforms to your data model.

Selecting a Metadata Standard

Since the advent of Dublin Core, metadata has become a rapidly proliferating web information tool. Information communities have developed metadata standards, which consist of metadata elements and controlled vocabularies (standardized terminology) expressed as XML schemas. Metadata standards are maintained by a community, often a standards body, and are intended to address a wide range of description needs, including metadata intended to support specific information communities, such as libraries and museums; metadata to support specific needs, such as the rights expression language ODRL (Open Digital Rights Language); and metadata specific to disciplines, such as Darwin Core, for identifying and documenting species.

The preceding sections covered the prerequisite knowledge for selecting a metadata standard: an understanding of the data that you need to describe and manage, an understanding of the information use requirements of your target audience, and an ability to read and understand metadata documentation. Now you are ready to evaluate metadata standards against the emerging data model for your organization. This section describes important criteria for evaluating and selecting a metadata standard:

Information Needs of the User

It is important to understand the information needs and practices of your target audience. What legacy metadata systems or standards do they currently use? At what point in their average workflow are they seeking the kind of information that you will be describing? What data elements will work with their workflow context and frame of reference? How can you meet their needs to find, identify, select and obtain in an efficient and understandable manner? While you want to meet the urgent and immediate needs of your target users, you also want to be aware that these needs will change over time and that digital information inevitably attracts a wider audience than initially envisioned. You want to be sure that your information model is "context-independent," meaning that it can accommodate the needs of your current audience but also support additional users with different needs. You want to incorporate the current needs of the user but be extensible and flexible beyond those current needs. My recommended approach is to be context or use independent in the underlying metadata and the information model and then to add user context in the creation, search, retrieval and display tools. Unfortunately, many metadata standards are highly context-dependent. It will be interesting to see if they can survive the rapid evolution of user needs and interests.

Standard Domain and Purpose

It is important to evaluate metadata standards in the context of their development and use. What community developed the standard and what needs did they intend to address through the standard? MPEG-7, the "Multimedia Content Description Interface," emerged predominantly from the commercial media market. Its data model, structure and design focus on the delivery of digital media assets to end users. While it can be used for other purposes, such as the archiving and management of digital assets, there are limitations in the design that must be worked around to support different uses.

A good way to identify potential metadata standards to employ is to look at the metadata standards used by complementary organizations, by communities that produce information that is also heavily used by your target audience, and by organizations that you might want to collaborate with. Did they develop a metadata standard or an application profile for an existing standard? What metadata standard(s) did they choose, and why? Also, a flexible, context-independent metadata standard may offer the best opportunities for meeting current and future information needs. This is the metadata strategy employed within RUcore.

The Existing Information Environment

The metadata standard selected needs to integrate smoothly with the organization's existing information environment. If you are already describing information using data elements or fields in a database, will they map readily to the metadata standard or schema you have selected?
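
A quick way to test this is to draft a field mapping (often called a crosswalk) and see what falls through. The Python sketch below maps a set of hypothetical local database field names to Dublin Core element names and reports any field with no target; the local names and the sample record are assumptions for illustration.

```python
# Hypothetical local field names (left) mapped to Dublin Core element names (right).
CROSSWALK = {
    "doc_title": "title",
    "author_name": "creator",
    "created_on": "date",
    "file_type": "format",
}

def map_record(local_record):
    """Return (mapped fields, local fields that have no target element)."""
    mapped, unmapped = {}, []
    for field_name, value in local_record.items():
        target = CROSSWALK.get(field_name)
        if target is None:
            unmapped.append(field_name)
        else:
            mapped[target] = value
    return mapped, unmapped

local_record = {
    "doc_title": "Understanding metadata",
    "author_name": "Agnew, Grace",
    "shelf_code": "B-12",  # a local field with no obvious target
}
mapped, unmapped = map_record(local_record)
print(mapped)
print(unmapped)
```

A long list of unmapped fields suggests that the candidate standard will need extensions, or that a different standard may fit the existing environment better.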

The existing information environment also consists of the staffing and technology available to implement the metadata. If your organization plans to rely on student workers or volunteers, it will be important to consider a simpler, more streamlined metadata schema.

It is also critical to consider the controlled vocabularies available to populate the data elements within a metadata standard. Many standards specify the use of controlled vocabularies within data elements, such as the requirement to use Internet MIME types within a "format" data element. The use of controlled vocabularies, which provide authoritative terms to choose among, can make cataloging easier for metadata creators and increase search precision for end users.
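
The following Python sketch shows how a controlled vocabulary might be enforced on a "format" data element. The vocabulary here is a small assumed subset of Internet MIME types, and the record structure is hypothetical.

```python
# A small, assumed controlled vocabulary of Internet MIME types.
ALLOWED_FORMATS = {"application/pdf", "image/jpeg", "image/tiff", "video/mp4"}

def check_formats(records):
    """Return (record id, value) pairs whose "format" value is not in the vocabulary."""
    problems = []
    for record in records:
        value = record.get("format", "").strip().lower()
        if value not in ALLOWED_FORMATS:
            problems.append((record["id"], record.get("format")))
    return problems

records = [
    {"id": "rec-1", "format": "application/pdf"},
    {"id": "rec-2", "format": "PDF document"},  # free text, not a vocabulary term
    {"id": "rec-3", "format": "image/tiff"},
]
problems = check_formats(records)
print(problems)
```

Because every cataloger chooses from the same terms, a search for "application/pdf" retrieves all PDF objects rather than only those whose format happened to be typed that way.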

Flexibility and Extensibility of the Metadata Standard

Metadata standards are generally designed either to respond to the needs of a specific community of users or to serve a broad range of communities and information needs. In either case, the metadata standard as written is unlikely to meet all of your requirements. Over time, your requirements will change as your users' information needs evolve, and as the information you are collecting and managing changes. It is therefore critical to evaluate a metadata standard for its ability to accommodate customization and evolution. Does the metadata standard provide flexibility in adding controlled vocabularies that represent the subject domain and information context of your users? Does the metadata standard provide a means to add community extensions? MODS, for example, allows you to extend the schema using an extension element that enables you to add additional metadata elements or schemas, to create a hybrid metadata schema tailored to your needs. XrML, the eXtensible rights Markup Language, provides an extension schema to develop community extensions to the core language.
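
As an illustration of the MODS extension mechanism, the Python sketch below builds a minimal MODS record and places one locally defined element inside the MODS extension element. The local namespace URI and the gradeLevel element are hypothetical, and the output is not validated against the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
LOCAL_NS = "http://example.org/local-metadata"  # hypothetical local namespace

ET.register_namespace("mods", MODS_NS)
ET.register_namespace("local", LOCAL_NS)

# Core description, using standard MODS elements.
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = "Understanding metadata"

# Community-specific addition, kept inside <mods:extension>.
extension = ET.SubElement(mods, f"{{{MODS_NS}}}extension")
ET.SubElement(extension, f"{{{LOCAL_NS}}}gradeLevel").text = "9-12"  # hypothetical element

mods_xml = ET.tostring(mods, encoding="unicode")
print(mods_xml)
```

Keeping local elements in their own namespace means that systems which only understand core MODS can ignore the extension without misreading the rest of the record.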

Different metadata standards serve different purposes, and are employed by different communities. In the highly interoperable world of the Web, it is important to enable your organization to collaborate with others. This collaboration often requires a shared information context, which is provided largely via metadata. For example, an organization that provides streaming video assets for broadcast use may diversify into the educational marketplace and may want to package its assets. RUcore can support any schema, such as MARC, for sharing records with the Rutgers Library Catalog, and Dublin Core, for sharing with other digital library initiatives such as the Networked Digital Library of Theses and Dissertations, an international open access portal to theses and dissertations. One final but critical test for your selected metadata standard is to ensure that it maps readily to other metadata standards, to support many different contexts of use for your resources.

Should You Develop Your Own Standard?

Once you have explored many different metadata standards and compared them against the needs of your users and the requirements of your collection, you have probably identified many gaps where the standards fail to address some of your organization's needs. The temptation to develop your own standard (it doesn't look that hard!) is great. A good metadata standard may look deceptively simple, but generally much thought, considerable effort and many revisions have gone into the standard you see today. You may want to develop your own standard if the following conditions apply:

Your subject domain or users cannot be served by any existing schemas.
Your need for context is very strong, and a general-purpose schema such as MODS or Dublin Core will not suffice. This is the situation that the public broadcasting community faced, when the need to share information assets between stations as well as to enable discovery by the general public led to the development of PBCore.
You are willing to robustly describe the metadata standard, establish an XML schema for it, and maintain both over time. Metadata should always lead to interoperability and integration with the wealth of similar or complementary resources that are also important to your users. Even if you are not promoting the schema for use by others, you want to enable mapping to your schema by others.
Your information resources or your primary context of use are so unique that no metadata schema can meet your needs, even with extensions.

Generally, a well-constructed and extensible general-purpose standard such as MODS can be customized to meet most information needs. It is recommended that you extend an existing standard to meet your organization's needs, so that the rigorous requirements for establishing and maintaining a standard do not fall solely to your organization. Many communities have established application profiles for existing standards, in which a standard is tailored and extended to a specific user group or information format.

But what if you feel that the metadata standard meets some user needs, but not all? Perhaps it describes museum objects really well for curation, management and general discovery, but doesn't really support teachers in finding objects that they can integrate into their curricula?

Your best solution is to select a flexible "core" schema that is developed and maintained by an organization or standards body with a similar mission to yours. Either the schema should provide a mechanism for extension, as MODS does, or you should extend it with your own data elements while documenting and standardizing your contributions. This way, you can benefit from the efforts of others, as they create mappings to other standards, develop applications, etc. This also enables you to share data seamlessly with other organizations that use the standard.

What About the Other Kinds of Metadata?

Metadata is used not only to describe digital information but also to preserve and manage it, to ensure that as better digital formats emerge, the digital master is migrated appropriately, and that as technologies become obsolete, the objects they created or displayed are still usable. This overview has looked primarily at descriptive metadata. However, metadata to document the technical characteristics of the object, to track its analog and digital lifecycles, and to ensure appropriate usage in support of intellectual property rights is emerging for all types of information. Generally, you can expect to use multiple metadata standards for each type of activity, such as discovery, access and management.

A standard that has emerged in the library community to bring these types of metadata together is the Metadata Encoding and Transmission Standard (METS). METS provides a standardized framework in the form of an XML wrapper for durably linking the different types of metadata with the digital object. The digital object may have several physical formats, such as a high resolution digital preservation master and a low resolution web display version. METS links all the metadata and all the physical instances of a single object together into an XML package that can be transported between METS-compliant systems. RUcore is METS-based. A Workflow Management System supports the uploading of objects and the creation of every different type of metadata (descriptive, technical, provenance), which is packaged into a METS XML document and sent to the repository for storage and access.
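
To make the wrapper idea concrete, the Python sketch below assembles a skeletal METS document that links one descriptive metadata section to two files, a preservation master and a web display version. It is only a sketch: the file names and identifiers are hypothetical, the technical and provenance sections are omitted, the output is not validated against the METS schema, and it does not represent the packages RUcore actually produces.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)

def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"

mets = ET.Element(m("mets"))

# Descriptive metadata, wrapped inside the METS document.
dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="DC")
xml_data = ET.SubElement(wrap, m("xmlData"))
ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = "Understanding metadata"

# The physical instances of the object (hypothetical file names).
file_sec = ET.SubElement(mets, m("fileSec"))
files = [("preservation master", "FILE1", "master.tif"),
         ("web display", "FILE2", "display.jpg")]
for use, file_id, href in files:
    group = ET.SubElement(file_sec, m("fileGrp"), USE=use)
    file_el = ET.SubElement(group, m("file"), ID=file_id)
    ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

# The structural map ties the metadata and the files to one object.
struct_map = ET.SubElement(mets, m("structMap"))
div = ET.SubElement(struct_map, m("div"), DMDID="DMD1")
for _, file_id, _ in files:
    ET.SubElement(div, m("fptr"), FILEID=file_id)

mets_xml = ET.tostring(mets, encoding="unicode")
print(mets_xml)
```

The structural map is what makes the package a single object: it points to both the descriptive metadata (through DMDID) and every file (through FILEID).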

As you have seen, the development of a metadata strategy is a complex and lengthy undertaking. The wealth of metadata standards available has only made metadata implementation more complex and time-consuming, not less. When all the pieces are pulled together, the basic process looks like this:

[Figure: The metadata process.]

Conclusion

When you think about all the components of a metadata implementation that must be addressed, the understandable temptation is to cut corners, quickly select an all-purpose standard, and worry about all the other issues later. After many years of selecting and applying metadata standards, however, I want to close by assuring you that the work will have to be done, either at the beginning, when the mistakes are less costly, or later, when your metadata implementation eventually and inevitably ceases to work effectively or to grow with your organization and your users. Paying careful attention to all the issues, particularly understanding the role your information plays for your organization, will be repaid when your metadata implementation becomes a critical tool for realizing your organizational mission, today and far into the future.

Footnotes

[1] "Section 6.1 Mapping Attributes and Relationships to User Tasks" in International Federation of Library Associations and Institutions. Functional Requirements for Bibliographic Records: Final Report. München: K.G. Saur, 1998. (UBCIM Publications - New Series, V. 19). Also available as: http://www.ifla.org/VII/s13/frbr/frbr3.htm#6
[2] Corporation for Public Broadcasting. PBCore: Public Broadcasting Metadata Dictionary Project. http://www.utah.edu/cpbmetadata/
