Data Management and RUresearch Presentation by Ryan Womack
Free Assistance and Advice
We will assist you with preparing grant proposals and designing your data strategy. The RUresearch team consists of experienced digital information professionals who work with data, write and manage grants, and serve as peer reviewers and consultants for granting agencies, including the National Science Foundation.
We can offer data management advice in:
- Identifying your data model. What data are you capturing and how does it interact with other data in your research environment?
- Designing a metadata strategy. How can I describe my data to ensure that colleagues in my field and those in other disciplines can find and reuse the data?
- Capturing your data. What is the best methodology for capturing my data, for further analysis and for sharing with others? Is this a spreadsheet, a database, an XML document, or something else? If my community already has a data format, how can I readily transfer the data I collect to that format?
- Making your data discoverable in your search portal. RUcore offers a search portal to your data that can be easily incorporated into your project website. As an example, see the Video Mosaic Collaborative and the Equine Science Center collection. We can help you select the right data elements and record displays to ensure that end users can find and use your research.
Customized Search and Retrieval Portal
Finding resources that meet your information needs depends on the metadata, or descriptive information, used. This metadata should reflect the terminology of your field and include the information you need to find and select the best resources.
At the same time, metadata should be standardized and consistent, and should enable your data to be shared with a global, multidisciplinary audience.
RUcore employs a sophisticated, flexible metadata strategy that can customize metadata to support your primary audience yet still be compatible with prevailing metadata standards. For an example, see the Video Mosaic Collaborative, an NSF-funded mathematics education video collection. Metadata is customized to reflect mathematics education practices and to support core audiences of mathematics education faculty, researchers and practicing teachers.
RUresearch includes a portal application that enables you to select metadata elements to filter a search and to display in search results. The search and retrieval portal is easily incorporated in your project's website using a technology known as "iFrames."
Ongoing Management and Support for Your Data
RUcore, which includes RUresearch, is an important, core service for the Rutgers University Libraries. Many library faculty and staff are engaged in its support. The Rutgers University Libraries are recognized leaders among their peer institutions in digital repository development and have contributed significant open source software to the field. We are committed to the long-term persistence and availability of your data and are continuously developing new tools and services, as well as upgrades to the RUcore platform, to manage the digital resources we support. You can be confident in the long-term sustainability of data deposited in RUcore.
Is there a fee for placing my data in RUresearch?
RUcore accepts all types of resources that represent the significant intellectual output of the university. This includes faculty journal articles and other scholarly publications, theses and dissertations for degrees awarded by Rutgers University, and resources such as data sets that result from the research process. Individual resources that involve only simple cataloging and storage, such as the example data sets currently available in the RUresearch portal, can be accepted at no cost. The same is true for electronic journal article preprints and postprints.
The Library will consult on your data management plan or grant at no cost, but managing data for a large research project, such as projects generally funded by grants, involves significant work and planning that will generally require a fee for service. The services we offer include customizing metadata and providing ongoing cataloging, storage and management of data and associated documents and software. This fee can be accommodated through cost recovery charges in the grant budget, either as a data management fee or through the involvement of library faculty and staff as co-P.I.s or researchers on the grant, with associated line item cost recovery. This will be a one-time, cost-recovery-only fee that can be incorporated into the grant proposal budget. Data will be preserved and made accessible for the long term at no additional cost to the project beyond the one-time initial cost. However, that initial cost, although negotiable, will be based on the amount of work and effort anticipated for the life of the project.
Data Curation Research Center
The Rutgers University Libraries' RUcore initiative includes a Data Curation Research Center and a Data Curator who participates actively in digital preservation research and development. The Rutgers University Libraries are internationally recognized as being at the forefront of digital preservation standards and practices, particularly for digital video. We currently employ "industry best practices" for digital file preservation, including:
- Multiple backups and restoration practices, including online, nearline, offline and offsite storage of files.
- Continuous file integrity checks, such as checksum assignment and checking.
- Persistent identifiers that use metadata to continuously locate a file, even if it is moved during routine storage reallocation. When you reference a citation URL, you can be confident that the file will be retrieved.
- Storage of files in multiple formats. One or more canonical formats that are vendor independent and conform to non-proprietary standards are employed whenever feasible. The original file format is also always maintained. We are currently transcoding most numeric data sets to comma-separated values (CSV) format. We are also currently investigating XML (eXtensible Markup Language) and RDF (Resource Description Framework) for web-based canonical formats, as well as community-specific data standards such as the DDI (Data Documentation Initiative) for social science and survey data, and SensorML for sensor data. If your community uses a specific data storage format, we will explore its use with you.
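The file integrity checks listed above can be illustrated with a small sketch. It is not RUcore's actual implementation: the choice of SHA-256 and the function names `sha256_of` and `verify` are assumptions made for the example. The idea is simply that a checksum is recorded when a file is deposited and recomputed later to detect any change.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at deposit."""
    return sha256_of(path) == recorded_digest
```

A repository would typically store the recorded digest alongside the file's metadata and run `verify` on a schedule, restoring from backup when it returns False.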
Metadata is simply "data about data." Metadata helps the data owner organize and manage the data he or she creates. Its primary role, however, is to make sure that data can be discovered and reused by others. Well-designed metadata should support four core user needs, known by the acronym "FISO", for "find," "identify," "select" and "obtain."
- Can the user FIND the information he is seeking?
- Can the user IDENTIFY what she has found? E.g., if the user is looking for a video, does the metadata record clearly indicate that the described resource is a video?
- Can the user SELECT the most appropriate resource, when several are retrieved, based on the metadata records? E.g., if the user is looking for air quality sampling that measures nitrous oxide levels, can he determine which resource among many air pollution data sets includes nitrous oxide sampling?
- Can the user OBTAIN the resource quickly and easily from the metadata record?
Good metadata is responsive to the information needs of its user community. It captures the information most important for that community, using terminology that is accurate, current and meaningful to that community. It also needs to be consistently applied and shareable with a broader community. Metadata standards evolved to enable consistency and broader sharing of information. One of the oldest and most famous is Dublin Core, a 15-element metadata standard that is widely used. Many research communities have evolved their own standards, such as Darwin Core for biological specimens and DDI (Data Documentation Initiative) for survey-based data. RUcore employs a very flexible, sophisticated event-based metadata implementation that supports many different metadata standards but is largely independent of any one standard. We can display and export records in many different standards, including the standard your community uses. We can also design customized metadata that can support many standards or serve as a community standard, specific to your project's needs.
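As a rough illustration of exporting a record in a standard, the sketch below serializes a simple record using the 15 Dublin Core elements. It is a minimal example, not RUcore's export code: the function name `to_dublin_core_xml`, the wrapping `record` element, and the sample values are assumptions.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core_xml(record: dict[str, list[str]]) -> str:
    """Serialize a mapping of Dublin Core element name -> values as XML."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        # Every Dublin Core element is optional and repeatable.
        for value in record.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are illustrative only.
print(to_dublin_core_xml({
    "title": ["Example air quality data set"],
    "type": ["Dataset"],
    "format": ["text/csv"],
}))
```

A community-specific standard such as Darwin Core or DDI would need its own mapping; the point here is only that one internal record can be rendered in whichever standard an audience expects.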
RUcore can assist you with controlling access to your data. Creator(s) of data own the copyright to that data. The rights holder has the right to determine access and reuse of the data. Because of this, you will need to provide RUcore with a non-exclusive license to manage your data and make it available for others to use. We currently offer two methods of access control. We will work with you on a rights statement explaining what use others may make of your data. This provides important information for end users about reuse of your data. We can also embargo your data for a time period of your choosing. Metadata about your data will appear in the portal, but an embargo note will indicate that the data is not currently available for re-use. This will raise scholarly awareness of your data, so that others will not duplicate your work unnecessarily or will contact you directly for further information about your data and its availability.
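The two access controls described above, a rights statement and an embargo, can be sketched as follows. This is an assumption-laden illustration rather than RUcore's data model: the class `DepositedDataSet`, its fields, and the wording of the embargo note are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DepositedDataSet:
    title: str
    rights_statement: str                  # what reuse others may make of the data
    embargo_until: Optional[date] = None   # None means no embargo

    def is_available(self, today: date) -> bool:
        """The data can be retrieved once any embargo period has ended."""
        return self.embargo_until is None or today >= self.embargo_until

    def portal_view(self, today: date) -> dict:
        """Metadata is always shown; an embargo note is added while it applies."""
        view = {"title": self.title, "rights": self.rights_statement}
        if not self.is_available(today):
            view["note"] = f"Embargoed until {self.embargo_until.isoformat()}"
        return view


data_set = DepositedDataSet(
    title="Example survey data",
    rights_statement="Available for non-commercial research use with attribution.",
    embargo_until=date(2030, 1, 1),
)
print(data_set.portal_view(today=date(2029, 6, 1)))
```

The record stays discoverable throughout the embargo, which is what lets other researchers learn that the data exists and contact the creator.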
The research process is a complex ecosystem of information. Research frequently begins with a grant proposal or other methodology for establishing a hypothesis and a research proposal. Data is collected, and throughout the collection process, documents are created, such as lab or experiment notes, survey questionnaires, images, video, etc. Instruments such as sensors, particulate collectors, telescopes and microscopes are used to collect and analyze data. Maintenance or calibration records for those instruments can be important when collection practices must be justified or explained. Specific software code may be written to process and analyze the data. At the end of the research process, peer-reviewed publications share the conclusions and extensively reference the research data. From conception to collection to publication, the entire research process produces valuable data that should be collected and made available for others.
Capturing this associated data is more complex than storing and providing access to the data itself. Capturing the entire life cycle or information ecosystem of a data set is a three-step process.
Step One – the Infrastructure
The first step is to build relationships between data objects within the repository architecture. RUresearch leverages the Fedora open source repository architecture by creating a compound object that pulls together the data; associated documents, such as code books, lab notes and images; associated software (such as MATLAB, R, or SAS scripts) intended to support the data; and instrumentation records, such as maintenance or calibration records. Resources that are dependent on the data for meaning and do not stand alone are included in the compound object for the project. Associated objects that are separately cataloged and may have different or additional creators, such as analyses, articles, books and presentations, are also related to the compound object. The repository infrastructure provides the groundwork for providing meaningful context.
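The compound object idea can be pictured with a small data structure. The sketch below is not Fedora's object model or API; the classes `Component` and `CompoundObject`, the role names, and the file names are hypothetical and only show how dependent resources are grouped with the data while separately cataloged objects are linked by reference.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A file that depends on the data for its meaning."""
    label: str
    role: str  # e.g. "data", "code book", "lab notes", "software", "calibration record"


@dataclass
class CompoundObject:
    """One project's data plus the dependent resources grouped with it."""
    project: str
    components: list[Component] = field(default_factory=list)
    # Identifiers of separately cataloged objects (articles, presentations, ...).
    related_objects: list[str] = field(default_factory=list)

    def add(self, label: str, role: str) -> None:
        self.components.append(Component(label, role))

    def with_role(self, role: str) -> list[str]:
        """List the labels of all components that play the given role."""
        return [c.label for c in self.components if c.role == role]


project = CompoundObject("Example air quality study")
project.add("readings.csv", "data")
project.add("codebook.pdf", "code book")
project.add("clean_readings.R", "software")
project.related_objects.append("article-0001")  # placeholder identifier

print(project.with_role("software"))
```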
Step Two – Capturing Associated Objects
Associated objects are captured in digital form and stored in the repository as part of the compound digital object for the project. They are preserved along with the data, so the effort is concentrated at the front end, in capturing and uploading, while ongoing management is very light.
Step Three – Providing Context
There are many different objects that can be captured at any stage of the research process. Organizing these and making them available in a meaningful way is deceptively complex. Think of your own most recent research project. Can you immediately locate your IRB protocol if a question comes up about making personally identifiable data accessible? Can you find the lab notes taken by the student who graduated two years ago, the ones you kept because his insights were so valuable? Assuming you can find those notes, do you remember which of the three graduate students that year actually took them? Do you have documentation of his permission to share those notes, and can you credit him for their creation? Valuable information often gets lost or discarded because it is just too hard to manage it all and remember the context of its creation. Even when you remember the context, you may not want to see it all the time, or to share all of it with everyone. It is important that you have publicity releases for all the graduate students who appear in your video, but sharing those releases with the world at large violates the privacy you are at pains to protect.
RUcore's answer lies in its innovative data model and metadata implementation, which captures information useful for finding resources as well as information useful for managing them. Rights metadata, one of the types of information collected, is largely kept hidden from the general user. Documents associated with rights, such as publicity releases or IRB protocols, are not available for public display but are available to RUcore administrators and will soon be available to collection owners. The context surrounding any information, such as a research project, is situated in place and time.
Separate objects can also have separate access controls for availability to different audiences, allowing for both public-use and restricted use versions of data which may contain sensitive information. RUcore uses metadata "events" to document the "who, what, when, and where" of context about research and its supplementary materials.
Rights Event Example
Equine Science Center Videos
Type: Permission of License
Date Time: 2009-12-09
Name: Elyse Conway
Type: Publicity release
Name: Model release
Data Life Cycle Event Example
Data Life Cycle Event
Type: Related publication
Label: Article references the data set, Major League Baseball and Performance Data, 1986
Name: Pazzani, Michael J. and Bay, Stephen D. (1999) The Independent Sign Bias: Gaining Insight from Multiple Linear Regression in Proceedings of the Twenty First Annual Conference of the Cognitive Science Society.
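Events like the two examples above could be modeled roughly as follows. This is only a sketch: the class `MetadataEvent`, its field names, and the filtering rule are assumptions, and the values are adapted loosely from the examples rather than copied from an actual RUcore record. It shows how the same list of events can record context for managers while hiding rights events from general users.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataEvent:
    category: str                      # "rights" or "data life cycle"
    event_type: str                    # what happened
    date_time: Optional[str] = None    # when, as an ISO 8601 date string
    label: Optional[str] = None        # free-text description
    names: list[str] = field(default_factory=list)  # who or what is involved


def public_events(events: list[MetadataEvent]) -> list[MetadataEvent]:
    """Drop rights events, which are largely hidden from the general user."""
    return [e for e in events if e.category != "rights"]


events = [
    MetadataEvent("rights", "Publicity release", date_time="2009-12-09",
                  names=["Elyse Conway"]),
    MetadataEvent("data life cycle", "Related publication",
                  label="Article references the data set, Major League "
                        "Baseball and Performance Data, 1986",
                  names=["Pazzani, Michael J. and Bay, Stephen D. (1999)"]),
]

for event in public_events(events):
    print(event.event_type, "-", event.label)
```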
Data is central
The data itself remains central for discovery and use. RUresearch offers flexible portals that can configure metadata displays to show different levels of context. Creators of data may need different information than users. Users in the primary research domain may need different information than users in a broader multidisciplinary context. The libraries will work with researchers to present information about research data in ways that are meaningful and clear.