Research Data Management Systems: A Beginner’s Guide to Organizing, Preserving, and Sharing Research Data

Updated on
6 min read

Research data management (RDM) is a critical aspect of conducting research, ensuring that data is organized, documented, securely stored, protected, and reusable throughout the project lifecycle. This guide is tailored for students, early-career researchers, research support staff, and developers new to RDM. You’ll learn what RDM systems entail, the key features to consider, how to select and implement an RDM system for your lab, and practical best practices for storing, documenting, and sharing your research data.

What is a Research Data Management (RDM) System?

An RDM system goes beyond mere data storage. It encompasses various components and services that ensure data is discoverable, reusable, and secure. Core components of an RDM system include:

  • Storage: Long-lasting file storage that can scale according to your needs.
  • Metadata and Catalogue: Records that describe datasets for easy findability and comprehension.
  • Access Control: Mechanisms for authentication, authorization, embargoes, and sharing rules.
  • Versioning and Provenance: Methods to track changes and the history of data processing.
  • Persistent Identifiers: Unique identifiers like DOIs to enable dataset citation.
  • Preservation Actions: Strategies for conducting fixity checks and implementing migration plans to prevent data deterioration.
  • Search and APIs: Facilities for programmatic access and discovery of datasets.

Types of RDM systems include institutional repositories, disciplinary repositories, general-purpose archives, and lightweight lab systems. Unlike generic storage solutions or version control systems, RDM systems combine robust storage with effective stewardship capabilities tailored for long-term data reuse.

Why Research Data Management Matters

RDM offers significant benefits for researchers and institutions:

  • Risk Reduction: Minimize data loss through robust backup strategies.
  • Enhanced Collaboration: Promote consistent naming conventions, metadata usage, and access controls to facilitate teamwork.
  • Increased Visibility: Boost citations and visibility of datasets through DOI minting.

Moreover, many funding organizations and journals necessitate a Data Management Plan (DMP) and expect a clear strategy for data sharing and preservation. Compliance with these requirements is crucial to avoid delays in funding or publication.

Embracing FAIR principles (Findable, Accessible, Interoperable, Reusable) is vital for effective RDM, ensuring that datasets remain valuable over time. You can learn more about the FAIR principles here.

Core Features to Look for in an RDM System

To choose an effective RDM platform, focus on its technical and policy features, which include:

  1. Storage and Backup Options: Choose between on-premises, cloud, or hybrid solutions based on your capacity and compliance needs.
  2. Metadata and Cataloguing: Ensure support for standard metadata schemas and guided intake processes.
  3. Access Control: Seek fine-grained permissions and group management capabilities.
  4. Versioning and Provenance Tracking: Support for tracking dataset versions and documenting processing steps.
  5. Persistent Identifiers: Look for platforms that enable minting of DOIs for your datasets.
  6. Search, Discovery, and APIs: Prioritize systems with excellent search capabilities and programmatic access.
  7. Preservation and File Format Support: Ensure the system supports fixity checks and common community formats.
  8. Integration with Other Systems: Explore integration options with platforms like ORCID and GitHub.
PlatformTypeOpen-sourceHosted optionDOI mintingBest for
DataverseDisciplinary/GeneralYesYes (hosted instances available)YesResearch projects needing API integration.
InvenioRDMInstitutional/RepositoryYesDeployableYesLarge repositories and customized solutions.
DSpaceInstitutional RepositoryYesYesPossibleLong-term archiving in university settings.
ZenodoGeneral ArchiveNoHosted by CERNYesQuick dataset archiving for small teams.

How to Choose the Right RDM System for Your Project or Lab

To select the best RDM system, assess the following:

  • Data Volume and Growth Rate: Consider the amount of data you anticipate collecting.
  • File Types and Formats: Account for diverse data including large binary files and sensitive information.
  • Team Size and Distribution: Determine how many team members will need access.
  • Budget and Technical Support: Evaluate available budget versus required support.
  • Compliance Requirements: Ensure compliance with regulations like GDPR or HIPAA.

Options may include open-source solutions like Dataverse and DSpace, commercial platforms, or institutional repositories. Evaluating community support and maturity is essential for mitigating future risks.

Implementing an RDM System: Practical Steps for Beginners

To effectively implement an RDM system, begin with a pilot:

  1. Pilot Project: Select a limited project with a small team to test the system for 2-3 months and refine processes.
  2. Create a Data Management Plan (DMP): Document how you’ll collect, store, share, and preserve data using tools like DMPTool or DMPonline.
  3. Define Roles: Assign responsibilities (e.g., Principal Investigator, Data Steward, Technical Owner).
  4. Design Folder Structures: Maintain a consistent and easily navigable folder structure.
  5. Collect Practical Minimum Metadata: Ensure minimum data for ingestion, such as title, contact information, and access conditions.
  6. Automate Ingestion: Utilize APIs to streamline the data management process.
  7. Regular Backups and Reviews: Implement automated backups following the 3-2-1 backup principle and regularly review archived datasets.
  8. Provide Onboarding and Documentation: Offer training sessions and clear guides for system use.

Best Practices and Common Pitfalls

Successful RDM hinges on:

  • Elevating metadata to a priority status.
  • Automating data ingestion and validation processes.
  • Documenting workflows to support long-term data stewardship.

Avoid pitfalls such as selecting systems solely based on cost, neglecting metadata, or inadequately planning for preservation.

Integration, Interoperability, and Automation

Select systems that support APIs and standard protocols. Look for automation opportunities to streamline workflows, such as integrating CI pipelines for data preprocessing and implementing scheduled fixity checks.

Step-by-Step Roadmap: Setting Up RDM for a Small Lab

  1. Plan and Set Goals (Weeks 0–2): Determine scope, budget, and metadata policies.
  2. Pilot and Policy Development (Month 1–3): Execute pilot tests and draft a concise RDM policy.
  3. Deploy and Integrate (Month 3–6): Expand the system’s implementation to the whole lab, ensuring integration with other systems.
  4. Train and Monitor (Ongoing): Continuous training and monitoring of metadata and process quality.

Tools and Resources

  • Discover the features of Dataverse for RDM.
  • Reach out to institutions like the Digital Curation Centre for guidance.
  • Use DMPTool for generating compliant DMPs.

Conclusion

In summary, effective research data management is achievable with a structured approach: start with a data management plan, pilot an RDM system, and prioritize metadata practices and backups. Consult your institutional resources for additional support and guidance to ensure a successful transition into RDM.

TBO Editorial

About the Author

TBO Editorial writes about the latest updates about products and services related to Technology, Business, Finance & Lifestyle. Do get in touch if you want to share any useful article with our community.