Data and Code Management

How to Organize and Describe Data and Code?

As explained in the introduction and the architecture of the DataHub, and in line with the TAM Policy, we follow a set of conventions to advance towards "FAIR" data and to achieve more sustainable and reproducible science. Key aspects are:

- Describing data with rich metadata, ideally using standardized schemata (like BIDS).
- Applying (domain) standards to organize data and code (like BIDS).
- Aggregating or linking research products and everything needed to reproduce them into modular / self-contained units.
- Using open and widespread file formats whenever possible.

These goals are rarely easy to achieve, and common principles often have to be tailored to the needs of a specific project. You are very welcome to contact your local Data Stewards to find custom solutions for your project.

Brain Imaging Data Structure (BIDS)

Concerning data organization and annotation, the TAM Policy states that the BIDS standard (see BIDS Starter Kit) should be used whenever possible and as early as reasonable in the data processing chain. While BIDS started out as a standard in neuroimaging, it supports more and more modalities through extensions.

On the one hand, BIDS provides fixed conventions for organizing data according to processing state, data subjects, sessions and modalities. It describes how detailed metadata can be included in a way that is actionable for both humans and machines. This creates the ground on which standardized code / software can be developed and used by everyone, and it reduces the need for adaptations. On the other hand, its common principles and modality-agnostic files are flexible enough to include rich custom metadata on various levels.
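
To make the naming convention concrete, the following minimal Python sketch assembles a BIDS-style relative path from subject, session, task and datatype. The function name `bids_path` and the example labels are hypothetical and chosen for illustration only; the BIDS specification and the BIDS validator remain the authoritative reference for which entities and suffixes are valid for a given modality.

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_path(subject: str, datatype: str, suffix: str, extension: str,
              session: Optional[str] = None,
              task: Optional[str] = None) -> PurePosixPath:
    """Build a BIDS-style relative path (illustrative helper, not a library API)."""
    # File name: key-value entities joined by underscores, followed by the suffix.
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    if task is not None:
        entities.append(f"task-{task}")
    filename = "_".join(entities + [suffix]) + extension

    # Folder hierarchy: subject, optional session, then datatype (e.g. "anat", "func").
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        folder = folder / f"ses-{session}"
    return folder / datatype / filename


# Example labels are made up for this demonstration.
print(bids_path("01", "func", "bold", ".nii.gz", session="01", task="rest"))
```

Because the same entities appear in the folder hierarchy and in the file name, each file stays identifiable even when it is copied out of its folder.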

Even if there is not yet an extension (or only an extension proposal) for a specific modality, BIDS is based on well-founded common principles that are worth a look when you design your data organization. You can always organize and annotate your data in a "bidsy" way, even if there is no specification for your data type yet. Your local Data Stewards are happy to give you an introduction to BIDS, as it has become very comprehensive over the last years and getting started can be a bit overwhelming.
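
As one example of a "bidsy" layout for a data type without its own specification, the sketch below stores custom metadata in a JSON sidecar file that shares its base name with the data file, following the BIDS common principle of pairing data files with `.json` sidecars. The helper name `write_sidecar`, the file name and the metadata keys are assumptions made purely for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata to a JSON file next to data_file, sharing its base name."""
    # Strip all extensions (e.g. ".tsv.gz") so the sidecar is "<base>.json".
    base = data_file.name.split(".", 1)[0]
    sidecar = data_file.with_name(base + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Demonstration in a temporary folder; file name and keys are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "sub-01_task-demo_recording.tsv.gz"
    data_file.touch()
    sidecar = write_sidecar(data_file, {"SamplingFrequency": 500, "Units": "mV"})
    print(sidecar.name)
```

Keeping metadata in a plain JSON file next to the data makes it readable for humans and easy to parse for software, independent of the format of the data file itself.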

TONIC research folder structure

We kindly ask you to use a folder structure called TONIC for the (top-level) structure of project repositories. A template can be forked from within the TAM GitLabTM. There you'll find a template for organizing multiple projects in one repository (link) and one for organizing a single project (link). The folder names should be self-explanatory, but if you have any questions on how to use the template, please consult your local Data Stewards.

Data Management Workflow and Tutorials

The above-mentioned structures and standards are meant to be used within the TAM GitLabTM during active project development. Detailed instructions and explanations on how to do that can be found in the section DataHub Workflow and in the Tutorial Videos, which will be available soon.

Publication

For data and code publication with persistent identifiers (DOIs), we are currently establishing the TAM DataHub Repository, based on the modern and widely used DSpace software. This research data repository further simplifies early sharing of data and code and is tailored to the needs of researchers in the fields of psychology and neuroscience.

Our TAM DataHub Repository will fill the gap between our institutional repositories like data_UMR, JLUdata and TUdatalib, general research repositories like OSF or Zenodo, and discipline-specific repositories such as OpenNeuro or ZPID.

You are free to continue using such repositories in addition to our DataHub Repository. Additionally, the Marburg University GitLabTM instance may be used to publish finalized git repos with non-sensitive data and code. You are welcome to contact your local Data Stewards to select suitable publication platforms.

TAM Acknowledgement and Publications List

We kindly ask all researchers affiliated with TAM and all collaborating researchers who use our DataHub services to add the following acknowledgement to their publications:

This work was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation, DFG) under Germany’s Excellence Strategy (EXC 3066/1 “The Adaptive Mind”, Project No. 533717223).

Please also remember to add your publication to the internal publication list or contact the TAM Research Coordinator Filipp Schmidt to ask for your publication to be added.

Data Security and Protection

Preventing loss of data

When working with valuable data and code, it is important to have a suitable backup and recovery strategy. Through the DataHub, you have access to different storage locations or platforms, which differ in the availability and actual implementation of backups. An overview can be found below:

Table 1: Availability of backups across different storage locations used in the DataHub (updated 01.08.2024)

| Platform / location | Backup | Notes |
| --- | --- | --- |
| TAM GitLabTM | yes | application backup* |
| TAM DataHub Repository | yes | application backup* |
| MaRC: /home | yes | regular over-night backups |
| MaRC: /scratch_shared | no | file-system for temporary use |
| MaSC: /masc_shared | no | |

Warning

* An application backup is not intended to restore individual user data! It protects against a general system failure.

With git, you already have the tools to revisit earlier states of your work within a given repository / dataset. This history is lost when a repository is deleted or its version record is rewritten, so be very careful when deleting whole repositories or modifying their version record.

Where backups are not available for a storage location, users are responsible for making regular backups themselves (e.g. following the 3-2-1 rule).
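
The 3-2-1 rule recommends keeping at least three copies of your data, on at least two different types of storage media, with at least one copy off-site. A minimal sketch of checking a backup plan against this rule could look as follows; the `Copy` class, the function name and the example locations are hypothetical and not part of any DataHub tooling.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Copy:
    location: str   # free-text description of where the copy lives
    medium: str     # type of storage medium, e.g. "cluster" or "external-disk"
    offsite: bool   # True if stored at a physically separate site


def satisfies_3_2_1(copies: Sequence[Copy]) -> bool:
    """Return True if the copies fulfil the 3-2-1 backup rule."""
    enough_copies = len(copies) >= 3                      # 3 copies in total
    enough_media = len({c.medium for c in copies}) >= 2   # 2 different media
    has_offsite = any(c.offsite for c in copies)          # 1 copy off-site
    return enough_copies and enough_media and has_offsite


# Example plan with made-up locations.
plan = [
    Copy("working copy on a shared cluster folder", "cluster", offsite=False),
    Copy("external hard drive in the lab", "external-disk", offsite=False),
    Copy("archive at a second site", "tape", offsite=True),
]
print(satisfies_3_2_1(plan))
```

A check like this does not create any backups; it only helps to review whether a planned set of copies covers all three conditions.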