
Research Folder Structure Standard - TONIC

A template can be cloned or forked from the TAM GitLab™. You'll find a template for organizing multiple projects in one repository (link) or for organizing a single project in one repository (link). The folder names should be self-explanatory, but if you have any questions on how to use the template, please consult the TONIC website or your local Data Stewards. If you don't know how to use Git and GitLab™ yet, please refer to the Git and GitLab™ tutorial.

The folder structure for a single project looks like this:

.
├── 01_project_management
│   ├── 01_data_management_plans
│   └── 02_preregistration
├── 02_material_and_methods
│   ├── 01_study-protocol
│   └── 02_code
├── 03_data
├── 04_figures
├── 05_dissemination
│   ├── 01_reports_conferences
│   ├── 02_manuscripts
│   └── 03_other
├── LICENSE-CC-BY
└── README.md

Additionally, each subfolder should contain a dedicated README file that explains the content and structure of that subfolder. Of course, you can always add more subfolders for your needs, as long as you follow the naming convention and document them in a README.
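The layout above can also be built locally instead of cloning the template. The following is a minimal sketch, assuming Python 3 is available. The function name `create_tonic_skeleton`, the placeholder README text, and the use of `README.md` as the file name for the subfolder READMEs are assumptions made for illustration; they are not part of the TONIC template. The LICENSE-CC-BY file is deliberately not generated, since its text should come from the template itself.

```python
from pathlib import Path

# Folder names copied from the single-project tree above.
TONIC_FOLDERS = [
    "01_project_management/01_data_management_plans",
    "01_project_management/02_preregistration",
    "02_material_and_methods/01_study-protocol",
    "02_material_and_methods/02_code",
    "03_data",
    "04_figures",
    "05_dissemination/01_reports_conferences",
    "05_dissemination/02_manuscripts",
    "05_dissemination/03_other",
]

# Placeholder text (an assumption); replace it with a real description.
README_TEMPLATE = "# {name}\n\nDescribe the content and structure of this folder.\n"


def create_tonic_skeleton(root):
    """Create the folder tree under `root` and return the README files written."""
    root = Path(root)
    written = []
    for relative in TONIC_FOLDERS:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        # Walk from the leaf folder up to and including `root`,
        # so that parent folders receive a README as well.
        for directory in [folder, *folder.parents]:
            readme = directory / "README.md"
            if not readme.exists():  # never overwrite an existing README
                name = directory.resolve().name
                readme.write_text(README_TEMPLATE.format(name=name), encoding="utf-8")
                written.append(readme)
            if directory == root:
                break
    return written
```

Calling `create_tonic_skeleton("my_project")` (a hypothetical folder name) creates the tree and leaves existing README files untouched, so it can be run again after you add your own documentation.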

How to Utilize TONIC for Your Data Management Needs

As mentioned on the previous page, you should think already at the stage of data collection, organization, and storage about how your data will be used by you and/or others and how you want to share it later on. Below we give examples for several different data management needs.

Case 1: one dataset being used in only one research project

Data Storage

If a dataset is used in only one research project, you can simply keep it in all its stages (source, raw, derivative) inside your 03_data folder. The next section explains what the structure inside the 03_data folder should look like (the BIDS standard).

Data Sharing

When you reach the stage where you want to publish your work, you should consider publishing your raw and derivative data separately. The reason, as mentioned on the previous page, is that other researchers usually want to reuse raw data for their own analyses. To do this, you can create two Git branches: one that holds only the raw data from your repository, and one that holds the rest of the repository that you want to publish. The Git and GitLab™ tutorial explains how to work with Git branches.
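Before creating the two branches, it can help to list which files would end up on which branch. The following is a minimal planning sketch and does not run any Git commands. The prefix `03_data/raw`, the function name `split_for_publication`, and the example file names are hypothetical; adapt the prefix to wherever the raw data actually lives in your 03_data folder.

```python
from pathlib import PurePosixPath

# Hypothetical location of the raw data inside the repository (an assumption).
RAW_DATA_PREFIX = PurePosixPath("03_data/raw")


def split_for_publication(tracked_paths, raw_prefix=RAW_DATA_PREFIX):
    """Return (raw_branch_paths, project_branch_paths).

    `tracked_paths` are repository-relative paths written with forward slashes.
    """
    raw_branch, project_branch = [], []
    for path in tracked_paths:
        posix = PurePosixPath(path)
        # A path belongs to the raw-data branch if it is the prefix itself
        # or lies anywhere below it.
        if posix == raw_prefix or raw_prefix in posix.parents:
            raw_branch.append(path)
        else:
            project_branch.append(path)
    return raw_branch, project_branch


# Example with made-up file names:
raw, rest = split_for_publication([
    "README.md",
    "02_material_and_methods/02_code/analysis.py",
    "03_data/raw/sub-01/scan.nii.gz",
    "03_data/derivatives/sub-01/stats.tsv",
])
```

In this example only the file below `03_data/raw` is assigned to the raw-data branch; everything else, including the derivatives, stays with the project branch.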

During the submission process in our DataHub Repository, you can either submit them together under one DOI, or submit them separately and link the publication of the raw dataset and the publication of your project through the metadata field "Relations and References" in the submission interface. The latter option makes it easier for others to reuse your raw dataset.

Case 2: one dataset being used across multiple research projects

Data Storage

If a dataset is used across multiple, more or less separate research projects, you should keep the dataset in its own dedicated repository. This means you will have

  1. your project repository, following the TONIC standard, for your project-related files. Inside 03_data in your project repository you store only the derivatives that are associated with your own project
  2. and a data repository that holds the raw dataset, following the BIDS standard.[1]

The background for this is that every time someone clones a GitLab™ repository, all contents of the repository are cloned. Therefore, if a multi-user dataset is kept inside one person's project repository, all users of this dataset also clone the rest of the owner's project repository. You could work with branches in this case too, but this makes things more complicated for everyone. However, if everyone working with the dataset is an experienced Git user and prefers to work with branches instead of having separate repositories, you are of course welcome to do so.

Data Sharing

During the submission process in our DataHub Repository, you can either submit them together under one DOI, or submit them separately and link the publication of the raw dataset and the publications of the different projects through the metadata field "Relations and References" in the submission interface. The latter option makes it easier for others to reuse your raw dataset and avoids publishing the raw dataset more than once, which is unnecessary.

Case 3: one large, heterogeneous dataset being used across multiple research projects

Data Storage

If you have a very large dataset that holds heterogeneous data and is used in multiple research projects in equally heterogeneous ways, it makes sense to split the dataset into multiple repositories. One example is a dataset that holds multiple modalities from the same subjects (such as MRI data, EEG data, and eye tracking data), where some research projects use only one of the modalities. Another example is a dataset with multiple groups of subjects (such as different patient groups and healthy controls), where some research projects use only one of the subject groups. Both cases can also occur together. In such a situation you want to make data consumption as flexible as possible. You can achieve this by making use not only of separate repositories but also of groups. The structure could look like this:

Figure: One example of managing heterogeneous datasets in GitLab™ groups and repositories.

This way you ensure that everyone can flexibly clone the dataset they need, and you keep the data management effort on the data user's side low.
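To illustrate the idea of splitting by subject group and modality, here is a minimal sketch that lists which repositories a given project would need to clone. The layout used here (one top-level group, one subgroup per subject group, one repository per modality) and all names such as `example-dataset` are assumptions for illustration only; they are not taken from the figure, and your own group structure may differ.

```python
from itertools import product

# Hypothetical names: one top-level group, one subgroup per subject group,
# and one repository per modality.
DATASET_GROUP = "example-dataset"
SUBJECT_GROUPS = ["patients", "healthy-controls"]
MODALITIES = ["mri", "eeg", "eye-tracking"]


def repositories_for(subject_groups, modalities):
    """List the repository paths a project needs for the given selection."""
    unknown = (set(subject_groups) - set(SUBJECT_GROUPS)) | (
        set(modalities) - set(MODALITIES)
    )
    if unknown:
        raise ValueError(f"unknown subject group or modality: {sorted(unknown)}")
    # One repository per (subject group, modality) combination.
    return [
        f"{DATASET_GROUP}/{group}/{modality}"
        for group, modality in product(subject_groups, modalities)
    ]


# A project that only analyses EEG data from patients needs a single repository:
needed = repositories_for(["patients"], ["eeg"])
```

A project that uses every modality from every subject group would need all six repositories in this hypothetical layout, while a narrower project clones only the ones it actually uses.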

Data Sharing

During the submission process in our DataHub Repository, you can either submit data from different repositories together under one DOI, or submit them separately and link the publications of the datasets and the publications of the projects through the metadata field "Relations and References" in the submission interface. The latter option makes it easier for others to reuse your raw dataset and avoids publishing the datasets more than once, which is unnecessary.

Consulting on Data Management Needs

Of course, your local Data Stewards are always happy to advise you on your individual data management workflow.


  1. In some cases it is also beneficial to separate source and raw data into different repositories. You can ask your local Data Stewards whether this makes sense in your case.