
Brain Imaging Data Structure - BIDS

Your measurement data should be organized and annotated according to the Brain Imaging Data Structure (BIDS). BIDS has emerged as a powerful standard for describing and sharing neuroscientific data by providing a structured framework for storing and describing these datasets. BIDS creates a shared language for data, thus facilitating data sharing, reducing replication issues, and encouraging the development of automated data analysis tools.

BIDS...

  • modularizes data
  • specifies a folder structure
  • names files in a human AND machine friendly way
  • uses standard interoperable file formats
  • documents metadata
  • minimizes duplication (inheritance principles)
  • adheres to the FAIR principles

Figure: MRI data example for BIDS.

Main principles

Modularization - sourcedata, raw data, derivatives

The first important thing to understand about BIDS is how it distinguishes sourcedata, raw data, and derivatives. The distinction is based on how far the data have been processed.

Sourcedata describes the very first state of the data, directly after its acquisition by the measurement device. This data type usually cannot be used without some sort of transformation, e.g., at least a file format conversion. Examples: DICOM for MRI; EDF for EyeLink eye tracking.

Raw data describes data that are unprocessed or only minimally processed by file format conversion. It can easily be opened by non-proprietary software and is ready to be (pre-)processed. Examples: NIfTI for MRI; extracted eye coordinates or timestamps for eye tracking.

Derivatives describes any output from analyses run on the raw data (apart from conversion processes such as DICOM to NIfTI). This can be output from statistical analyses, figures, etc.

Figure: Modularization principle explained by pasta. The left picture shows the source of the pasta dish (whole vegetables, flour to make pasta). The middle picture shows the ingredients in a state in which we can actually use them for cooking. The right picture shows the final pasta dish. Image by Rémi Gau at 10.5281/zenodo.5872274.

Accordingly, if you want to keep all of these different stages in one dataset or one repository, your dataset would look like this:

my_dataset-1/
├── sourcedata/
│   ├── sub-01/
│   ├── sub-02/
│   └── ...
├── ...
├── rawdata/
│   ├── dataset_description.json
│   ├── participants.tsv
│   ├── sub-01/
│   ├── sub-02/
│   └── ...
└── derivatives/
    ├── pipeline_1/
    ├── pipeline_2/
    └── ...
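
The layout above can be created programmatically. The following is a minimal sketch, not an official BIDS tool: the function name create_skeleton, the dataset name, and the BIDSVersion value are assumptions chosen for illustration, and the generated participants.tsv contains only the participant_id column.

```python
import json
from pathlib import Path


def create_skeleton(root, subjects, name="my_dataset-1", bids_version="1.8.0"):
    """Create sourcedata/, rawdata/ and derivatives/ under `root`.

    `name` and `bids_version` are placeholder values (assumptions);
    replace them with the values that apply to your dataset.
    """
    root = Path(root)

    # One folder per subject in both the sourcedata and the rawdata stage.
    for stage in ("sourcedata", "rawdata"):
        (root / stage).mkdir(parents=True, exist_ok=True)
        for label in subjects:
            (root / stage / f"sub-{label}").mkdir(exist_ok=True)
    (root / "derivatives").mkdir(exist_ok=True)

    # Dataset-level metadata files live at the top of rawdata/.
    rawdata = root / "rawdata"
    description = {"Name": name, "BIDSVersion": bids_version}
    (rawdata / "dataset_description.json").write_text(
        json.dumps(description, indent=4) + "\n", encoding="utf-8"
    )
    lines = ["participant_id"] + [f"sub-{label}" for label in subjects]
    (rawdata / "participants.tsv").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return root
```

Calling create_skeleton("my_dataset-1", ["01", "02"]) would create the three stage folders with sub-01/ and sub-02/ in sourcedata/ and rawdata/. The data files themselves still have to be converted and placed by you or by a converter tool.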

As mentioned in the previous section, depending on how the data are used (single vs. multiple researchers, homogeneous vs. heterogeneous dataset), you could split this into multiple repositories. If you do this, be aware that you will need the dataset metadata files (participants.tsv, dataset_description.json, README) in each of the repositories so that you do not lose information and links.

BIDS mostly provides specifications for the raw data, although specifications for derivatives are increasingly being added. The folder and file structure for the raw data looks as follows:

  rawdata/
      dataset_description.json                 #dataset metadata
      participants.tsv                         #dataset metadata
      sub-01/                                  #subject folder
          anat/                                #data type folder
              sub-01_T1w.nii.gz                #data file
          func/                                #data type folder
              sub-01_task-<label>_bold.nii.gz  #data file
              sub-01_task-<label>_bold.json    #metadata file
              sub-01_task-<label>_events.tsv   #events data file
      sub-02/                                  #subject folder
      ...

Tabular files

All tabular files are stored as tab-separated values (.tsv or, if the file is large, .tsv.gz). A .tsv file should include a header row with column names written in snake_case. A .tsv.gz file has no header row; its column names are instead listed in the sidecar .json file (= metadata file).
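
As an illustration of the two cases, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and the sketch assumes that the sidecar .json lists the column names under a "Columns" key; check the specification for your modality before relying on that.

```python
import csv
import gzip
import json
from pathlib import Path


def write_tsv(path, header, rows):
    """Write a tab-separated file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def read_tsv(path):
    """Read a .tsv with a header row into a list of dicts (values stay strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def read_headerless_tsv_gz(tsv_gz_path, sidecar_path):
    """Read a headerless .tsv.gz, taking column names from the sidecar .json.

    Assumption: the sidecar stores the names in a "Columns" list.
    """
    sidecar = json.loads(Path(sidecar_path).read_text(encoding="utf-8"))
    columns = sidecar["Columns"]
    with gzip.open(tsv_gz_path, "rt", newline="", encoding="utf-8") as f:
        return [dict(zip(columns, row)) for row in csv.reader(f, delimiter="\t")]
```

Both readers return every value as a string, so numeric columns need to be converted explicitly before analysis.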

Metadata

Metadata are stored in .json and .tsv files. These files are language-agnostic, meaning you can work with them in, for example, Python, MATLAB, R, and basically any text editor. JSON stands for JavaScript Object Notation and, as its name indicates, takes its syntax from the JavaScript language. It has the following structure:

{
    "key": "value",
    "key2": "value2",
    "key3": {
        "subkey1": "subvalue1"
    }
}

When writing .json files, we recommend using an IDE or an editor with JSON support: JSON is strict about its syntax, and a file with a missing comma, quote, or bracket will not be recognized as valid JSON. Indentation, by contrast, only affects readability.
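
If no IDE is at hand, a quick way to check a metadata file is to load it with a JSON parser. The sketch below uses Python's built-in json module; the helper name check_json is hypothetical.

```python
import json


def check_json(text):
    """Return (data, None) for valid JSON, or (None, message) describing the error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


valid = '{"key": "value", "key3": {"subkey1": "subvalue1"}}'
broken = '{"key": "value",}'  # trailing comma is not allowed in JSON

for text in (valid, broken):
    data, error = check_json(text)
    print("valid" if error is None else f"invalid ({error})")
```

To check a file on disk, pass its contents, for example check_json(open("dataset_description.json", encoding="utf-8").read()).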

Filenames

A filename consists of a chain of entity instances and a suffix all separated by underscores, and an extension. This pattern forms filenames that are both human- and machine-readable. For instance, file sub-01_task-rest_eeg.edf contains instances of the subject and task entities, making it evident from the filename alone that it contains resting-state data from subject 01; the suffix eeg and extension .edf depend on the imaging modality and the data format, and can therefore convey further details of the file's contents.
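
Because the pattern is regular, a filename can be split into its parts with a few lines of code. The following is a simplified sketch for illustration only: parse_bids_filename is a hypothetical helper, it does not check entity names or their order, and it is no substitute for the BIDS validator.

```python
def parse_bids_filename(filename):
    """Split a BIDS-style filename into entities, suffix and extension."""
    # Everything after the first dot is the extension (e.g. ".nii.gz").
    stem, dot, extension = filename.partition(".")
    # The last underscore-separated part is the suffix; the rest are entities.
    *entity_parts, suffix = stem.split("_")

    entities = {}
    for part in entity_parts:
        key, sep, value = part.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"'{part}' is not a key-value entity")
        entities[key] = value

    return {"entities": entities, "suffix": suffix, "extension": dot + extension}


parsed = parse_bids_filename("sub-01_task-rest_eeg.edf")
print(parsed["entities"])   # {'sub': '01', 'task': 'rest'}
print(parsed["suffix"])     # eeg
print(parsed["extension"])  # .edf
```

A part without a hyphen in the entity position, such as the bare "task" in sub-01_task_bold.nii.gz, raises a ValueError, which makes malformed names easy to spot.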

Get started on BIDS

Converting your data to BIDS is an ongoing effort. It is not advisable to only convert your data to BIDS shortly before publication of the dataset. BIDS is meant to help you with managing your data throughout the active development of the project. This means, every time you collect new data you should immediately convert your data to the BIDS format and save the accompanying metadata. This ensures that your project stays fully reproducible because your analysis code is written to read in your data in BIDS format.

The best way to get started with BIDS is to read the specification for your data modality and then go through the BIDS starter kit to get some practical experience with BIDS and with converter tools. Because here is the good news: you don't actually have to do all of this manually. For many modalities there are great tools available that help you with converting your sourcedata to BIDS, renaming your files, running analyses, etc. For some kinds of data, no BIDS converter has been built yet. In this case, you need to convert your data manually.

Note: We know that the BIDS documentation is very comprehensive and that getting started with BIDS can be a bit overwhelming. Please don't hesitate to reach out to your Data Stewards in order to get support on this matter.

ezBIDS

On an even better note: We are currently working on providing ezBIDS as a service based on our own infrastructure. ezBIDS is software with an online interface that requires neither coding proficiency nor knowledge of BIDS to get started. It can convert multiple subjects at once, it takes care of creating modality-agnostic files (such as participants.tsv or dataset_description.json), and it handles the conversion of task events. It guides you through the conversion process with visual feedback and prompts and makes sure that no required metadata is missing. In the end, it automatically runs the BIDS validator for you. Lastly, it also offers defacing for brain images.

ezBIDS was developed under brainlife.io, an open-source, free, and secure platform for reproducible neuroscience analysis. We are adopting this software and running it on our own servers, so your data stays safe and secure within our infrastructure.

Please do not use the publicly available ezBIDS deployment on brainlife.io, as your data would be uploaded to a North American server. Be patient and wait for our own deployment :-) We will announce it when the time comes!

Read the ezBIDS documentation and tutorial and/or watch this short tutorial video.

Now that you know how to structure and handle your project, let's learn some Git and GitLab™!