Organize your BIDS folders and physiological recordings#
The following section walks you through the basics of the Brain Imaging Data Structure (BIDS) folder structure, file names and file formats that are used within the BBSIG pipelines to import and export data, namely peripheral physiological data (e.g. ECG, PPG, respiration) stored in a given neuroimaging modality folder (e.g. behavioral, EEG or fMRI). We recommend formatting your data to be BIDS-compliant, according to the steps below (see the official BIDS documentation for more details). This will greatly facilitate the usage of the BBSIG pipelines, requiring minimal adjustments.
Warning
Our pipelines and current documentation have been tested with BIDS v1.9.0 and v1.10.0.
What is BIDS?#
The acronym BIDS stands for Brain Imaging Data Structure - a standardized framework for organizing, describing, and sharing neuroimaging data (such as EEG, MRI, PET, and more) in a uniform way across the scientific community. The BIDS standard provides clear conventions for:
- Structuring folders and organizing data hierarchically
- Naming files consistently with unique file names
- Converting raw data into standard file formats, depending on the neuroimaging modality
Adopting BIDS enhances reproducibility, simplifies data sharing, and improves interoperability across pipelines. Moreover, BIDS is rapidly becoming the standard for sharing datasets in public databases such as OpenNeuro.org. For this reason, the BBSIG team has adopted the BIDS framework to organize and store peripheral psychophysiological data as the starting point for data import and export in each BBSIG pipeline.
Unfamiliar with the BIDS framework? Here is a list of what we found to be the most relevant pages for peripheral physiological data from the BIDS official documentation:
- Common principles - Brain Imaging Data Structure v1.10.0
- Modality agnostic files - Brain Imaging Data Structure v1.10.0
- Physiological recordings - Brain Imaging Data Structure v1.10.0
The BIDS folder structure#
Our BBSIG pipelines are designed to import peripheral physiological data from a BIDS-compliant folder structure, in which the raw data of each subject is stored in a separate folder within the root folder of the dataset. (Pre)processed output data is stored in the derivatives folder, with a dedicated folder for each preprocessing or analysis pipeline (e.g., derivatives/ecg-preproc/).
Here is an example of how the minimal BIDS folder structure should look in order to run the BBSIG pipelines. Note that only the files marked as "required" are imported, from the directory in which they are located; the rest is only recommended:
Without session folders:
└─ YourBIDSFolder/  # root folder for dataset
   ├─ sub-<label>/  # raw data for each subject (e.g., `sub-01`)
   │  └─ <datatype>/  # neuroimaging modality-specific folder (e.g., `eeg`, `beh`)
   │     ├─ sub-<label>_task-<label>_<datatype>.json  # recommended: datatype-specific metadata
   │     ├─ sub-<label>_task-<label>_<datatype>.tsv  # recommended: datatype-specific data
   │     ├─ sub-<label>_task-<label>_events.json  # recommended: event markers metadata
   │     ├─ sub-<label>_task-<label>_events.tsv  # recommended: event markers
   │     ├─ sub-<label>_task-<label>_physio.json  # required: physiological metadata
   │     ├─ sub-<label>_task-<label>_physio.tsv.gz  # required: physiological data (e.g., ECG, PPG, RESP)
   │     └─ ...
   ├─ sub-<label>/  # additional subjects follow the same structure above
   │  └─ <datatype>/
   │     └─ ...
   ├─ derivatives/  # outputs from BBSIG pipelines
   │  ├─ ecg-preproc/  # generated by `ecg_preproc.ipynb` pipeline
   │  ├─ hrv-analysis/  # generated by `hrv_analysis.ipynb` pipeline
   │  └─ ppg-preproc/  # generated by `ppg_preproc.ipynb` pipeline
   ├─ code/  # analysis scripts, incl. BBSIG pipelines
   │  └─ ...
   ├─ dataset_description.json  # recommended: dataset metadata
   ├─ participants.tsv  # recommended: participants demographics
   └─ participants.json  # recommended: participants demographics metadata
With session folders:
└─ YourBIDSFolder/  # root folder for dataset
   ├─ sub-<label>/  # raw data for each subject (e.g., `sub-01`)
   │  └─ ses-<label>/  # session-specific folder (e.g., `ses-01`)
   │     └─ <datatype>/  # neuroimaging modality-specific folder (e.g., `eeg`, `beh`)
   │        ├─ sub-<label>_ses-<label>_task-<label>_<datatype>.json  # recommended: datatype-specific metadata
   │        ├─ sub-<label>_ses-<label>_task-<label>_<datatype>.tsv  # recommended: datatype-specific data
   │        ├─ sub-<label>_ses-<label>_task-<label>_events.json  # recommended: event markers metadata
   │        ├─ sub-<label>_ses-<label>_task-<label>_events.tsv  # recommended: event markers
   │        ├─ sub-<label>_ses-<label>_task-<label>_physio.json  # required: physiological metadata
   │        ├─ sub-<label>_ses-<label>_task-<label>_physio.tsv.gz  # required: physiological data (e.g., ECG, PPG, RESP)
   │        └─ ...
   ├─ sub-<label>/  # additional subjects follow the same structure above
   │  └─ ses-<label>/
   │     └─ <datatype>/
   │        └─ ...
   ├─ derivatives/  # outputs from BBSIG pipelines
   │  ├─ ecg-preproc/  # generated by `ecg_preproc.ipynb` pipeline
   │  ├─ hrv-analysis/  # generated by `hrv_analysis.ipynb` pipeline
   │  └─ ppg-preproc/  # generated by `ppg_preproc.ipynb` pipeline
   ├─ code/  # analysis scripts, incl. BBSIG pipelines
   │  └─ ...
   ├─ dataset_description.json  # recommended: dataset metadata
   ├─ participants.tsv  # recommended: participants demographics
   └─ participants.json  # recommended: participants demographics metadata
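Before running a pipeline, it can help to confirm that every physiological recording in the dataset has its JSON sidecar. The snippet below is a minimal sketch, not part of the BBSIG pipelines: the function name find_physio_pairs is hypothetical, and it assumes that each _physio.json file sits next to its _physio.tsv.gz file (it does not look for sidecars stored at a higher level of the dataset). The demo builds a throwaway dataset in a temporary folder so that the example is self-contained.

```python
import tempfile
from pathlib import Path


def find_physio_pairs(root):
    """Map each *_physio.tsv.gz under sub-*/ to whether its JSON sidecar exists.

    Hypothetical helper: assumes the sidecar is stored in the same folder.
    """
    root = Path(root)
    report = {}
    # "**" matches the optional ses-<label>/ level as well as <datatype>/
    for tsv in sorted(root.glob("sub-*/**/*_physio.tsv.gz")):
        sidecar = tsv.with_name(tsv.name[: -len(".tsv.gz")] + ".json")
        report[tsv.relative_to(root).as_posix()] = sidecar.is_file()
    return report


# Demo on a throwaway dataset: sub-101 is complete, sub-102 lacks the sidecar
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub, with_sidecar in [("101", True), ("102", False)]:
        beh = root / f"sub-{sub}" / "beh"
        beh.mkdir(parents=True)
        (beh / f"sub-{sub}_task-BBSIG_physio.tsv.gz").touch()
        if with_sidecar:
            (beh / f"sub-{sub}_task-BBSIG_physio.json").touch()
    report = find_physio_pairs(root)

for rel_path, has_sidecar in report.items():
    print(rel_path, "OK" if has_sidecar else "missing _physio.json")
```

To check a real dataset, call find_physio_pairs with the path to your own root folder instead of the temporary one.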
A short glossary: mandatory and optional BIDS entities
Confused about what datatype, session or task mean? Here is a short glossary of the BIDS entities (key-value pairs) that have to be specified when importing data in our BBSIG pipelines, divided into mandatory and optional. You can also check the full BIDS Entity Table for more details.
- Mandatory entities:
  - sub-<label>: a unique participant or subject in the study, identified by the <label>
  - task-<label>: the specific task or activity performed by the participant during data acquisition, identified by the <label>
  - datatype: the type of data collected, whether neuroimaging or behavioral (e.g., beh, eeg or func). Check out all of the BIDS-compliant datatype abbreviations.
- Optional entities:
  - ses-<label>: a specific session within a study, defined as a distinct time point with a logical grouping of neuroimaging and behavioral data consistent across subjects, identified by the <label>
BIDS allows for additional optional entities, such as acq-<label> to distinguish between acquisition protocols, run-<index> to distinguish uninterrupted repetitions of data acquisition, or recording-<label> to distinguish between recording types (e.g., ppg or respiration). Although these are not explicitly specified in our pipelines, they can easily be added to the base file name (bids_base_fname), as in the lines below:
# If you have additional BIDS entities (e.g., 'run' or 'recording') you can change the bids_base_fname variable accordingly
bids_base_fname = f'{subj_id}_ses-{session_idx}_task-{task_name}_run-{run_idx}_recording-{rec_name}'
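If some recordings have a session, run or recording entity and others do not, the base file name can be assembled from whichever entities are present. The function below is a minimal sketch under stated assumptions: build_bids_base_fname is a hypothetical helper that is not part of the BBSIG pipelines, and subj_id is assumed to already contain the sub- prefix, as the f-string above suggests. The entities are joined in the same order as in that f-string.

```python
def build_bids_base_fname(subj_id, task_name, session_idx=None,
                          run_idx=None, rec_name=None):
    """Join the available BIDS entities into a base file name.

    Hypothetical helper; subj_id is assumed to include the 'sub-' prefix.
    Entities set to None are left out of the name.
    """
    parts = [subj_id]
    if session_idx is not None:
        parts.append(f"ses-{session_idx}")
    parts.append(f"task-{task_name}")
    if run_idx is not None:
        parts.append(f"run-{run_idx}")
    if rec_name is not None:
        parts.append(f"recording-{rec_name}")
    return "_".join(parts)


minimal_name = build_bids_base_fname("sub-101", "BBSIG")
full_name = build_bids_base_fname("sub-101", "BBSIG", session_idx="01",
                                  run_idx=1, rec_name="ecg")
print(minimal_name)  # sub-101_task-BBSIG
print(full_name)     # sub-101_ses-01_task-BBSIG_run-1_recording-ecg
```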
Storing BIDS-compliant peripheral physiological data#
Let's have a closer look at the BIDS-compliant file formats for storing peripheral physiological data, including ECG, PPG, and respiration. According to the BIDS standard, continuous recordings of peripheral physiological data may be stored under the respective datatype folder, that is, the directory of the neuroimaging or behavioral modality alongside which they were acquired (e.g., eeg or beh). Peripheral physiological data is specified using two files:
- _physio.tsv.gz: a compressed tabular file (TSV.GZ) for storing continuous raw data (without a header line)
- _physio.json: a sidecar JSON file for storing metadata fields
Peripheral physiological recordings must be labeled using the _physio suffix, while the related task event timestamps (e.g., stimuli, responses) are stored in files with the _events suffix.
If multiple peripheral physiological modalities were acquired concurrently with the same sampling frequency, they must be stored in separate columns of the same _physio.tsv.gz file. Otherwise, if recordings with different sampling frequencies have been acquired, they can be stored in separate files and distinguished using recording-<label>.
Physio data: _physio.tsv.gz file#
The _physio.tsv.gz file (i.e., a gzip-compressed tabular file) should be organized with one column per physiological modality (e.g., ECG, respiration, PPG) and one row per acquired sample, provided that all modalities have been acquired at the same sampling rate. Otherwise, they should be stored in separate files, distinguished by the recording-<label> entity.
Here is an example of how the content of a _physio.tsv.gz file should look (after decompression):
-148.94234 -1484803.4
-162.86522 -1482439.5
-165.02208 -1480325
-162.08112 -1478262.8
-165.2874 -1476238.4
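A headerless, tab-separated, gzip-compressed file like the one above can be written and read back with the Python standard library alone. The following is a minimal sketch rather than the import code used by the BBSIG pipelines: write_physio_tsv and read_physio_tsv are hypothetical helper names, and the file is written to a temporary folder so that the example can run anywhere.

```python
import csv
import gzip
import tempfile
from pathlib import Path

# First three samples from the example above (one row per sample)
samples = [
    (-148.94234, -1484803.4),
    (-162.86522, -1482439.5),
    (-165.02208, -1480325.0),
]


def write_physio_tsv(path, rows):
    """Write rows as tab-separated values, gzip-compressed, without a header."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def read_physio_tsv(path):
    """Read a headerless _physio.tsv.gz file into a list of float rows."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return [[float(value) for value in row]
                for row in csv.reader(f, delimiter="\t")]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sub-101_task-BBSIG_physio.tsv.gz"
    write_physio_tsv(path, samples)
    loaded = read_physio_tsv(path)

print(loaded)
```

Because the file carries no column names, the values only become meaningful once they are paired with the Columns field of the JSON sidecar described next.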
Physio metadata: _physio.json file#
The _physio.json file includes general metadata regarding the peripheral physiological recording, such as the sampling frequency (in Hz), the start time in seconds (in relation to the start of acquisition of the first data sample in the corresponding neural dataset), and the column names of the corresponding TSV.GZ file. This latter point is particularly important, as the _physio.tsv.gz file cannot include any header line or column names, only pure data. Therefore, the BBSIG pipelines crucially depend on this JSON file to extract the relevant column for a given peripheral physiological modality.
In the example above, the two columns ("cardiac" for ECG and "respiratory" for RESP) were acquired by devices from the same manufacturer and with the same sampling frequency. Here is an example of how the sidecar _physio.json file might look:
{
    "SamplingFrequency": 1000,
    "StartTime": 0.0,
    "Columns": [
        "cardiac",
        "respiratory"
    ],
    "Manufacturer": "Brain Products GmbH",
    "ManufacturersModelName": "BrainAmp ExG",
    "cardiac": {
        "Description": "continuous measurements by ECG electrodes",
        "Units": "microVolts",
        "TermURL": "https://www.ncbi.nlm.nih.gov/mesh/68004562"
    },
    "respiratory": {
        "Description": "continuous measurements by respiration belt",
        "Units": "ARU (Arbitrary Respiratory Unit); 1 mV/ARU"
    }
}
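To illustrate how the sidecar gives meaning to the headerless data, the sketch below looks up a column by its name in Columns and derives a time axis from SamplingFrequency and StartTime. It is an illustration under stated assumptions, not the BBSIG import code: extract_column and sample_times are hypothetical helpers, the sidecar is reduced to the three fields that are needed, and the rows are the first three samples of the data example shown earlier.

```python
import json

# Reduced sidecar: only the fields needed to interpret the data columns
sidecar = json.loads("""
{
    "SamplingFrequency": 1000,
    "StartTime": 0.0,
    "Columns": ["cardiac", "respiratory"]
}
""")

# First three rows of the headerless data example (one row per sample)
rows = [
    [-148.94234, -1484803.4],
    [-162.86522, -1482439.5],
    [-165.02208, -1480325.0],
]


def extract_column(rows, sidecar, name):
    """Return the samples of the column called `name` in the sidecar."""
    idx = sidecar["Columns"].index(name)  # raises ValueError if absent
    return [row[idx] for row in rows]


def sample_times(n_samples, sidecar):
    """Return the time of each sample in seconds, starting at StartTime."""
    fs = sidecar["SamplingFrequency"]
    t0 = sidecar.get("StartTime", 0.0)
    return [t0 + i / fs for i in range(n_samples)]


cardiac = extract_column(rows, sidecar, "cardiac")
times = sample_times(len(cardiac), sidecar)
print(cardiac)  # [-148.94234, -162.86522, -165.02208]
print(times)    # [0.0, 0.001, 0.002]
```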
The BIDS Validator#
To check that your folder structure and files are organized and named correctly for use with the BBSIG pipelines, we recommend running the BIDS Validator - a handy tool that verifies whether your dataset is compliant with the BIDS specification. You can use either the browser version [1] or the command-line version. After you select the desired folder, the BIDS Validator returns a list of errors (i.e., critical parts of the dataset that are not BIDS-compliant) and warnings (i.e., non-critical, recommended improvements), with suggestions for the appropriate corrections.
If your dataset has passed the BIDS Validator, you are ready to start using the BBSIG pipelines without major adjustments!
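For a quick local check of file names before running the full validator, a regular expression can catch the most common naming mistakes. This is only a rough sketch and not a substitute for the BIDS Validator: the pattern and the helper looks_like_physio_name are hypothetical, they assume alphanumeric labels, and they check neither the order of the optional entities nor whether the file name matches the folders it is stored in.

```python
import re

# Hypothetical pattern: sub, optional ses, task, any further key-value
# entities, then the _physio suffix with one of the two expected extensions
PHYSIO_NAME = re.compile(
    r"^sub-[A-Za-z0-9]+"
    r"(_ses-[A-Za-z0-9]+)?"
    r"_task-[A-Za-z0-9]+"
    r"(_[a-z]+-[A-Za-z0-9]+)*"
    r"_physio\.(tsv\.gz|json)$"
)


def looks_like_physio_name(filename):
    """Return True if the file name follows the expected physio pattern."""
    return PHYSIO_NAME.match(filename) is not None


candidates = [
    "sub-101_task-BBSIG_physio.tsv.gz",
    "sub-101_ses-01_task-BBSIG_run-1_recording-ecg_physio.json",
    "sub-101_task-BBSIG_physio.tsv",   # not gzip-compressed
    "sub_101_task-BBSIG_physio.json",  # underscore instead of hyphen
]
results = {name: looks_like_physio_name(name) for name in candidates}
for name, ok in results.items():
    print(name, "->", "looks fine" if ok else "check the name")
```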
How to specify BIDS entities in the BBSIG pipelines#
To recap, if you are working with peripheral physiological data, the root directory of your BIDS data should include a series of participant-specific folders, each including at least the _physio.tsv.gz and _physio.json files, as shown in the example below:
Without session folders:
└─ YourBIDSFolder/
   ├─ sub-101/
   │  └─ beh/
   │     ├─ sub-101_task-BBSIG_physio.json
   │     └─ sub-101_task-BBSIG_physio.tsv.gz
   ├─ sub-102/
   └─ ...
With session folders:
└─ YourBIDSFolder/
   ├─ sub-101/
   │  └─ ses-01/
   │     └─ beh/
   │        ├─ sub-101_ses-01_task-BBSIG_physio.json
   │        └─ sub-101_ses-01_task-BBSIG_physio.tsv.gz
   ├─ sub-102/
   └─ ...
At the beginning of each pipeline, you will find a section for specifying the mandatory and optional BIDS entities, which are used to create the base BIDS filename and file directory. For example, this is what the beginning of our ECG preprocessing pipeline looks like:
# Define the participant ID
participant_ids = ['101'] # Adjust as needed: it should correspond to <ID> of 'sub-<ID>' in BIDS format
# Specify the main directory of data storage (containing BIDS-compliant raw data)
wd = r'C:\YourBIDSFolder' # replace with your data storage directory
# Mandatory: BIDS entities (task, datatype)
task_name = 'BBSIG' # <label> of 'task-<label>' used for file naming in BIDS format
datatype_name = 'beh' # datatype used for corresponding directory in BIDS format (e.g., 'beh', 'eeg', 'func')
physio_name = 'physio' # physio data specification in BIDS format
# Optional: BIDS entities (session)
session_idx = '1' # <label> of 'ses-<label>' in BIDS format, if available; otherwise, set to None
If you have additional BIDS entities (e.g., 'run-<index>' or 'recording-<label>'), you can easily add them by changing the bids_base_fname variable accordingly, as this only affects file naming and not the folder structure. This base BIDS filename is inherited by all data import and export functions. Here is an example of how this line of code could look:
# If you have additional BIDS entities (e.g., 'run' or 'recording') you can change the bids_base_fname variable accordingly
bids_base_fname = f'{subj_id}_ses-{session_idx}_task-{task_name}_run-{run_idx}_recording-{rec_name}'
If you have a clear idea of what your sub-<label>, task-<label>, datatype abbreviation, and optionally ses-<label> are, the pipeline will work out of the box and take care of importing and exporting data from and to the correct folder locations.
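To make the link between these variables and the folder structure concrete, the sketch below derives the two expected file locations from them. It is an illustration under stated assumptions, not the code the pipelines run: physio_file_paths is a hypothetical helper, wd is set to a relative folder name so that the example is platform-independent, and the session label is assumed to be inserted verbatim, so it is written as '01' here to match a ses-01 folder.

```python
from pathlib import Path

# Variables named as in the pipeline header; values are illustrative
participant_ids = ['101']
wd = 'YourBIDSFolder'   # relative path used here for portability
task_name = 'BBSIG'
datatype_name = 'beh'
physio_name = 'physio'
session_idx = '01'      # set to None if the dataset has no sessions


def physio_file_paths(participant_id):
    """Return the expected (_physio.json, _physio.tsv.gz) paths.

    Hypothetical helper: mirrors the folder layout shown in this section.
    """
    subj_id = f'sub-{participant_id}'
    data_dir = Path(wd) / subj_id
    bids_base_fname = subj_id
    if session_idx is not None:
        data_dir = data_dir / f'ses-{session_idx}'
        bids_base_fname += f'_ses-{session_idx}'
    bids_base_fname += f'_task-{task_name}'
    data_dir = data_dir / datatype_name
    return (data_dir / f'{bids_base_fname}_{physio_name}.json',
            data_dir / f'{bids_base_fname}_{physio_name}.tsv.gz')


json_path, tsv_path = physio_file_paths(participant_ids[0])
print(json_path.as_posix())
print(tsv_path.as_posix())
```

If the printed paths do not match the files on disk, the entity values or the folder names usually differ in a small detail such as zero-padding or letter case.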
[1] The browser version of the BIDS Validator only works with Chrome and Firefox. Although it runs in the browser, it does not pose a risk to data privacy or confidentiality, as files are not uploaded to a server in order to validate your BIDS data.