High performance compute at MaRC3

The "Marburger Compute Cluster 3" (MaRC3) combines the compute and storage capacities of multiple local working groups and consortia. This enables synergistic operation and efficient usage of the resources. The MaRC3 is jointly operated by the Center for Synthetic Microbiology (synmikro) and the University Computer Center (HRZ) Marburg.

This page summarizes some key aspects of the HPC user documentation. Detailed and up-to-date information on the MaRC3 cluster and on the Marburg storage cluster (MaSC) can be found at the original HPC user documentation (log-in required with Marburg University staff account). An introduction to using the MaRC3a can also be found in the workflow section of this manual.

Overview and access

There are different ways to access the MaRC3 cluster, including direct access via terminal and SSH connection or via browser and Jupyter_HPC. The general scheme is illustrated below.

Figure: Schematic cluster structure (user-side view).

  • Access to MaRC3 generally requires a Marburg University staff account and a separate registration for HPC usage. For further information see the getting access section.
  • Make sure you are connected to the network of Marburg University directly or via VPN.
  • For first usage, you need to change your user shell (the specific command line interface) to "bash" using this online form (2FA required).

In case you connect via terminal:

  • Start a terminal session and establish an SSH connection to MaRC3 by entering:
    ssh -p 223 USERNAME@marc3a.hrz.uni-marburg.de
    (replace USERNAME by your Marburg University staff account name).
  • You will be directed to a "log-in" node of the cluster where you have access to various file systems, including your personal /home directory (see below).
  • From here, you may use the Slurm management system to create compute jobs which are queued and executed by a pool of "worker" nodes.
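
To avoid retyping the port and host name, the connection details above can be stored in the SSH client configuration. The following is a minimal sketch, assuming an OpenSSH client; the alias marc3 is a hypothetical name chosen for illustration, and USERNAME must be replaced as described above.

```shell
# Sketch: store the MaRC3 connection details in the OpenSSH client config.
# Assumptions: "marc3" is a hypothetical alias; USERNAME is a placeholder.
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$SSH_CONFIG")"

# Append a host entry (the quoted 'EOF' prevents any variable expansion).
cat >> "$SSH_CONFIG" <<'EOF'

Host marc3
    HostName marc3a.hrz.uni-marburg.de
    Port 223
    User USERNAME
EOF

# OpenSSH rejects a config file that is writable by other users.
chmod 600 "$SSH_CONFIG"
```

Afterwards, "ssh marc3" should be equivalent to the full command shown above.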

Usage of MaRC3

You can use the MaRC3 resources in different ways depending on the needs of your project and your experience.

  • Scheduling batch jobs / job arrays is a good choice if you would like to take advantage of specialized hardware and parallel computation (e.g. run the same code on many datasets at the same time). This is typically the most powerful and efficient way of using the cluster but requires some extra knowledge.
  • Using Jupyter_HPC might be a good choice if you would like to interact with your code and your project does not need very intensive parallel computation. This might be the case in data exploration or for development purposes.
  • An interactive job is a valid choice if you need to interact with your code at runtime but JupyterLab is not your preferred environment. There are also options to enable graphical interfaces. Ideally, these should be avoided, though, as graphical sessions are generally slow and inefficient. A notable exception is the RStudio IDE (see MaRC3 documentation).

In the background, a management and scheduling system (Slurm) places the concurrent jobs of all users in queues and prioritizes them to ensure that resources are used efficiently and everyone has a fair chance to use the cluster. As a consequence, a job might not be executed immediately. The waiting time depends on multiple factors, including the cluster's current load, the resources requested for a job (job duration, CPU cores, RAM, GPUs, ...), and also your past jobs (accounting).
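
A job array of the kind mentioned above can be sketched as a batch script. This is only an illustration under stated assumptions: the job name array_demo, the resource values and the input file naming (data/sample_N.csv) are hypothetical, while the #SBATCH options themselves are standard Slurm options. Please check the MaRC3 documentation for the partitions and limits that actually apply.

```shell
#!/bin/bash
# Hypothetical job name and resource requests; adjust to your project.
#SBATCH --job-name=array_demo
# Ten array tasks with indices 0..9.
#SBATCH --array=0-9
# Requested wall time, cores and memory per task.
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
# One log file per task: %A is the array job ID, %a the task index.
#SBATCH --output=array_demo_%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID for each array task.
# The default of 0 only allows a dry run outside of Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-0}"

# Hypothetical naming scheme: one input file per array task.
input_file="data/sample_${task_id}.csv"

echo "Task ${task_id} would process ${input_file}"
```

Saved under a name of your choice (e.g. array_demo.sh), such a script would be submitted from a log-in node with "sbatch array_demo.sh". Requesting only the resources a task really needs tends to shorten the waiting time in the queue.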

No compute at login nodes!

Please do NOT use the log-in nodes for compute jobs.
Their resources are needed for other purposes.

As a TAM member, you have access to a dedicated "owner partition" named owner_tam, which gives you privileged access to the resources sponsored by TAM. For details see MaRC3 documentation.
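
As a rough sketch, a batch script can be directed to the owner partition with the standard Slurm option --partition. The wrapper function name and the script name my_job.sh are hypothetical, and further options may be required on MaRC3; the MaRC3 documentation is authoritative here.

```shell
# Sketch: submit a batch script to the TAM owner partition.
# Assumptions: the script passed in (e.g. "my_job.sh") is your own batch
# script; additional options may be needed on MaRC3.
submit_to_owner_tam() {
    # Pass all arguments through to sbatch, adding the partition option.
    sbatch --partition=owner_tam "$@"
}

# Usage (on a log-in node):
#   submit_to_owner_tam my_job.sh
```

The same option can alternatively be written inside the batch script as a "#SBATCH --partition=owner_tam" line.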

MaRC hardware

The figure below gives a brief overview of the hardware currently available in the cluster. Since new hardware is added frequently, this information may become outdated.


Figure: Overview of MaRC3a hardware (updated: January 2025)

Storage at MaRC3 and MaSC

Once you log in (via SSH or via Jupyter_HPC), you have access to multiple file systems that serve different purposes and thus differ largely in terms of capacity, access speed and persistence. Knowing which file system to use for what is crucial for getting good job performance and avoiding loss of data!

Figure: MaRC3 file systems (figure by Marcus Lechner; updated by René Sitt).

The general principle is as follows:

  • Large datasets are stored at high-capacity locations at MaSC (TAM GitLab or /masc_shared), which come with the disadvantage of comparatively slow access rates.
  • Prior to the actual computation, data is cloned / copied to a smaller, less persistent but very fast storage location at MaRC3 like /scratch or /scratch_shared. This keeps limited data access rates from slowing down your computation and thus saves your time and the time of other persons who would also like to use the compute cluster.
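
The copy step described above could be sketched as a small helper function. This is an illustration only: the function name stage_in and the example paths in the usage comment are hypothetical, and the actual directory layout on /scratch should be taken from the MaRC3 documentation.

```shell
# Sketch: copy input data to fast storage before the computation starts.
# Assumptions: both paths are supplied by the caller; the example paths in
# the usage comment below are hypothetical.
stage_in() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    # "src/." copies the directory contents (including hidden files);
    # -a preserves timestamps and permissions.
    cp -a "$src/." "$dst/"
}

# Usage:
#   stage_in /masc_shared/in_tam/PROJECT /scratch/USERNAME/PROJECT
```

For data kept in TAM GitLab, cloning the repository into the fast storage location plays the same role as this copy.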

The compute cluster (MaRC3) and the storage cluster (MaSC) are located in close proximity, which enables fast data transfer between them. Because the storage at MaSC and MaRC3 is a shared resource, we kindly ask you to respect fair usage.

The file system /masc_shared is directly mounted at MaRC3 and contains dedicated storage areas for groups and research projects. Persons affiliated with TAM get access to the /masc_shared/in_tam directory. Because storing data there has serious drawbacks in terms of data management, /masc_shared should only be used as a "last resort" if using our git-based common workflow is not feasible for your project (see our workflow section).

Avoid storing data directly at /masc_shared

Storing data on /masc_shared is a deprecated, NOT recommended way to use the DataHub. It undermines version control and has general data management issues (including no regular backups)!

Instead, it is highly recommended to store data "permanently" in a central remote repository at TAM GitLab and make it available for computation on fast "local" storage on demand.

Environments and software

There is a system of centrally managed environment modules which provide basic standard configurations for certain use cases or enable you to use specific software. The entry point is the module command.

Table: Module commands (from MaRC3 documentation)

Command | Effect | Notes
-------- | ------ | -----
module avail | list loadable modules | does not display dependency subtrees for modules that are not currently loaded
module list | list currently loaded modules |
module load modulename | add modulename to environment | synonymous to 'module add modulename'
module unload modulename | remove modulename from environment | synonymous to 'module del modulename'
module spider | list ALL modules |
module spider modulename | list all versions of modulename | modulename can also be a search term (i.e. 'module spider gnu' will find 'gnu7' and 'gnu9' modules)
module purge | unload all currently loaded modules | useful in job scripts to start with a defined (empty) environment before loading needed modules
module help | display help texts | 'module' has lots of additional commands and convenience functions that are not listed here

An important module is the miniconda module, which provides conda, an open-source, cross-platform and multi-language environment and package management system. Using the miniconda module (module load miniconda), you can create and enter user-defined (private) environments and install software within them. It also allows you to keep different versions of the same software side by side (e.g. different Python versions). Please refer to the (Ana)conda section of the MaRC3 documentation for details.
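
The combination of environment modules and conda described above might look as follows. This is a sketch under assumptions, not the procedure prescribed for MaRC3: the function name setup_env and the environment name my_env are hypothetical, and depending on the shell setup "conda activate" may require additional initialization (see the (Ana)conda section of the MaRC3 documentation).

```shell
# Sketch: start from a clean module environment, then enter a conda environment.
# Assumptions: the environment name passed in (e.g. "my_env") is hypothetical
# and already exists; "conda activate" may need extra shell setup.
setup_env() {
    # Unload everything first so the session starts from a defined state.
    module purge
    module load miniconda
    conda activate "$1"
}

# One-time creation of such an environment could look like this:
#   module load miniconda
#   conda create --name my_env python=3.11
#
# Usage in a job script or interactive session:
#   setup_env my_env
```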

If you need software that requires higher privileges or licenses, you can contact the MaRC3 team (see below).

Handling sensitive data

The basic conditions for handling sensitive data in the MaRC3 have been defined in the MaRC3 operating concept (February 2025). Key aspects are summarized below. Deviations from these regulations require a separate agreement with the operators and the legal department in advance and possibly further technical and organizational measures.

Responsibility of the operators

  • Restrict access: Only persons with a valid Marburg University staff account who are authorized by the head of their research group (PI) and the HRZ may use the MaRC3.
  • Separation of networks: Access is only possible via VPN / university network. The worker nodes are separated from the university network. User access is only possible via the login nodes.
  • Offer tools, procedures and training: The operators of the MaRC3 offer a training course "Handling personal data in the HPC environment". In this course, tools and procedures for encryption are described.

Responsibility of the users

  • Participation in trainings: Prior to handling sensitive data on the MaRC3, users must successfully complete the above-mentioned training. The training must be repeated every two years.
  • Avoid handling sensitive data: The handling of personal data on MaRC3 should be reduced to a necessary minimum. If possible, data should be anonymized.
  • Pseudonymization: If anonymization is not possible, the data records must be pseudonymized. The keys for relating the pseudonym to the original data must be stored outside MaRC3.
  • Restrict access: Users must ensure confidentiality and integrity of the data. Access rights to files and directories with sensitive data should be assigned as restrictively as possible so that only those persons are granted access who absolutely need them to perform their tasks.
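
Restrictive access rights of the kind described in the last point can be set with standard POSIX file permissions. The following is a minimal sketch; the function name restrict_to_owner and the example path are hypothetical, and it does not replace the tools and procedures taught in the training.

```shell
# Sketch: make a directory tree accessible to its owner only.
# Assumption: the directory path is supplied by the caller.
restrict_to_owner() {
    # Remove read, write and execute rights for group and others, recursively.
    chmod -R go-rwx "$1"
}

# Usage:
#   restrict_to_owner /path/to/sensitive_project
#   ls -ld /path/to/sensitive_project    # check the resulting permissions
```

If selected colleagues need access, the corresponding group permissions would have to be granted deliberately rather than left at their defaults.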

For sensitive data with high protection requirements, these additional measures are required:

  • Encryption in shared directories: Data with high protection requirements need to be encrypted if stored on shared storage areas (group shares like /masc_shared). Software and procedures are provided by the operators.
  • Usage of dedicated compute nodes: Data with high protection requirements may only be processed on dedicated compute nodes. This means that you must reserve an entire node for the processing of highly sensitive data. Otherwise there is a risk of other users accessing temporarily decrypted data.
  • Principle of data minimization: Personal data must be reduced to the necessary minimum. Such data may only be stored on MaRC3 for the purpose of processing and analysis and must be deleted at the earliest possible time.

FAQ, Troubleshooting and Support

  • See FAQ of this manual.
  • Support: Questions, problems, installation requests and suggestions can be sent to the DataHub or the MaRC3 Team.
    Please CC the DataHub team if you email the MaRC3 Team directly.