AlphaFold
Introduction
AlphaFold is an AI system that is accelerating research in bioinformatics. It takes a sequence of amino acids as input and predicts the three-dimensional structure of the corresponding protein, typically far faster than experimental structure determination.
Read more on the AlphaFold official website.
This section explains how to use AlphaFold on Elja.
Getting started
Due to NVIDIA compatibility issues, Elja now requires you to run AlphaFold in a Conda environment.
Setting up the Conda environment
We start by initializing the conda environment, following the same steps as described in Conda:
$ module use /hpcapps/lib-mimir/modules/all
$ module load Anaconda3/2022.05
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda config --set auto_activate_base false
$ conda init
$ bash # Start a new shell so the changes take effect; alternatively, log out and log in again.
Load AlphaFold
Once conda is initialized and ready to use, we can load the AlphaFold module.
$ ml use /hpcapps/libbio-gpu/modules/all
$ ml load AlphaFold/2.3.1
Run AlphaFold on Elja
AlphaFold will only run efficiently on GPU compute nodes. Be sure to specify a GPU partition when running your jobs.
To run AlphaFold on Elja, you can either use an interactive session or submit a batch job.
Starting an interactive session
You can start an interactive session on a GPU node with the srun command. Running srun inside screen or tmux keeps the interactive session alive in the background, even if your connection to Elja drops.
$ srun --job-name "AlphaFold" --partition gpu-1xA100 --time 01:00:00 --pty bash
$ conda activate $env_path # $env_path: the path to your Conda environment
$ run_alphafold.sh -d /AlphaFoldData/AlphaFold/data -o /hpcapps/source/alphafold_non_docker/dummy_test/ -f /hpcapps/source/alphafold_non_docker/example/query.fasta -t 2020-05-14
Running AlphaFold with SBATCH
$ cat submit.slurm
#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<MAIL> # for example uname@hi.is
#SBATCH --nodes=1 # number of nodes
#SBATCH --partition=gpu-1xA100
#SBATCH --time=1-00:00:00 # run for 1 day maximum
#SBATCH --output=slurm_job_output.log
#SBATCH --error=slurm_job_errors.log # Logs if job crashes
module use /hpcapps/libbio-gpu/modules/all
module load AlphaFold/2.3.1
conda activate $env_path
# Run the command
run_alphafold.sh -d /AlphaFoldData/AlphaFold/data -o /hpcapps/source/alphafold_non_docker/dummy_test/ -f /hpcapps/source/alphafold_non_docker/example/query.fasta -t 2020-05-14
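When several sequences need to be predicted, it can help to generate one job script per FASTA file instead of editing submit.slurm by hand. The sketch below prints a script that mirrors the one above; the function name make_submit_script and its three arguments (FASTA file, output directory, Conda environment path) are hypothetical and not part of AlphaFold or Elja. The mail options are left out and can be added back as needed.

```shell
#!/bin/bash
# Hypothetical helper: print a Slurm job script for one FASTA file.
# Partition, module paths and database path are copied from the script above.
make_submit_script() {
  local fasta=$1 outdir=$2 env_path=$3
  cat <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=gpu-1xA100
#SBATCH --time=1-00:00:00
#SBATCH --output=slurm_job_output.log
#SBATCH --error=slurm_job_errors.log
module use /hpcapps/libbio-gpu/modules/all
module load AlphaFold/2.3.1
conda activate "${env_path}"
# Run the command
run_alphafold.sh -d /AlphaFoldData/AlphaFold/data -o "${outdir}" -f "${fasta}" -t 2020-05-14
EOF
}
```

For example, make_submit_script query.fasta results/ "$env_path" > submit.slurm writes the script, which can then be submitted with sbatch submit.slurm.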
Additional Information
AlphaFold Parameters
When running AlphaFold using the run_alphafold.sh script, several parameters are available:
- -d /AlphaFoldData/AlphaFold/data: Specifies the location of the AlphaFold database (required)
- -o <output_dir>: Directory where results will be saved
- -f <fasta_file>: Path to the FASTA file containing the protein sequence
- -t <max_template_date>: Maximum template release date (YYYY-MM-DD)
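Because a GPU job can wait in the queue for a while, it may be worth checking the -f and -t values before submitting. The following is a minimal sketch, assuming a single FASTA file whose first line is a ">" header; the function name check_inputs is hypothetical and not part of run_alphafold.sh.

```shell
#!/bin/bash
# Hypothetical pre-submission check for the -f and -t arguments.
# Returns 0 if both look valid, 1 otherwise (with a message on stderr).
check_inputs() {
  local fasta=$1 max_date=$2
  # The FASTA file must exist and be readable.
  if [ ! -r "$fasta" ]; then
    echo "cannot read FASTA file: $fasta" >&2
    return 1
  fi
  # A FASTA record starts with a header line beginning with '>'.
  if ! head -n 1 "$fasta" | grep -q '^>'; then
    echo "first line of $fasta is not a FASTA header" >&2
    return 1
  fi
  # The template date must be written as YYYY-MM-DD (format check only).
  if ! printf '%s\n' "$max_date" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'; then
    echo "date must be YYYY-MM-DD, got: $max_date" >&2
    return 1
  fi
  return 0
}
```

A call such as check_inputs query.fasta 2020-05-14 && sbatch submit.slurm would then submit the job only when both values pass.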
For a complete list of parameters, refer to the AlphaFold documentation.
Interpreting Results
AlphaFold generates several files for each prediction:
- PDB files containing the predicted structures
- JSON files with confidence metrics
- Visualization files for examining the quality of predictions
The primary metric for evaluating prediction quality is the pLDDT score (predicted Local Distance Difference Test), which ranges from 0 to 100, with higher values indicating higher confidence.
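As a quick way to summarize confidence for a whole model, the per-residue pLDDT values can be averaged. AlphaFold writes the pLDDT of each residue into the B-factor column of its output PDB files, so a short awk script is enough. This is a sketch that assumes standard fixed-column PDB formatting and averages over C-alpha atoms only; the function name mean_plddt is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the mean pLDDT of a predicted structure.
# Reads the B-factor field (columns 61-66) of every C-alpha ATOM record.
mean_plddt() {
  awk '
    substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
      sum += substr($0, 61, 6)   # B-factor column holds the pLDDT
      n++
    }
    END {
      if (n == 0) {
        print "no CA atoms found" > "/dev/stderr"
        exit 1
      }
      printf "%.2f\n", sum / n
    }
  ' "$1"
}
```

Running mean_plddt on a predicted structure (for example a file such as ranked_0.pdb in the output directory, if your run produced one) prints a single number. As a rough guide, values above 90 are usually read as very high confidence, 70 to 90 as confident, 50 to 70 as low, and below 50 as very low.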
Troubleshooting
Common issues when running AlphaFold on Elja:
- CUDA errors: Ensure you're using the correct GPU partition
- Memory limitations: Large proteins may require more GPU memory; adjust batch sizes if needed
- Environment errors: Verify that the Conda environment is properly activated
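The first and last of these checks can be automated before launching a prediction. The sketch below is one possible approach, not an official Elja tool: the function name preflight is hypothetical, and it relies on Conda setting CONDA_PREFIX when an environment is active and on nvidia-smi -L listing the GPUs visible on the node.

```shell
#!/bin/bash
# Hypothetical pre-flight check to run on the compute node before AlphaFold.
# Prints one message per problem on stderr and returns 1 if any check fails.
preflight() {
  local rc=0
  # Conda exports CONDA_PREFIX while an environment is active.
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "no Conda environment is active" >&2
    rc=1
  fi
  # run_alphafold.sh should be on PATH once the AlphaFold module is loaded.
  if ! command -v run_alphafold.sh >/dev/null 2>&1; then
    echo "run_alphafold.sh not found; is the AlphaFold module loaded?" >&2
    rc=1
  fi
  # nvidia-smi -L lists visible GPUs and fails when none are available.
  if ! command -v nvidia-smi >/dev/null 2>&1 || ! nvidia-smi -L >/dev/null 2>&1; then
    echo "no GPU visible; is the job running on a GPU partition?" >&2
    rc=1
  fi
  return "$rc"
}
```

Calling preflight && run_alphafold.sh ... inside an interactive session or job script would then stop early with a readable message instead of failing inside AlphaFold.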
For additional help, contact the Elja support team.