Building and Annotating Metagenome-Assembled Genomes (MAGs) from Metagenomics Reads

PURL: https://gxy.io/GTN:P00035

Comment: What is a Learning Pathway?

We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.

This learning path will guide you through the process of constructing and analyzing Metagenome-Assembled Genomes (MAGs) using the Galaxy platform. You will explore the key steps involved in transforming raw metagenomic data into high-quality MAGs, from preprocessing to functional annotation.

By the end of this path, you will be able to:

List and describe the essential steps in MAG construction, including quality control, assembly, binning, and refinement.
Define core concepts such as MAGs, binning, and functional annotation, and understand their significance in metagenomic analysis.
Explain the importance of preprocessing metagenomic reads, focusing on quality control and contamination removal.
Compare the quality of MAGs using metrics like completeness and contamination, and assess their suitability for downstream analysis.
Evaluate the reliability of taxonomic assignments and functional annotations by leveraging reference databases.
Analyze the relative abundance of microbial taxa in samples and infer ecological dynamics.
Identify genomic features annotated by tools like Bakta, including coding sequences (CDS), rRNA, and tRNA.
Interpret functional annotation results to uncover metabolic pathways, virulence factors, and other biological roles within microbial communities.

This path is designed to equip you with both the theoretical knowledge and practical skills needed to confidently construct, evaluate, and analyze MAGs in your research.

Module 0: Introduction to Galaxy – Navigating the Platform and Performing Your First Analysis

Before diving into metagenomics, it’s essential to become comfortable with the tools you’ll be using. This module is designed to introduce you to the Galaxy platform—a user-friendly, web-based environment for bioinformatics analysis.

Through a combination of video tutorials and hands-on exercises, you will:

Familiarize yourself with the Galaxy interface, including its key features and navigation.
Learn how to import data, organize your workspace, and use basic tools.
Complete a guided, simple analysis to gain confidence in running workflows and interpreting results.

By the end of this module, you’ll be ready to tackle more advanced analyses in subsequent modules.

Time estimation: 1 hour 40 minutes

Learning Objectives

Learn how to upload a file
Learn how to use a tool
Learn how to view results
Learn how to view histories
Learn how to extract and run a workflow
Learn how to share a history
Familiarize yourself with the basics of Galaxy
Learn how to obtain data from external sources
Learn how to run tools
Learn how histories work
Learn how to create a workflow
Learn how to share your work

Lesson	Slides	Hands-on	Recordings
A short introduction to Galaxy (tags: español)	Plain text slides	Tutorial	Tutorial (March 2025) - 27m
Galaxy Basics for genomics		Tutorial	Tutorial (September 2024) - 18m; Tutorial (July 2021) - 13m

Module 1: Quality Control – Ensuring High-Quality Metagenomic Data

High-quality data is the foundation of reliable metagenomic analysis. Poor-quality reads—whether due to low base-calling accuracy, adapter contamination, or insufficient length—can introduce errors, bias assemblies, and compromise your results.

In this module, you will:

Understand the importance of quality control in metagenomic workflows and its impact on downstream analyses.
Learn how to assess, trim, and filter raw sequencing data to retain only high-quality reads.
Use Galaxy tools to remove contaminants, trim adapters, and filter low-quality sequences, ensuring your data is clean and ready for further analysis.

By the end of this module, you’ll be equipped to confidently prepare your metagenomic data for assembly and other advanced analyses.
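The filtering step described above can be illustrated with a minimal Python sketch. It is not how FastQC or Cutadapt work internally; it only shows the idea of keeping reads whose mean Phred score and length pass a threshold. The function names, the Phred+33 encoding, and the default thresholds are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# A read is (header, sequence, quality string); this layout is an assumption.
Read = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(
    reads: Iterable[Read],
    min_mean_quality: float = 20.0,  # illustrative threshold, not a recommendation
    min_length: int = 30,            # illustrative threshold, not a recommendation
) -> List[Read]:
    """Keep reads that are long enough and have a high enough mean quality."""
    return [
        read
        for read in reads
        if len(read[1]) >= min_length and mean_phred(read[2]) >= min_mean_quality
    ]
```

In practice the Galaxy tools covered in this module also trim adapters and low-quality read ends, which this sketch does not attempt.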

Time estimation: 1 hour 30 minutes

Learning Objectives

Assess short-read FASTQ quality using FASTQE 🧬😎 and FastQC
Assess long-read FASTQ quality using NanoPlot and PycoQC
Perform quality correction with Cutadapt (short reads)
Summarise quality metrics with MultiQC
Process single-end and paired-end data

Lesson	Slides	Hands-on	Recordings
Quality Control	Plain text slides	Tutorial	Lecture (February 2021) - 40m; Tutorial (September 2024) - 51m; Tutorial (May 2023) - 50m; Tutorial (February 2021) - 1h10m

Module 2: Contamination and Host Reads Removal – Purifying Your Metagenomic Dataset

Metagenomic datasets frequently contain non-microbial sequences, such as host DNA (e.g., from honey bees) or external contaminants (e.g., human DNA introduced during sample handling or sequencing). These sequences can distort downstream analyses, leading to misassemblies, incorrect taxonomic assignments, and biased functional interpretations.

In this module, you will:

Recognize the sources and impacts of contamination in metagenomic datasets.
Learn how to identify and filter out host and contaminant sequences using Galaxy tools.
Ensure your dataset is enriched for microbial reads, improving the accuracy of MAG reconstruction and enabling more reliable biological insights.

By the end of this module, you’ll be able to confidently clean your metagenomic data, setting the stage for high-quality MAG assembly and analysis.
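As a rough illustration of the filtering idea, the sketch below assumes that a previous step (for example, mapping reads against a host reference genome) has already produced a set of read identifiers flagged as host or contaminant. The function name, the dictionary layout, and the read identifiers are hypothetical; the Galaxy tools used in the tutorial handle this differently and at scale.

```python
from typing import Dict, Set, Tuple


def remove_host_reads(
    reads: Dict[str, str],
    host_read_ids: Set[str],
) -> Tuple[Dict[str, str], float]:
    """Drop reads flagged as host or contaminant.

    reads maps a read identifier to its sequence (hypothetical layout).
    Returns the retained reads and the fraction of reads that was removed.
    """
    kept = {
        read_id: sequence
        for read_id, sequence in reads.items()
        if read_id not in host_read_ids
    }
    removed_fraction = 0.0 if not reads else (len(reads) - len(kept)) / len(reads)
    return kept, removed_fraction
```

Reporting the removed fraction is a design choice of this sketch: an unexpectedly high value can hint at heavy host contamination in a sample.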

Time estimation: 1 hour

Learning Objectives

Identify reads originating from contaminants or host genomes.
Remove those reads to produce high-quality, clean metagenomic data suitable for downstream analyses.

Lesson	Slides	Hands-on	Recordings
Remove contamination and host reads		Tutorial	

Module 3: Assembly – Reconstructing and Assessing Contigs from Metagenomic Reads

The foundation of MAG reconstruction lies in assembly—the computational process of piecing together fragmented metagenomic reads into longer genomic sequences called contigs. Think of it as solving a complex jigsaw puzzle: your goal is to identify reads that “fit together” by detecting overlapping sequences.

In this module, you will:

Understand the principles and challenges of metagenomic assembly.
Learn how to use Galaxy tools to assemble high-quality contigs from your preprocessed reads.
Explore strategies to optimize assembly parameters for improved accuracy and completeness.
Assess the quality of your assembly using metrics such as contig length distribution, N50, and coverage, ensuring your contigs are suitable for downstream analysis.

By the end of this module, you’ll be equipped to transform your cleaned metagenomic data into contiguous sequences and evaluate their quality, setting the stage for successful MAG reconstruction.
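Of the metrics mentioned above, N50 is the one most often misread, so a small worked sketch may help. N50 is the length of the shortest contig in the set of longest contigs that together cover at least half of the total assembly length. The function name is an assumption; tools such as QUAST report this value for you.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

For contigs of lengths 2 to 10 (total 54), the three longest contigs (10, 9, 8) sum to 27, which is half of the total, so the N50 is 8.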

Time estimation: 2 hours

Learning Objectives

Describe what an assembly is.
Explain the difference between co-assembly and individual assembly.
Explain the difference between reads, contigs and scaffolds.
Explain how tools based on de Bruijn graphs work.
Evaluate the quality of the assembly with QUAST, Bowtie2, and CoverM-Contig.
Construct and apply simple assembly pipelines on short read data.
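To make the de Bruijn graph objective more concrete, here is a minimal sketch of how such a graph can be built from reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function names are assumptions, and real assemblers add error correction, coverage tracking, and graph simplification that are left out here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def kmers(sequence: str, k: int) -> List[str]:
    """All overlapping k-mers of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer node to the sorted list of nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}
```

With the reads ACGT and CGTA and k = 3, the shared k-mer CGT links both reads into the single path AC, CG, GT, TA, which spells the contig ACGTA.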

Lesson	Slides	Hands-on	Recordings
Assembly of metagenomic sequencing data (tags: assembly, metagenomics, microgalaxy)		Tutorial	Tutorial (May 2023) - 1h

Module 4: Binning – From Contigs to Refined Microbial Genomes

Metagenomic binning is the process of grouping assembled contigs into discrete bins, each representing a potential microbial genome. By analyzing sequence composition, coverage, and similarity, binning allows researchers to reconstruct individual genomes from complex microbial communities.

However, initial bins often contain fragmented, redundant, or contaminated sequences, which can compromise downstream analyses. To address this, bin refinement and de-replication are essential steps to improve the quality, completeness, and non-redundancy of your Metagenome-Assembled Genomes (MAGs).

In this module, you will:

Understand how binning algorithms classify contigs into bins based on genomic signatures.
Use Galaxy tools to perform binning and assign sequences to their likely microbial origins.
Evaluate bin quality using metrics such as completeness, contamination, and strain heterogeneity.
Learn techniques for refining bins, including merging, splitting, and contamination removal.
Explore de-replication to identify and retain only the highest-quality representative MAG from sets of similar genomes.

By the end of this module, you’ll be able to reconstruct, refine, and validate high-quality MAGs, ensuring they are ready for taxonomic and functional analysis.
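Completeness and contamination are usually combined into quality tiers. The sketch below shows one way to do that. The cutoffs are assumptions loosely modelled on commonly cited MIMAG-style thresholds, and the function name is hypothetical; the full standard also considers the presence of rRNA and tRNA genes for the high-quality tier, which is not checked here.

```python
def classify_bin(completeness: float, contamination: float) -> str:
    """Assign a quality tier from completeness and contamination percentages.

    Thresholds are illustrative assumptions, not values taken from the tutorial.
    """
    if completeness > 90.0 and contamination < 5.0:
        return "high"
    if completeness >= 50.0 and contamination < 10.0:
        return "medium"
    return "low"
```

For example, a bin that is 95% complete with 2% contamination would land in the high tier, while the same bin with 12% contamination would be classified as low.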

Time estimation: 2 hours

Learning Objectives

Describe what metagenomic binning is.
Describe common challenges in metagenomic binning.
Perform metagenomic binning using MetaBAT 2.
Evaluate MAG quality and completeness using CheckM.
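One of the genomic signatures that binning relies on is sequence composition. A minimal sketch of a tetranucleotide frequency profile is shown below. It is not MetaBAT 2's implementation: the function name is an assumption, and the sketch neither merges reverse-complement k-mers nor uses coverage information.

```python
from collections import Counter
from itertools import product
from typing import Dict


def tetranucleotide_frequencies(sequence: str) -> Dict[str, float]:
    """Relative frequency of each of the 256 possible A/C/G/T 4-mers.

    Windows containing other characters (for example N) are ignored.
    """
    sequence = sequence.upper()
    all_tetramers = ["".join(bases) for bases in product("ACGT", repeat=4)]
    counts = Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))
    total = sum(counts[tetramer] for tetramer in all_tetramers)
    if total == 0:
        return {tetramer: 0.0 for tetramer in all_tetramers}
    return {tetramer: counts[tetramer] / total for tetramer in all_tetramers}
```

Contigs from the same genome tend to have similar profiles, so comparing these vectors (together with coverage) gives a binner evidence for grouping contigs.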

Lesson	Slides	Hands-on	Recordings
Binning of metagenomic sequencing data (tags: binning, metagenomics, microgalaxy)		Tutorial	Tutorial (August 2024) - 25m

Module 6: Functional Annotation of MAGs – Applying Genomic Approaches to Metagenome-Assembled Genomes

Functional annotation is a fundamental process in genomic analysis, whether you’re working with microbial isolates or Metagenome-Assembled Genomes (MAGs). By applying the same robust approaches used for isolates, you can identify and characterize genes in MAGs, revealing their roles in metabolic pathways, environmental interactions, and ecological functions.

In this module, you will:

Learn how functional annotation tools (such as Bakta) predict and classify genomic features, including coding sequences (CDS), rRNA, tRNA, and more.
Use Galaxy to annotate your MAGs, uncovering their biological potential and functional capabilities.
Explore antimicrobial resistance (AMR) gene detection as part of functional annotation, identifying genes associated with resistance mechanisms.
Interpret the ecological and functional roles of your MAGs, including genes linked to pathogenicity, nutrient cycling, symbiosis, and AMR.
Assess the reliability of annotations and understand the importance of reference databases in ensuring accurate predictions.

By the end of this module, you’ll be able to analyze MAGs with the same confidence and precision as microbial isolates, gaining deeper insights into their ecological roles and functional potential—including their resistance profiles.
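A quick way to get a first overview of an annotation is to count feature types such as CDS, rRNA, and tRNA. The sketch below assumes the annotation is available as GFF3 text, where the feature type is the third tab-separated column. The function name and the example rows used to try it out are made up for illustration and are not real Bakta output.

```python
from collections import Counter


def count_feature_types(gff3_text: str) -> Counter:
    """Count GFF3 feature types (column 3), skipping comments and sequence data."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Invented example rows, for illustration only.
example = "\n".join([
    "##gff-version 3",
    "contig_1\texample\tCDS\t1\t300\t.\t+\t0\tID=cds_1",
    "contig_1\texample\tCDS\t400\t900\t.\t-\t0\tID=cds_2",
    "contig_1\texample\ttRNA\t950\t1025\t.\t+\t.\tID=trna_1",
    "##FASTA",
    ">contig_1",
    "ACGT",
])
feature_counts = count_feature_types(example)
```

Comparing such counts across MAGs can flag bins with unusually few tRNA or rRNA features, which may reflect incompleteness rather than biology.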

Time estimation: 5 hours

Learning Objectives

Run a series of tools to annotate a draft bacterial genome for different types of genomic components
Evaluate the annotation
Process the outputs to format them for visualization
Visualize a draft bacterial genome and its annotations
Run a series of tools to assess the presence of antimicrobial resistance genes (ARGs)
Get information about ARGs
Visualize the ARGs and plasmid genes in their genomic context

Lesson	Slides	Hands-on	Recordings
Bacterial Genome Annotation (tags: gmod, illumina, bacteria, microgalaxy, jbrowse1)		Tutorial	Tutorial (September 2024) - 44m
Identification of AMR genes in an assembled bacterial genome (tags: gmod, illumina, amr, one-health, jbrowse1, microgalaxy)		Tutorial	Tutorial (September 2024) - 26m

Recommended follow-up tutorial

Lesson	Slides	Hands-on	Recordings
Building and Annotating Metagenome-Assembled Genomes (MAGs) from Short Metagenomics Paired Reads (tags: assembly, binning, metagenomics, microgalaxy)		Tutorial	

Editorial Board

This material is reviewed by our Editorial Board:

Bérénice Batut

Paul Zierep

Funding

These individuals or organisations provided funding support for the development of this resource

ELIXIR Europe

de.NBI
