<div style="border: 2px solid #8A9AD0; margin: 1em 0.2em; padding: 0.5em;">

# Fine-tuning an LLM for DNA Sequence Classification

by [Raphael Mourad](https://training.galaxyproject.org/hall-of-fame/raphaelmourad/), [Bérénice Batut](https://training.galaxyproject.org/hall-of-fame/bebatut/)

CC-BY licensed content from the [Galaxy Training Network](https://training.galaxyproject.org/)

**Questions**

- How can we classify a DNA sequence according to whether or not it binds a protein such as a transcription factor?

**Objectives**

- Load a pre-trained model and modify its architecture to include a classification layer.
- Prepare and preprocess labeled DNA sequences for fine-tuning.
- Define and configure training parameters to optimize the model's performance on the classification task.
- Evaluate the fine-tuned model's accuracy and robustness in distinguishing between different classes of DNA sequences.

**Time Estimation: 3H**
</div>


<p>After preparing, training, and utilizing a language model for DNA sequences, we can now fine-tune a pre-trained Large Language Model (LLM) for specific DNA sequence classification tasks. Here, we will use a pre-trained model from Hugging Face, specifically the <a href="https://huggingface.co/RaphaelMourad/Mistral-DNA-v1-17M-hg38">Mistral-DNA-v1-17M-hg38</a>, and adapt it to classify DNA sequences based on their biological functions. Our objective is to classify sequences according to whether they bind to transcription factors.</p>
<blockquote class="comment" style="border: 2px solid #ffecc1; margin: 1em 0.2em">
<div class="box-title comment-title" id="comment-transcription-factors"><i class="far fa-comment-dots" aria-hidden="true" ></i> Comment: Transcription factors</div>
<p>Transcription factors are proteins that play a crucial role in regulating gene expression by binding to specific DNA sequences (motifs), typically located in regulatory regions such as enhancers and promoters. These proteins act as molecular switches, turning genes on or off in response to various cellular signals and environmental cues. By binding to DNA, transcription factors either promote or inhibit the recruitment of RNA polymerase, the enzyme responsible for transcribing DNA into RNA, thereby influencing the rate of transcription.</p>
<figure id="figure-1" style="max-width: 90%;"><img src="images/two_dna_sequences.png" alt="Diagram illustrating DNA binding with CTCF. The left panel, outlined in red, shows a DNA sequence 'CCACCAGGGGGCGC' labeled as 'DNA binding CTCF,' with an oval labeled 'CTCF' above it. The right panel, outlined in blue, shows a different DNA sequence 'GTGGCTAGTAGGTAG' labeled as 'DNA not binding CTCF,' indicating that this sequence does not interact with CTCF." width="2486" height="852" loading="lazy" /><br /><br /><figcaption><span class="figcaption-prefix"><strong>Figure 1</strong>:</span> Two types of DNA sequences. On the left, a DNA sequence that binds the transcription factor CTCF. On the right, a DNA sequence that does not bind CTCF.</figcaption></figure>
<p>Transcription factors are essential for numerous biological processes, including cell differentiation, development, and response to external stimuli. Their ability to recognize and bind specific DNA sequences allows them to orchestrate complex gene expression programs, ensuring that the right genes are expressed at the right time and in the right place within an organism. Understanding the function and regulation of transcription factors is vital for deciphering the molecular mechanisms underlying health and disease, and it opens avenues for developing targeted therapeutic interventions.</p>
</blockquote>
<p>This classification task is crucial for understanding gene regulation, as transcription factors play a vital role in controlling which genes are expressed in a cell. By training a model to predict whether a DNA sequence binds to a transcription factor, we can gain insights into regulatory mechanisms and potentially identify novel binding sites or understand the impact of genetic variations on transcription factor binding.</p>
<p>By fine-tuning the model, we aim to leverage its pre-trained knowledge of DNA sequences to achieve high accuracy in this classification task. This tutorial will guide you through the necessary steps, from data preparation to model evaluation, ensuring you can apply these techniques to your own research or projects.</p>
<p>We will use <a href="https://huggingface.co/RaphaelMourad/Mistral-DNA-v1-17M-hg38"><code class="language-plaintext highlighter-rouge">Mistral-DNA-v1-17M-hg38</code></a>, a mixed model that was pre-trained on the entire human genome. It contains approximately 17 million parameters and was trained on the GRCh38 human genome assembly using sequences of 10,000 bases (10 kb):</p>


In [None]:
model_name="RaphaelMourad/Mistral-DNA-v1-17M-hg38"

<blockquote class="comment" style="border: 2px solid #ffecc1; margin: 1em 0.2em">
<div class="box-title comment-title" id="comment-pretraining-a-llm"><i class="far fa-comment-dots" aria-hidden="true" ></i> Comment: Pretraining an LLM</div>
<p>To learn how to pretrain an LLM on DNA, please follow the dedicated <a href="{% link topics/statistics/tutorials/genomic-llm-pretraining/tutorial.md %}">“Pretraining a Large Language Model (LLM) from Scratch on DNA Sequences”</a> tutorial.</p>
</blockquote>
<blockquote class="agenda" style="border: 2px solid #86D486;display: none; margin: 1em 0.2em">
<div class="box-title agenda-title" id="agenda">Agenda</div>
<p>In this tutorial, we will cover:</p>
<ol id="markdown-toc">
<li><a href="#prepare-resources" id="markdown-toc-prepare-resources">Prepare resources</a>    <ol>
<li><a href="#install-dependencies" id="markdown-toc-install-dependencies">Install dependencies</a></li>
</ol>
</li>
</ol>
</blockquote>
<h1 id="prepare-resources">Prepare resources</h1>
<h2 id="install-dependencies">Install dependencies</h2>
<p>The first step is to install the required dependencies:</p>


In [None]:
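!pip install accelerate==1.1.0
!pip install peft==0.13.2
!pip install torch==2.5.0
!pip install transformers -U
!pip install progressbar
!pip install bitsandbytes
# flash_attn is imported in the next section, so it must be installed as well
!pip install flash-attn --no-build-isolation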

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<ol>
<li>What is <code style="color: inherit">accelerate</code>?</li>
<li>What is <code style="color: inherit">peft</code>?</li>
<li>What is <code style="color: inherit">torch</code>?</li>
<li>What is <code style="color: inherit">transformers</code>?</li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution"><button class="gtn-boxify-button solution" type="button" aria-controls="solution" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<ol>
<li>
<p><code style="color: inherit">accelerate</code> is a library by <a href="https://huggingface.co/">Hugging Face</a> (a platform that provides tools and resources for building, training, and deploying machine learning models) that simplifies running PyTorch training and inference across different hardware setups, from a single CPU or GPU to multiple GPUs or TPUs, with minimal changes to the code. It also handles mixed precision and distributed training, making it easier to scale models efficiently.</p>
</li>
<li>
<p><code style="color: inherit">peft</code> (Parameter-Efficient Fine-Tuning) is a Python library developed by Hugging Face to efficiently adapt large pretrained models to downstream tasks without fine-tuning all of the model’s parameters. By training only a small subset of (often newly added) parameters, PEFT significantly reduces computational and storage costs, making it feasible to fine-tune large language models (LLMs) on consumer-grade hardware. The library integrates with the Hugging Face ecosystem, including Transformers, Diffusers, and Accelerate. PEFT supports techniques like LoRA (Low-Rank Adaptation) and prompt tuning, and it can be combined with quantization to further reduce resource usage.</p>
</li>
<li>
<p><code style="color: inherit">torch</code>, also known as PyTorch, is an open-source machine learning library originally developed by Facebook’s AI Research lab. It provides a flexible platform for building and training neural networks, with a focus on tensor computations and automatic differentiation.</p>
</li>
<li>
<p><code style="color: inherit">transformers</code> is a library by Hugging Face that provides implementations of state-of-the-art transformer models for natural language processing (NLP). It includes pre-trained models and tools for fine-tuning, making it easier to apply transformers to various NLP tasks.</p>
</li>
</ol>
</details>
</blockquote>
<h2 id="import-python-libraries">Import Python libraries</h2>
<p>Let’s now import them.</p>


In [None]:
import os

import accelerate
import flash_attn
import numpy as np
import pandas as pd
import torch
import transformers
from accelerate import FullyShardedDataParallelPlugin, Accelerator
from pathlib import Path
from peft import (
    LoraConfig,
    get_peft_model,
    get_peft_model_state_dict,
    prepare_model_for_kbit_training,
)
from progressbar import ProgressBar
from random import randrange
from torch.utils.data import TensorDataset, DataLoader
from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig
from transformers import (
    AutoTokenizer,
    AutoModel,
    BitsAndBytesConfig,
    EarlyStoppingCallback,
    set_seed,
)

<blockquote class="comment" style="border: 2px solid #ffecc1; margin: 1em 0.2em">
<div class="box-title comment-title" id="comment-versions"><i class="far fa-comment-dots" aria-hidden="true" ></i> Comment: Versions</div>
<p>This tutorial has been tested with the following versions:</p>
<ul>
<li><code style="color: inherit">numpy</code> = 1.19 (and not 1.2)</li>
<li><code style="color: inherit">transformers</code> &gt; 4.47.1</li>
</ul>
<p>You can check the versions with:</p>
<div class="language-plaintext highlighter-rouge"><div><pre style="color: inherit; background: transparent"><code style="color: inherit">print(np.__version__)
print(transformers.__version__)
</code></pre></div>  </div>
</blockquote>
<h1 id="configure-fine-tuning">Configure fine-tuning</h1>
<h2 id="check-and-configure-available-resources">Check and configure available resources</h2>
<p>We select the appropriate device (a CUDA-enabled GPU if available, otherwise the CPU) for running PyTorch operations:</p>


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

<p>Let’s check the GPU usage and memory:</p>


In [None]:
!nvidia-smi

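<p>We now set an environment variable that configures how PyTorch’s CUDA caching allocator manages memory. <code style="color: inherit">max_split_size_mb:32</code> prevents the allocator from splitting cached blocks larger than 32 MB, which can reduce memory fragmentation and help avoid out-of-memory errors. It should be set before any memory is allocated on the GPU:</p>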


In [None]:
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:32"

<h2 id="specify-settings-for-quantization">Specify settings for quantization</h2>
<p>Quantization is a technique used in machine learning and signal processing to reduce the precision of numerical values, typically to decrease memory usage and computational requirements. This process is particularly useful when working with large models as it allows them to be deployed on hardware with limited resources without significantly sacrificing performance.</p>
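<p>To make the idea concrete, the sketch below quantizes a small, made-up block of weights to 4-bit integers and back using symmetric absmax scaling, then measures the rounding error. This is a simplified illustration with hypothetical helper names; <code style="color: inherit">bitsandbytes</code> uses its own 4-bit data types (FP4 or NF4) rather than this integer scheme.</p>

```python
def quantize_absmax_4bit(weights):
    """Map floats to signed 4-bit integers, restricted to the symmetric range [-7, 7]."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0  # one quantization constant per block
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 4-bit integers."""
    return [v * scale for v in q]


weights = [0.12, -0.53, 0.07, 0.91, -0.30]  # made-up weight values
q, scale = quantize_absmax_4bit(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(round(max_error, 4))
```

<p>Each block stores only one floating-point constant, and each restored weight differs from the original by at most half of that constant.</p>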
<p>Here, we use <code style="color: inherit">BitsAndBytesConfig</code> to configure a 4-bit quantization. Using 4-bit precision reduces the memory footprint of the model, which is particularly useful for very large models that might not fit into GPU memory otherwise:</p>


In [None]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-1"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What do the following parameters do?</p>
<ol>
<li><code style="color: inherit">load_in_4bit=True</code></li>
<li><code style="color: inherit">bnb_4bit_use_double_quant=True</code></li>
<li><code style="color: inherit">bnb_4bit_compute_dtype=torch.bfloat16</code></li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-1"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-1" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<ol>
<li>
<p><code style="color: inherit">load_in_4bit=True</code>: Specifies that the model should be loaded with 4-bit quantization. Using 4-bit precision reduces the memory footprint of the model, which is particularly useful for very large models that might not fit into GPU memory otherwise.</p>
</li>
<li>
<p><code style="color: inherit">bnb_4bit_use_double_quant=True</code>: Enables double quantization, which means that the quantization constants from the first quantization are quantized again. This further reduces the memory footprint, although it may introduce additional computational overhead.</p>
</li>
<li>
<p><code style="color: inherit">bnb_4bit_compute_dtype=torch.bfloat16</code>: Sets the compute data type to bfloat16 (Brain Floating Point 16-bit format): weights are stored in 4 bits but converted to this type for computations. Using bfloat16 can provide a good balance between computational efficiency and numerical stability, especially on hardware that supports this format, such as certain GPUs and TPUs.</p>
</li>
</ol>
</details>
</blockquote>
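<p>A back-of-the-envelope calculation shows what these options save. The sketch below estimates the memory needed to store the weights of a model with about 17 million parameters at different precisions. It assumes one 32-bit quantization constant per block of 64 weights and, with double quantization, 8-bit constants re-quantized in groups of 256 with one 32-bit constant per group; these block sizes follow the QLoRA setup and are assumptions here. The estimate ignores layers usually kept in higher precision, activations, and optimizer states.</p>

```python
def weight_memory_mb(n_params, bits_per_param):
    """Memory needed to store n_params weights at the given precision, in MB (1 MB = 1e6 bytes)."""
    return n_params * bits_per_param / 8 / 1e6


n_params = 17_000_000  # approximate size of Mistral-DNA-v1-17M-hg38
block_size = 64        # assumed number of weights sharing one quantization constant

# Extra bits per weight spent on quantization constants.
single_quant_overhead = 32 / block_size                           # one float32 constant per block
double_quant_overhead = 8 / block_size + 32 / (block_size * 256)  # 8-bit constants, regrouped by 256

for label, bits in [
    ("float32", 32),
    ("bfloat16", 16),
    ("4-bit", 4 + single_quant_overhead),
    ("4-bit + double quant", 4 + double_quant_overhead),
]:
    print(f"{label:>22}: {weight_memory_mb(n_params, bits):6.2f} MB")
```

<p>A model of this size fits easily on a GPU at any of these precisions; the same ratios apply to much larger models, where the savings become decisive.</p>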
<h2 id="configure-accelerate">Configure Accelerate</h2>
<p>Now, we will configure the <a href="https://huggingface.co/docs/accelerate/en/index">Hugging Face Accelerate library</a> to optimize the training process for large models using Fully Sharded Data Parallel (FSDP). This setup is crucial for efficiently utilizing GPU resources and enabling distributed training across multiple devices.</p>
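<p>First, we configure the FSDP plugin. FSDP shards model parameters, gradients, and optimizer states across GPUs so that each GPU stores only part of them, which reduces per-GPU memory usage and allows larger models to be trained. The two options below control how the full (unsharded) model and optimizer state dictionaries are assembled when checkpoints are saved or loaded.</p>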


In [None]:
fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-2"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What do the following parameters do?</p>
<ol>
<li><code style="color: inherit">state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False)</code>?</li>
<li><code style="color: inherit">optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False)</code>?</li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-2"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-2" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<ol>
<li><code style="color: inherit">state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False)</code>
<ul>
<li><code style="color: inherit">FullStateDictConfig</code>: Requests a full (unsharded) model state dictionary, i.e., all parameters gathered from every shard, whenever the state dictionary is produced, for example when saving a checkpoint.</li>
<li><code style="color: inherit">offload_to_cpu=True</code>: Moves the gathered state dictionary to CPU memory instead of keeping it on the GPU. This avoids running out of GPU memory when saving large models.</li>
<li><code style="color: inherit">rank0_only=False</code>: Every process (rank) receives the full state dictionary, not only the rank 0 process. Setting it to <code style="color: inherit">True</code> would save CPU memory on the other ranks, which would then receive an empty dictionary.</li>
</ul>
</li>
<li><code style="color: inherit">optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False)</code>
<ul>
<li><code style="color: inherit">FullOptimStateDictConfig</code>: Requests a full (unsharded) optimizer state dictionary, gathered from every shard, whenever the optimizer state is saved.</li>
<li><code style="color: inherit">offload_to_cpu=True</code>: As for the model’s state dictionary, the gathered optimizer states are moved to CPU memory, avoiding GPU out-of-memory errors when saving.</li>
<li><code style="color: inherit">rank0_only=False</code>: Every process receives the full optimizer state dictionary, not only the rank 0 process.</li>
</ul>
</li>
</ol>
</details>
</blockquote>
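<p>Conceptually, FSDP flattens the parameters, splits them into equal shards (one per process, or rank), and pads the last shard if needed; a full state dictionary is obtained by gathering all shards again. The toy sketch below, with hypothetical function names and values, only illustrates this bookkeeping and is not how PyTorch implements it.</p>

```python
def shard_flat_params(flat_params, world_size, pad_value=0.0):
    """Split a flat parameter list into equal shards, padding the tail so it divides evenly."""
    shard_size = -(-len(flat_params) // world_size)  # ceiling division
    padded = flat_params + [pad_value] * (shard_size * world_size - len(flat_params))
    return [padded[i * shard_size:(i + 1) * shard_size] for i in range(world_size)]


def gather_full_state(shards, n_params):
    """Reassemble the full (unsharded) parameters and drop the padding."""
    return [p for shard in shards for p in shard][:n_params]


params = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # hypothetical flattened weights
shards = shard_flat_params(params, world_size=3)
print(shards)  # during training, each rank keeps only its own shard

full = gather_full_state(shards, len(params))
# rank0_only=False: every rank would receive this full copy; rank0_only=True: only rank 0.
print(full == params)
```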
<p>Next, we initialize the Accelerator from the Hugging Face Accelerate library, integrating the FSDP plugin for seamless distributed training:</p>


In [None]:
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

<p>By passing the FSDP plugin to the <code style="color: inherit">Accelerator</code>, we enable sharded data parallelism, which efficiently manages model and optimizer states across multiple GPUs.</p>
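<p>With this configuration, the <code style="color: inherit">Accelerator</code> handles the complexities of distributed training, allowing us to focus on developing and experimenting with our models. Sharding mainly helps when a model does not fit comfortably in the memory of a single GPU. Note that it typically only takes effect when training runs as several processes in FSDP mode (for example, launched with <code style="color: inherit">accelerate launch</code> and an FSDP configuration); in a single-GPU notebook session, the plugin is likely to have little effect.</p>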
<h2 id="configure-lora-for-parameter-efficient-fine-tuning">Configure LoRA for Parameter-Efficient Fine-Tuning</h2>
<p>We will configure the LoRA (Low-Rank Adaptation) settings for parameter-efficient fine-tuning of a large language model. LoRA is a technique that allows us to fine-tune only a small number of additional parameters while keeping the original model weights frozen, making it highly efficient for adapting large models to specific tasks.</p>
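<p>In LoRA, a frozen weight matrix <code style="color: inherit">W</code> (of size d × k) is complemented by two small trainable matrices <code style="color: inherit">A</code> (r × k) and <code style="color: inherit">B</code> (d × r), and the layer computes <code style="color: inherit">W x + (lora_alpha / r) B A x</code>. The pure-Python sketch below uses hypothetical toy sizes and random values. As in the LoRA paper and PEFT’s default initialization, <code style="color: inherit">B</code> starts at zero, so the adapted layer initially reproduces the frozen one exactly. It also compares the number of trainable parameters, <code style="color: inherit">d × k</code> for full fine-tuning versus <code style="color: inherit">r × (d + k)</code> for LoRA.</p>

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen path W·x plus the low-rank update (alpha / r)·B·(A·x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


random.seed(0)
d, k, r, alpha = 8, 8, 2, 2  # toy sizes: output dim, input dim, LoRA rank, lora_alpha
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]  # frozen pretrained weights
A = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]  # trainable, random init
B = [[0.0] * r for _ in range(d)]                               # trainable, zero init
x = [random.gauss(0, 1) for _ in range(k)]

# With B = 0, the adapted layer reproduces the frozen layer exactly.
assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

# Trainable parameters: full fine-tuning of W vs. LoRA's A and B.
print("full:", d * k, "LoRA:", r * (d + k))
```

<p>Only <code style="color: inherit">A</code> and <code style="color: inherit">B</code> receive gradient updates during fine-tuning; the gap between the two counts grows quickly as d and k increase while r stays small.</p>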
<p>We use the <code style="color: inherit">LoraConfig</code> class to define the settings for LoRA. This configuration specifies how the low-rank adaptations are applied to the model.</p>


In [None]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"]
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-3"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What do the following parameters do?</p>
<ol>
<li><code style="color: inherit">r=16</code>?</li>
<li><code style="color: inherit">lora_alpha=16</code>?</li>
<li><code style="color: inherit">lora_dropout=0.05</code>?</li>
<li><code style="color: inherit">bias="none"</code>?</li>
<li><code style="color: inherit">task_type="SEQ_CLS"</code>?</li>
<li><code style="color: inherit">target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"]</code>?</li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-3"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-3" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<ol>
<li>
<p><code style="color: inherit">r=16</code>: This parameter specifies the rank of the low-rank matrices used in the adaptation. A higher rank allows the model to capture more complex patterns but also increases the number of trainable parameters.</p>
</li>
<li>
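<p><code style="color: inherit">lora_alpha=16</code>: This scaling factor controls the magnitude of the updates applied by the low-rank matrices: the update is multiplied by <code style="color: inherit">lora_alpha / r</code>, so with <code style="color: inherit">r=16</code> and <code style="color: inherit">lora_alpha=16</code> the scale is 1. It balances the influence of the adaptations relative to the original model weights.</p>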
</li>
<li>
<p><code style="color: inherit">lora_dropout=0.05</code>: Dropout is applied to the inputs of the LoRA layers during training to reduce overfitting. A dropout rate of 0.05 means that 5% of these input values are randomly set to zero at each training step.</p>
</li>
<li>
<p><code style="color: inherit">bias="none"</code>: This setting specifies that none of the model’s existing bias parameters are trained. The other options are “all”, to train all biases, or “lora_only”, to train only the biases of the layers adapted by LoRA.</p>
</li>
<li>
<p><code style="color: inherit">task_type="SEQ_CLS"</code>: This indicates that the model is being fine-tuned for a sequence classification task. Other task types might include “CAUSAL_LM” for causal language modeling or “SEQ_2_SEQ_LM” for sequence-to-sequence tasks.</p>
</li>
<li>
<p><code style="color: inherit">target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"]</code>: This list specifies the modules within the model architecture to which the LoRA adaptations will be applied. In Mistral-style transformer models, these are the linear projections of the attention layers, plus one projection of the feed-forward block:</p>
<ul>
<li><code style="color: inherit">"q_proj"</code>: query projections</li>
<li><code style="color: inherit">"k_proj"</code>: key projections</li>
<li><code style="color: inherit">"v_proj"</code>: value projections</li>
<li><code style="color: inherit">"o_proj"</code>: output projections</li>
<li><code style="color: inherit">"gate_proj"</code>: gate projection of the feed-forward (MLP) block.</li>
</ul>
</li>
</ol>
</details>
</blockquote>
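<p>As a rough guide to how <code style="color: inherit">r</code> drives the number of trainable parameters, the sketch below counts the LoRA parameters added to the five target modules of one transformer layer, using <code style="color: inherit">r × (d_in + d_out)</code> per adapted matrix. The hidden size of 256 and feed-forward size of 1024 are hypothetical, and all attention projections are assumed to be square; the real dimensions of Mistral-DNA-v1-17M-hg38 can be read from its configuration (for example <code style="color: inherit">hidden_size</code> and <code style="color: inherit">intermediate_size</code>). It also prints the <code style="color: inherit">lora_alpha / r</code> scaling factor, assuming PEFT’s default (non-rsLoRA) behavior.</p>

```python
def lora_params(shapes, r):
    """LoRA parameters: each adapted (d_in, d_out) matrix gets A (r x d_in) and B (d_out x r)."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes.values())


hidden, ffn = 256, 1024  # hypothetical sizes, not read from the real model config
target_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, hidden),
    "v_proj": (hidden, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, ffn),
}
full = sum(d_in * d_out for d_in, d_out in target_shapes.values())

lora_alpha = 16
for r in (4, 8, 16, 32):
    n = lora_params(target_shapes, r)
    print(f"r={r:>2}: {n:>6} LoRA params per layer "
          f"({100 * n / full:.1f}% of {full}), scaling lora_alpha/r = {lora_alpha / r}")
```

<p>The count grows linearly with <code style="color: inherit">r</code>, while keeping <code style="color: inherit">lora_alpha</code> fixed shrinks the scale of each update as <code style="color: inherit">r</code> increases.</p>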
<p>By configuring LoRA in this way, we can efficiently adapt a large pretrained model to a specific task with minimal computational overhead, making it feasible to fine-tune on consumer-grade hardware. This approach is particularly useful for tasks like text classification, sentiment analysis, or any other application where we need to specialize a general-purpose language model.</p>
<h2 id="configure-training-arguments">Configure Training Arguments</h2>
<p>Let’s now set up the training arguments using the <code style="color: inherit">TrainingArguments</code> class from the Hugging Face Transformers library. These arguments define the training configuration, including hyperparameters and settings for saving and evaluating the model.</p>


In [None]:
training_args = transformers.TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # called evaluation_strategy in older transformers releases
    save_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    bf16=True,
    report_to="none",
    load_best_model_at_end=True,
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-4"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What do the following parameters do?</p>
<ol>
<li><code style="color: inherit">output_dir="./results"</code></li>
<li><code style="color: inherit">eval_strategy="epoch"</code></li>
<li><code style="color: inherit">save_strategy="epoch"</code></li>
<li><code style="color: inherit">learning_rate=1e-5</code></li>
<li><code style="color: inherit">per_device_train_batch_size=16</code></li>
<li><code style="color: inherit">per_device_eval_batch_size=16</code></li>
<li><code style="color: inherit">num_train_epochs=5</code></li>
<li><code style="color: inherit">weight_decay=0.01</code></li>
<li><code style="color: inherit">bf16=True</code></li>
<li><code style="color: inherit">report_to="none"</code></li>
<li><code style="color: inherit">load_best_model_at_end=True</code></li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-4"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-4" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<ol>
<li>
<p><code style="color: inherit">output_dir="./results"</code>: Specifies the directory where the model predictions and checkpoints will be saved.</p>
</li>
<li>
<p><code style="color: inherit">eval_strategy="epoch"</code>: The model will be evaluated at the end of each epoch. This allows for monitoring the model’s progress and adjusting the training process as needed. In older versions of <code style="color: inherit">transformers</code>, this argument was called <code style="color: inherit">evaluation_strategy</code>.</p>
</li>
<li>
<p><code style="color: inherit">save_strategy="epoch"</code>: The model checkpoints will be saved at the end of each epoch. This ensures that checkpoints are available for each complete pass through the dataset.</p>
</li>
<li>
<p><code style="color: inherit">learning_rate=1e-5</code>: Sets the initial learning rate for the optimizer. This rate determines how much the model‚Äôs weights are updated during training.</p>
</li>
<li>
<p><code style="color: inherit">per_device_train_batch_size=16</code>: The number of samples per device (e.g., GPU) to load for training.</p>
</li>
<li>
<p><code style="color: inherit">per_device_eval_batch_size=16</code>: The number of samples per device to load for evaluation.</p>
</li>
<li>
<p><code style="color: inherit">num_train_epochs=5</code>: The total number of training epochs. An epoch is one complete pass through the training dataset.</p>
</li>
<li>
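<p><code style="color: inherit">weight_decay=0.01</code>: Applies weight decay, a regularization that shrinks the weights towards zero at each update to reduce overfitting. With the default AdamW optimizer, it is applied in decoupled form rather than as an L2 penalty added to the loss.</p>
</li>
<li>
<p><code style="color: inherit">bf16=True</code>: Enables mixed precision training using bfloat16, which can speed up training and reduce memory usage. It requires hardware with bfloat16 support (for example, NVIDIA GPUs of the Ampere generation or newer); on other GPUs, set it to <code style="color: inherit">False</code>.</p>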
</li>
<li>
<p><code style="color: inherit">report_to="none"</code>: Disables reporting to external services like WandB or TensorBoard. If you want to track metrics, you can set this to “wandb”, “tensorboard”, etc.</p>
</li>
<li>
<p><code style="color: inherit">load_best_model_at_end=True</code>: Loads the best checkpoint at the end of training. By default, “best” means the lowest evaluation loss; another metric can be chosen with <code style="color: inherit">metric_for_best_model</code>. This option requires the evaluation and save strategies to match, as they do here.</p>
</li>
</ol>
</details>
</blockquote>
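<p>To see what these settings imply, the sketch below counts optimization steps for a hypothetical training set of 32,000 sequences on a single device, and shows the learning rate at a few points, assuming the <code style="color: inherit">Trainer</code> defaults of a linear decay to zero (<code style="color: inherit">lr_scheduler_type="linear"</code>) with no warmup. The helper names are hypothetical.</p>

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Optimizer steps in one epoch on one device (the last, smaller batch still counts)."""
    return math.ceil(n_samples / batch_size)


def linear_lr(step, total_steps, base_lr):
    """Learning rate under a linear decay from base_lr to 0 without warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


n_train = 32_000  # hypothetical dataset size, for illustration only
per_epoch = steps_per_epoch(n_train, batch_size=16)  # per_device_train_batch_size=16
total = per_epoch * 5                                # num_train_epochs=5

print("steps per epoch:", per_epoch, "| total steps:", total)
for step in (0, total // 2, total):
    print(f"step {step:>5}: lr = {linear_lr(step, total, 1e-5):.2e}")
```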
<p>These settings provide a balanced configuration for training a model efficiently while ensuring that the best version of the model is saved and can be used for further evaluation or deployment. Adjust these parameters based on your specific use case and available computational resources.</p>
<h1 id="prepare-the-tokenizer">Prepare the tokenizer</h1>
<p>We will now set up the tokenizer to convert DNA sequences into numerical tokens that the model can process. The tokenizer is a crucial component in preparing the data for model training and inference: it transforms raw text into a format that can be processed by machine learning models.</p>
<p>We use the <code style="color: inherit">AutoTokenizer</code> class from the Hugging Face Transformers library to load a pre-trained tokenizer. The tokenizer must come from the same pre-trained model that we will fine-tune (<code style="color: inherit">model_name</code>), so that the token IDs match the model’s vocabulary:</p>


In [None]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_name,
    model_max_length=200,
    padding_side="right",
    use_fast=True,
    trust_remote_code=True,
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-5"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What do the following parameters do?</p>
<ol>
<li><code style="color: inherit">model_max_length=200</code></li>
<li><code style="color: inherit">padding_side="right"</code></li>
<li><code style="color: inherit">use_fast=True</code></li>
<li><code style="color: inherit">trust_remote_code=True</code></li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-5"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-5" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<ol>
<li>
<p><code style="color: inherit">model_max_length=200</code>: Sets the maximum sequence length, measured in tokens rather than bases. When the tokenizer is called with <code style="color: inherit">truncation=True</code>, longer sequences are cut to this length, and with <code style="color: inherit">padding="max_length"</code>, shorter ones are padded up to it.</p>
</li>
<li>
<p><code style="color: inherit">padding_side="right"</code>: Specifies that padding tokens are appended to the end (right side) of shorter sequences when a batch is padded to a common length.</p>
</li>
<li>
<p><code style="color: inherit">use_fast=True</code>: Enables the use of the fast tokenizer implementation, which is optimized for speed and is suitable for most use cases.</p>
</li>
<li>
<p><code style="color: inherit">trust_remote_code=True</code>: Allows the tokenizer to execute custom code from the model repository, which may be necessary for some models that require specific preprocessing steps.</p>
</li>
</ol>
</details>
</blockquote>
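<p>The effect of <code style="color: inherit">model_max_length</code> and <code style="color: inherit">padding_side="right"</code> can be illustrated without the real tokenizer. The sketch below, with hypothetical token IDs, a hypothetical pad ID of 0, and a maximum length of 5 instead of 200 to keep the output short, roughly mimics what a call with <code style="color: inherit">padding="max_length"</code> and <code style="color: inherit">truncation=True</code> returns: fixed-length <code style="color: inherit">input_ids</code> and an <code style="color: inherit">attention_mask</code> with 1 for real tokens and 0 for padding.</p>

```python
def pad_and_truncate(token_ids, max_length, pad_id, padding_side="right"):
    """Return fixed-length input_ids and the matching attention_mask."""
    ids = token_ids[:max_length]  # truncate sequences that are too long
    n_pad = max_length - len(ids)
    mask = [1] * len(ids)
    if padding_side == "right":
        return ids + [pad_id] * n_pad, mask + [0] * n_pad
    return [pad_id] * n_pad + ids, [0] * n_pad + mask


batch = [[5, 17, 42], [8, 9, 10, 11, 12, 13, 14]]  # hypothetical token IDs for two DNA sequences
for ids in batch:
    print(pad_and_truncate(ids, max_length=5, pad_id=0))
```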
<p>By configuring the tokenizer in this way, we ensure that our DNA sequences are properly tokenized and formatted for input into the model. This step is essential for preparing our data for efficient and effective model training and evaluation.</p>
<p>Let’s now tailor the tokenizer to our use case. Special tokens define how sequences are delimited and padded when the model processes them. Here, we set:</p>
<ul>
<li>the end-of-sequence (EOS) token, which marks the end of a sequence. It is essential for tasks where the model needs to generate sequences or recognize where a sequence ends.</li>
<li>the padding (PAD) token, which is used to pad sequences to a uniform length within a batch. Padding ensures that all sequences in a batch have the same length, which is necessary for efficient batched processing during training and inference.</li>
</ul>
<div class="language-plaintext highlighter-rouge"><div><pre style="color: inherit; background: transparent"><code style="color: inherit">tokenizer.eos_token = "[EOS]"
tokenizer.pad_token = "[PAD]"
</code></pre></div></div>
<h1 id="prepare-data">Prepare data</h1>
<p>To fine-tune the model, we must provide a labeled dataset. We will use the data for the first human transcription factor (<code class="language-plaintext highlighter-rouge">tf0</code>) from the GUE (Genome Understanding Evaluation) benchmark {% cite zhou2024dnabert2efficientfoundationmodel %}. The data is stored on <a href="https://github.com/raphaelmourad/Mistral-DNA">GitHub</a>.</p>
<h2 id="get-data">Get data</h2>
<p>Let’s get the data from GitHub:</p>


In [None]:
!git clone https://github.com/raphaelmourad/Mistral-DNA.git

<p>We now need to uncompress the labeled data:</p>


In [None]:
!tar -xf Mistral-DNA/data/GUE.tar.xz -C Mistral-DNA/data/

<p>We change the current working directory to the <code style="color: inherit">Mistral-DNA</code> folder.</p>


In [None]:
os.chdir("Mistral-DNA/")
print(os.getcwd())

<p>Let’s define variables for the experiment name and the path to its data:</p>


In [None]:
expe = "tf/0"
data_path = f"data/GUE/{ expe }"

<h2 id="prepare-datasets-for-training-and-validation">Prepare Datasets for Training and Validation</h2>
<p>We now need to set up the datasets required for training and validating. Properly preparing these datasets is crucial for ensuring that the model finetunes effectively and generalizes well to new data.</p>
<p>We will use the following files from the <code style="color: inherit">data_path</code> folder we just defined:</p>
<ul>
<li><code style="color: inherit">train.csv</code> for training</li>
<li><code style="color: inherit">dev.csv</code> for validation</li>
</ul>
<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-6"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What does each file contain?</p>
<ol>
<li><code style="color: inherit">train.csv</code></li>
<li><code style="color: inherit">dev.csv</code></li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-6"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-6" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>The 2 files are CSV files with 2 columns (<code style="color: inherit">sequence</code> and <code style="color: inherit">label</code>) and different numbers of rows:</p>
<ol>
<li><code style="color: inherit">train.csv</code>: 32,379 rows</li>
<li><code style="color: inherit">dev.csv</code>: 1,000 rows</li>
</ol>
<p>Values in <code style="color: inherit">label</code> are:</p>
<ul>
<li><code style="color: inherit">0</code>: The DNA sequence in the <code style="color: inherit">sequence</code> column does not bind to the 1st transcription factor.</li>
<li><code style="color: inherit">1</code>: The DNA sequence in the <code style="color: inherit">sequence</code> column binds to the 1st transcription factor.</li>
</ul>
</details>
</blockquote>
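<p>If you want to inspect a file yourself, a minimal sketch using only the standard library <code style="color: inherit">csv</code> module could look like this. It assumes a header row with <code style="color: inherit">sequence</code> and <code style="color: inherit">label</code> columns, as described above. The three rows in the demo string are made up; for the real data you would pass <code style="color: inherit">open(Path(data_path) / "train.csv")</code> instead.</p>

```python
import csv
import io
from collections import Counter

def summarize(handle) -> tuple[int, Counter]:
    """Count rows and label frequencies in a CSV with 'sequence' and 'label' columns."""
    reader = csv.DictReader(handle)
    labels = Counter(int(row["label"]) for row in reader)
    return sum(labels.values()), labels

# Made-up example rows; replace with open(Path(data_path) / "train.csv") for the real file.
demo = io.StringIO("sequence,label\nACGTACGTAA,1\nTTGACCATGC,0\nGGGCCCAATT,1\n")
n_rows, label_counts = summarize(demo)
print(n_rows, dict(label_counts))  # 3 {1: 2, 0: 1}
```

<p>Checking the label counts this way also tells you whether the two classes are balanced, which matters when interpreting accuracy later on.</p>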
<p>Before we proceed, we import some classes and functions from <code style="color: inherit">scriptPython/functions.py</code>:</p>


In [None]:
### LOAD FUNCTIONS MODULE
import sys
sys.path.append("scriptPython/")
from functions import *

<p>We use the <code style="color: inherit">SupervisedDataset</code> class to load and prepare the datasets. This class handles the tokenization and formatting of the data, making it ready for model training and evaluation.</p>


In [None]:
from pathlib import Path

train_dataset = SupervisedDataset(
    tokenizer=tokenizer,
    data_path=Path(data_path) / "train.csv",
    kmer=-1,
)
val_dataset = SupervisedDataset(
    tokenizer=tokenizer,
    data_path=Path(data_path) / "dev.csv",
    kmer=-1,
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-7"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What does <code style="color: inherit">kmer=-1</code> do?</p>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-7"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-7" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>This parameter is used to specify the length of k-mers (substrings of length k) to be considered in the dataset. A value of -1 typically means that no k-mer splitting is applied, and the sequences are processed as they are.</p>
</details>
</blockquote>
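<p>To illustrate what k-mer splitting means, here is a small sketch. The helper <code style="color: inherit">to_kmers</code> is hypothetical and is not the code inside <code style="color: inherit">SupervisedDataset</code>. It splits a sequence into overlapping k-mers joined by spaces (the convention of DNABERT-style k-mer tokenization) and, mirroring the <code style="color: inherit">kmer=-1</code> convention, returns the sequence unchanged when k is -1.</p>

```python
def to_kmers(sequence: str, k: int) -> str:
    """Split a DNA sequence into overlapping k-mers joined by spaces.

    k = -1 means "no k-mer splitting": the sequence is returned as-is.
    """
    if k == -1:
        return sequence
    if k <= 0:
        raise ValueError("k must be a positive integer or -1")
    return " ".join(sequence[i:i + k] for i in range(len(sequence) - k + 1))

print(to_kmers("ACGTAC", 3))   # ACG CGT GTA TAC
print(to_kmers("ACGTAC", -1))  # ACGTAC
```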
<h2 id="configure-data-collation">Configure Data Collation</h2>
<p>A data collator ensures that sequences are properly padded and formatted, which is crucial for optimizing the training process.</p>
<p>We’ll use the <code style="color: inherit">DataCollatorForSupervisedDataset</code> class to handle the collation of tokenized data. This collator will manage padding and ensure that all sequences in a batch are of uniform length.</p>


In [None]:
data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
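
<p>As a rough picture of what a collator does, the plain-Python sketch below takes a list of examples (each a dict with <code style="color: inherit">input_ids</code> and <code style="color: inherit">label</code>), pads the IDs to the longest sequence in the batch, and builds the <code style="color: inherit">labels</code> and <code style="color: inherit">attention_mask</code> entries. The function name <code style="color: inherit">collate</code> and the pad ID of 0 are illustrative assumptions; the collator from <code style="color: inherit">functions.py</code> presumably works on PyTorch tensors and uses <code style="color: inherit">tokenizer.pad_token_id</code>.</p>

```python
def collate(examples: list[dict], pad_id: int = 0) -> dict:
    """Turn a list of {"input_ids", "label"} examples into one padded batch."""
    # Pad to the longest sequence in this batch (dynamic padding).
    max_len = max(len(ex["input_ids"]) for ex in examples)
    input_ids = [ex["input_ids"] + [pad_id] * (max_len - len(ex["input_ids"])) for ex in examples]
    # 1 for real tokens, 0 for padding positions.
    attention_mask = [[int(tok != pad_id) for tok in row] for row in input_ids]
    labels = [ex["label"] for ex in examples]
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}

batch = collate([{"input_ids": [5, 7, 9], "label": 1}, {"input_ids": [4], "label": 0}])
print(batch["input_ids"])       # [[5, 7, 9], [4, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
print(batch["labels"])          # [1, 0]
```

<p>Because the mask is derived by comparing IDs with the pad ID, the pad token must never stand for real sequence content, which is one reason to define a dedicated <code style="color: inherit">[PAD]</code> token.</p>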

<h1 id="load-and-configure-the-model-for-sequence-classification">Load and Configure the Model for Sequence Classification</h1>
<p>Let’s now load the pre-trained model. It was originally trained for language modeling, not specifically for classification. To adapt it to our binary classification task, we add a new classification head on top of the existing architecture. With <code style="color: inherit">num_labels=2</code>, this head is a linear layer with two outputs (one score per class) connected to the output of the language model, enabling it to classify whether a DNA sequence binds to a transcription factor (label <code style="color: inherit">1</code>) or not (label <code style="color: inherit">0</code>).</p>
<p>This additional layer, or <strong>classification head</strong>, is a simple neural network layer that takes the high-level features extracted by the language model and maps them to our binary classification output. It learns to weigh these features appropriately to make accurate predictions for our specific task.</p>
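<p>The sketch below shows, in plain Python, what such a head computes once the language model has summarized a sequence into a feature vector: one score (logit) per class from a linear layer, turned into probabilities with a softmax. All numbers are made up and the function is illustrative only; in Hugging Face models the head returns raw logits, and the softmax is folded into the cross-entropy loss rather than applied inside the head.</p>

```python
import math

def classification_head(features, weights, bias):
    """Linear layer with one row of weights per class, followed by a softmax.

    features: pooled hidden vector from the language model
    weights:  one row per class (num_labels rows); bias: one value per class
    """
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]
    top = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-dimensional features and arbitrary weights (learned during fine-tuning in practice).
features = [0.5, -1.0, 2.0]
weights = [[0.2, 0.1, -0.3],   # class 0: does not bind
           [-0.1, 0.4, 0.5]]   # class 1: binds
bias = [0.0, 0.1]

probs = classification_head(features, weights, bias)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # 1
```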
<p>We use the <code style="color: inherit">AutoModelForSequenceClassification</code> class from the Hugging Face Transformers library to load the pre-trained model and set it up for our specific classification task:</p>


In [None]:
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,
    output_hidden_states=False,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-8"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What do the following parameters do?</p>
<ol>
<li><code style="color: inherit">num_labels=2</code></li>
<li><code style="color: inherit">output_hidden_states=False</code></li>
<li><code style="color: inherit">quantization_config=bnb_config</code></li>
<li><code style="color: inherit">device_map="auto"</code></li>
<li><code style="color: inherit">trust_remote_code=True</code></li>
</ol>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-8"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-8" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<ol>
<li>
<p><code style="color: inherit">num_labels=2</code>: Sets the number of output labels to 2, corresponding to the binary classification task (binding or not binding to transcription factors).</p>
</li>
<li>
<p><code style="color: inherit">output_hidden_states=False</code>: Indicates that the model should not output hidden states. This is typically set to False unless you need access to the intermediate representations for further analysis.</p>
</li>
<li>
<p><code style="color: inherit">quantization_config=bnb_config</code>: Applies the quantization configuration stored in <code style="color: inherit">bnb_config</code> (defined earlier in the tutorial) to the model, which reduces memory usage and enables efficient training on consumer-grade hardware.</p>
</li>
<li>
<p><code style="color: inherit">device_map="auto"</code>: Automatically determines the best device placement for the model’s layers, optimizing for the available hardware: if a GPU is found, the model is placed on it; otherwise, it runs on the CPU.</p>
</li>
<li>
<p><code style="color: inherit">trust_remote_code=True</code>: Allows the model to execute custom code from the model repository, which may be necessary for certain architectures or preprocessing steps.</p>
</li>
</ol>
</details>
</blockquote>
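<p>To get a feel for what quantization saves, here is a back-of-the-envelope calculation of the memory needed for the weights alone. It assumes roughly 17 million parameters (taken from the model name <code style="color: inherit">Mistral-DNA-v1-17M</code>) and compares several bit widths; which one <code style="color: inherit">bnb_config</code> actually uses depends on how it was defined earlier. Activations, optimizer state, and layers kept in higher precision are ignored.</p>

```python
N_PARAMS = 17_000_000  # approximate, taken from the "17M" in the model name

def weight_memory_mb(n_params: int, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bits_per_param / 8 / 1e6

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_mb(N_PARAMS, bits):.1f} MB")
# 32-bit: 68.0 MB
# 16-bit: 34.0 MB
#  8-bit: 17.0 MB
#  4-bit: 8.5 MB
```

<p>For a model this small the absolute savings are modest; the same arithmetic shows why quantization becomes essential for models with billions of parameters.</p>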
<p>To ensure that the model correctly handles padding tokens, we need to align the padding token configuration between the model and the tokenizer. This step is crucial for maintaining consistency during training and inference, especially when dealing with sequences of varying lengths:</p>


In [None]:
model.config.pad_token_id = tokenizer.pad_token_id
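
<p>Why does the model need <code style="color: inherit">pad_token_id</code>? Decoder-style classifiers in Hugging Face Transformers (such as <code style="color: inherit">MistralForSequenceClassification</code>) typically feed the classification head with the hidden state of the last non-padding token of each sequence, and they locate that token using <code style="color: inherit">pad_token_id</code>. The plain-Python sketch below mimics this lookup for right-padded sequences; the function names and the pad ID of 0 are illustrative, and the library implementation differs in details.</p>

```python
def last_token_index(input_ids: list[int], pad_id: int) -> int:
    """Index of the last non-padding token in a right-padded sequence."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_id:
            return i
    raise ValueError("sequence contains only padding")

def pool_last_token(hidden_states: list[list[float]], input_ids: list[int], pad_id: int) -> list[float]:
    """Pick the hidden state of the last real token; this vector feeds the classification head."""
    return hidden_states[last_token_index(input_ids, pad_id)]

ids = [4, 2, 1, 0, 0]  # toy sequence, right-padded with pad_id = 0
hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0], [0.0, 0.0]]
print(last_token_index(ids, pad_id=0))         # 2
print(pool_last_token(hidden, ids, pad_id=0))  # [0.5, 0.6]
```

<p>If the model used a different pad ID than the tokenizer, this lookup would land on the wrong position and the head would classify padding instead of the sequence.</p>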

<h1 id="initialize-the-trainer">Initialize the Trainer</h1>
<p>We can now set up the <code style="color: inherit">Trainer</code> to manage the training and evaluation process of our model. The <code style="color: inherit">Trainer</code> class simplifies the training loop, handling many of the complexities involved in training deep learning models.</p>
<p>We first need to attach the LoRA adapter to the model:</p>


In [None]:
model.add_adapter(peft_config, adapter_name="lora_1")
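
<p>LoRA freezes each targeted pre-trained weight matrix W (of shape d × k) and learns a low-rank update B·A, with B of shape d × r and A of shape r × k, scaled by alpha / r. The sketch below counts the trainable parameters for one matrix and applies the update to tiny matrices. The dimensions, rank, and alpha are illustrative assumptions; the values actually used come from <code style="color: inherit">peft_config</code>, defined earlier in the tutorial.</p>

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters for one d x k matrix: (full fine-tuning, LoRA with rank r)."""
    return d * k, r * (d + k)

def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, B, A, alpha: float, r: int):
    """W + (alpha / r) * B @ A. W stays frozen; only B and A are updated during training."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]

# Hypothetical 512 x 512 projection with rank 8.
full, lora = lora_param_counts(d=512, k=512, r=8)
print(full, lora, f"{lora / full:.1%}")  # 262144 8192 3.1%

# Tiny example: 2 x 2 weight, rank-1 update, alpha = r = 1 (scale 1).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # d x r
A = [[3.0, 4.0]]    # r x k
print(lora_effective_weight(W, B, A, alpha=1, r=1))  # [[4.0, 4.0], [6.0, 9.0]]
```

<p>In LoRA, B is initialized to zeros, so the effective weight starts out equal to the pre-trained W and fine-tuning begins from the original model.</p>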

<p>Let’s now set up the <code style="color: inherit">Trainer</code>:</p>


In [None]:
trainer = transformers.Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
    callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]
)

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-9"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What does the <code style="color: inherit">callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]</code> parameter do?</p>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-9"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-9" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>It adds an early stopping mechanism to the training process. This mechanism halts training when the model’s performance on the validation set stops improving, which helps prevent overfitting and saves computational resources.</p>
<p>How does early stopping work?</p>
<p><strong>Purpose</strong>: The primary goal of early stopping is to keep the model parameters from the point where the validation loss reaches its minimum. This is crucial because, after a certain point, continued training may lead to overfitting, where the model starts to perform worse on unseen data.</p>
<p><strong>Patience Parameter</strong>: The <code style="color: inherit">early_stopping_patience=3</code> setting specifies that training should continue for three additional evaluations (three epochs when the model is evaluated once per epoch) after the validation performance stops improving. This “patience” period mitigates the effects of noise in the training process: noise can cause temporary fluctuations in the loss, making it seem that the model has reached a minimum when further training might yield better results.</p>
<p><strong>Process</strong>: During training, the validation loss is monitored at each evaluation. If it does not decrease for three consecutive evaluations, training is stopped. However, if a better model with a lower loss is found within those three evaluations, the counter is reset and training continues. This makes it more likely that training stops at a robust minimum rather than being halted prematurely because of noise.</p>
<p>By incorporating early stopping with a patience of three, you balance the need to find an optimal model with the risk of overfitting, leading to more efficient and effective training.</p>
</details>
</blockquote>
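<p>The patience logic can be simulated on a list of validation losses. The function below is a hypothetical sketch of the idea, not the Transformers implementation; in the library, the monitored quantity is <code style="color: inherit">metric_for_best_model</code> (the evaluation loss by default) and the callback expects <code style="color: inherit">load_best_model_at_end=True</code> in the training arguments. The losses are made up.</p>

```python
def early_stopping(val_losses: list[float], patience: int = 3) -> tuple[int, int]:
    """Simulate patience-based early stopping on per-evaluation validation losses.

    Returns (index of the evaluation where training stops, index of the best evaluation).
    """
    best_loss = float("inf")
    best_idx = 0
    bad_evals = 0  # consecutive evaluations without improvement
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_idx, bad_evals = loss, i, 0  # improvement: reset the counter
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i, best_idx
    return len(val_losses) - 1, best_idx

losses = [0.60, 0.50, 0.47, 0.48, 0.46, 0.49, 0.50, 0.51, 0.45]
print(early_stopping(losses, patience=3))  # (7, 4)
```

<p>Here the small bump at evaluation 3 does not stop training, because evaluation 4 improves again; training then stops at evaluation 7 and the best model is the one from evaluation 4. The lower loss at evaluation 8 is never reached, which is the price paid for stopping early.</p>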
<p>For distributed training, where multiple GPUs or nodes are used to accelerate the training process, we also need to pass the local rank to the trainer:</p>


In [None]:
trainer.local_rank=training_args.local_rank

<p>The <code style="color: inherit">local_rank</code> parameter identifies the rank of the current process within its local node, enabling coordinated communication and synchronization between processes. This setup is crucial for managing tasks such as gradient synchronization and data partitioning, ensuring that each process operates on the correct portion of the model or dataset. By assigning the local rank from <code style="color: inherit">training_args</code> to the <code style="color: inherit">Trainer</code>, we facilitate efficient and scalable training, leveraging the full computational power of multi-GPU environments.</p>
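<p>As an illustration of the data-partitioning part, the sketch below splits dataset indices across processes with a simple stride, so that each process works on a disjoint subset. The function <code style="color: inherit">shard_indices</code> is hypothetical; PyTorch's <code style="color: inherit">DistributedSampler</code> follows a similar strided scheme but also shuffles and pads the indices so that every process gets the same number of samples. The rank used here is the global rank, which coincides with <code style="color: inherit">local_rank</code> on a single machine.</p>

```python
def shard_indices(n_samples: int, rank: int, world_size: int) -> list[int]:
    """Strided split of dataset indices: process `rank` takes every world_size-th index."""
    return list(range(rank, n_samples, world_size))

for rank in range(2):
    print(rank, shard_indices(10, rank, world_size=2))
# 0 [0, 2, 4, 6, 8]
# 1 [1, 3, 5, 7, 9]
```

<p>On a single GPU, <code style="color: inherit">training_args.local_rank</code> is typically <code style="color: inherit">-1</code>, meaning that training is not distributed.</p>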
<h1 id="start-the-training">Start the training</h1>
<p>Let’s start the training process for our model using the <code style="color: inherit">trainer.train()</code> method:</p>


In [None]:
trainer.train()

<p>After launching <code style="color: inherit">trainer.train()</code>, we notice that training is significantly faster than training a model from scratch, as done in the <a href="{% link topics/statistics/tutorials/genomic-llm-pretraining/tutorial.md %}">pretraining tutorial</a>. This efficiency comes from using a pre-trained model, which has already been trained extensively on large datasets using powerful computational resources. For example, pre-training a model on even a small portion of the human genome can take dozens of hours, whereas fine-tuning it on a specific task, such as classifying DNA sequences, is much quicker. Fine-tuning leverages the pre-trained model’s foundational knowledge to adapt it to a new task with a smaller labeled dataset. This approach saves time and reduces the need for computational power: by downloading a pre-trained model from a platform such as Hugging Face and fine-tuning it on a local machine with a modest GPU, we can achieve high performance with minimal overhead, making advanced modeling techniques accessible for a wide range of applications.</p>
<h1 id="evaluate-model-performance">Evaluate Model Performance</h1>
<p>After successfully training the model, the next essential step is to evaluate its performance on a test dataset. This evaluation process is crucial for understanding how well the model generalizes to new, unseen data and for assessing its readiness for real-world applications.</p>
<blockquote class="comment" style="border: 2px solid #ffecc1; margin: 1em 0.2em">
<div class="box-title comment-title" id="comment"><i class="far fa-comment-dots" aria-hidden="true" ></i> Comment</div>
<p>If fine-tuning takes too long, you can interrupt the training and continue with the evaluation.</p>
</blockquote>
<p>The test data is stored in <code style="color: inherit">data_path/test.csv</code>. We prepare it in the same way as the training and validation data.</p>


In [None]:
test_dataset = SupervisedDataset(
    tokenizer=tokenizer,
    data_path=Path(data_path) / "test.csv",
    kmer=-1,
)

<p>We then use the <code style="color: inherit">trainer.evaluate()</code> method, which assesses the model’s performance on a given dataset, here the test dataset, which contains data the model has not encountered during training.</p>


In [None]:
results = trainer.evaluate(eval_dataset=test_dataset)

<p>The method computes various evaluation metrics, such as accuracy, precision, recall, and F1 score, depending on the task and the configuration specified in <code style="color: inherit">compute_metrics</code>. These metrics provide a comprehensive view of the model’s performance, highlighting its strengths and weaknesses.</p>
<p>The Trainer uses the <code style="color: inherit">data_collator</code> to ensure that the test data is properly formatted and padded, maintaining consistency with the training process. This consistency is crucial for accurate evaluation.</p>
<p>The evaluation results are stored in the <code style="color: inherit">results</code> variable, which contains the computed metrics. We can analyze these <code style="color: inherit">results</code> to gain insights into the model’s performance and make informed decisions about further improvements or deployment.</p>
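<p>To see how these metrics relate to each other, the sketch below computes them from the four counts of a binary confusion matrix (true/false positives and negatives). The counts are hypothetical, and the averaging actually used by <code style="color: inherit">compute_metrics</code> (not shown in this section) may differ: metrics can be reported for the positive class only or averaged over both classes. The second function shows that recall averaged over the classes with weights equal to the class sizes always equals the accuracy.</p>

```python
import math

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Metrics for a binary classifier, with label 1 as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

def weighted_recall(tp: int, fp: int, fn: int, tn: int) -> float:
    """Per-class recall averaged with weights equal to the class sizes (support)."""
    n_pos, n_neg = tp + fn, tn + fp
    total = n_pos + n_neg
    return (n_pos / total) * (tp / n_pos) + (n_neg / total) * (tn / n_neg)

# Hypothetical confusion-matrix counts, not the tutorial's actual predictions.
m = binary_metrics(tp=350, fp=60, fn=140, tn=450)
print({name: round(value, 3) for name, value in m.items()})
print(round(weighted_recall(tp=350, fp=60, fn=140, tn=450), 3))  # 0.8, same as accuracy
```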
<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-10"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What is stored in <code style="color: inherit">results</code>? How do you interpret this information?</p>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-10"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-10" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> <span>Solution</span><span class="fold-unfold fa fa-minus-square"></span></button></div>
<p><code style="color: inherit">results</code> provides a comprehensive overview of the model’s performance on the evaluation dataset with:</p>
<ol>
<li>
<p><strong>eval_loss (0.424961)</strong>: This metric represents the loss value calculated on the evaluation dataset. Lower values indicate better model performance.</p>
<p>A loss of 0.425 suggests that the model is reasonably well-fitted to the data, though the specific interpretation depends on the context and the loss function used (e.g., cross-entropy for classification tasks).</p>
</li>
<li>
<p><strong>eval_accuracy (0.804000)</strong>: Accuracy measures the proportion of correctly predicted instances out of the total instances.</p>
<p>An accuracy of 80.4% indicates that the model correctly predicted the class for 80.4% of the samples in the evaluation dataset.</p>
</li>
<li>
<p><strong>eval_f1 (0.800838)</strong>: The F1 score is the harmonic mean of precision and recall, providing a single metric that balances both concerns.</p>
<p>An F1 score of 0.801 suggests a good balance between precision and recall, indicating that the model performs well in both identifying positive cases and minimizing false positives and negatives.</p>
</li>
<li>
<p><strong>eval_matthews_correlation (0.628276)</strong>: The Matthews Correlation Coefficient (MCC) is a measure of the quality of binary classifications, taking into account true and false positives and negatives.</p>
<p>An MCC of 0.628 indicates a moderate to strong correlation between the predicted and actual classes, suggesting the model is performing better than random guessing.</p>
</li>
<li>
<p><strong>eval_precision (0.824614)</strong>: Precision is the ratio of correctly predicted positive observations to the total predicted positives.</p>
<p>A precision of 82.5% means that out of all the instances predicted as positive, 82.5% were actually positive.</p>
</li>
<li>
<p><strong>eval_recall (0.804000)</strong>: Recall (or sensitivity) is the ratio of correctly predicted positive observations to all observations in the actual class.</p>
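<p>If recall were computed for the positive class only, 80.4% would mean that the model correctly identified 80.4% of all actual positive cases. Here, however, the recall is identical to the accuracy, which suggests that it is computed for each class and averaged with weights proportional to the class sizes (such a weighted average always equals the accuracy). The value is then better read as the share of correctly identified sequences per class, averaged with class-size weights.</p>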
</li>
<li>
<p><strong>eval_runtime (6.548800)</strong>: The total time taken to evaluate the model on the dataset.</p>
<p>A runtime of 6.55 seconds provides insight into the computational efficiency of the evaluation process.</p>
</li>
<li>
<p><strong>eval_samples_per_second (152.699000)</strong>: The number of samples processed per second during evaluation.</p>
<p>Processing 152.7 samples per second indicates the efficiency of the evaluation pipeline.</p>
</li>
<li>
<p><strong>eval_steps_per_second (9.620000)</strong>: The number of evaluation steps completed per second.</p>
<p>Completing 9.62 steps per second reflects the speed of the evaluation process.</p>
</li>
<li>
<p><strong>epoch (3.000000)</strong>: The number of training epochs completed before this evaluation.</p>
</li>
</ol>
<p>The evaluation was conducted after 3 epochs of training, providing context for the model‚Äôs learning progress.</p>
</details>
</blockquote>
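<p>The throughput values can be cross-checked with a little arithmetic: multiplying the runtime by the samples and steps per second recovers the number of evaluated sequences and batches. The sketch below uses the values shown above; the batch size of 16 is only an inference from these numbers on a single device, so check <code style="color: inherit">training_args</code> for the actual setting.</p>

```python
import math

# Values copied from the results shown above.
runtime_s = 6.5488
samples_per_s = 152.699
steps_per_s = 9.62

n_samples = round(runtime_s * samples_per_s)  # number of test sequences evaluated
n_steps = round(runtime_s * steps_per_s)      # number of evaluation batches
print(n_samples, n_steps)  # 1000 63

# A batch size of 16 would give ceil(1000 / 16) = 63 batches, matching n_steps.
print(math.ceil(n_samples / 16))  # 63
```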
<h1 id="conclusion">Conclusion</h1>
<p>In this tutorial, we explored the process of fine-tuning a large language model (LLM) for DNA sequence classification. By following the steps outlined, you have learned how to leverage pre-trained models to achieve efficient and effective classification of DNA sequences, specifically focusing on their binding affinity to transcription factors.</p>
<p>We began by configuring the fine-tuning process, ensuring that available computational resources were optimally utilized. This included specifying settings for quantization, configuring Accelerate for distributed training, and implementing LoRA for parameter-efficient fine-tuning. These steps were crucial for maximizing performance and minimizing computational overhead.</p>
<p>Next, we prepared the tokenizer and data, ensuring that DNA sequences were properly tokenized and formatted for model input. We created datasets for training and validation, and configured data collation to handle batch processing efficiently.</p>
<p>We then loaded and configured the model for sequence classification, adding a classification head to adapt the pre-trained model to our specific task. With the model and data prepared, we initialized the <code style="color: inherit">Trainer</code>, which streamlined the training process by managing the training loop, evaluation, and checkpointing.</p>


# Key Points

- Fine-tuning pre-trained LLMs reduces training time and computational needs, making advanced research accessible.
- Techniques like LoRA enable fine-tuning on modest hardware, broadening access to powerful models.
- Rigorous testing on unseen data confirms a model's practical applicability and reliability.

# Congratulations on successfully completing this tutorial!

Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/statistics/tutorials/genomic-llm-finetuning/tutorial.html#feedback) and check there for further resources!
