# Automatic Dataset Balancing For Classification Tasks

The class distribution of the dataset used for fine-tuning directly affects classification accuracy. When the dataset is imbalanced, meaning the number of samples varies significantly from class to class, the fine-tuned model may become biased towards the majority class and perform poorly on the minority classes.

{% hint style="info" %}
Texti.ai incorporates an automatic dataset balancing mechanism during the fine-tuning process for classification tasks, which helps address this problem.
{% endhint %}

When you fine-tune a model for a classification task, Texti.ai balances your dataset for you. The model is then trained on data that contains an equal number of samples from each class, which improves its ability to classify instances accurately across all classes.

As an example, consider this [dataset](https://docs.google.com/spreadsheets/d/1_0Wc1a9pZRJUTVRlcaei86xXiAHvSHwRycIHwPC8a9o/). It consists of 8,001 pairs of prompts and completions, with each prompt assigned to one of three classes: Positive, Negative, or Neutral.

The original distribution of instances is as follows:

* Negative: 2,674 instances
* Positive: 2,727 instances
* Neutral: 2,600 instances

In this scenario, the "Neutral" class has the fewest instances (2,600). For fine-tuning, the other two classes are therefore reduced to match it, so the final dataset (training and validation combined) contains 7,800 instances: 2,600 from each class.
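
The arithmetic above can be reproduced with a short Python sketch. This is an illustration only, not Texti.ai's actual implementation: the function name `balance_by_undersampling`, the `prompt`/`completion` keys, and the use of seeded random sampling to choose which instances are kept are all assumptions, since this page does not describe how the retained samples are selected.

```python
import random
from collections import Counter, defaultdict


def balance_by_undersampling(examples, label_key="completion", seed=0):
    """Downsample every class to the size of the smallest class.

    Hypothetical helper: random selection is an assumption, not a
    documented Texti.ai behavior.
    """
    # Group the examples by their class label.
    by_class = defaultdict(list)
    for example in examples:
        by_class[example[label_key]].append(example)
    if not by_class:
        return []

    # The smallest class sets the per-class target size.
    target = min(len(rows) for rows in by_class.values())

    # Keep `target` randomly chosen examples from each class.
    rng = random.Random(seed)
    balanced = []
    for rows in by_class.values():
        balanced.extend(rng.sample(rows, target))
    rng.shuffle(balanced)
    return balanced


# Synthetic stand-in data with the same class counts as the example dataset.
original_counts = {"Negative": 2674, "Positive": 2727, "Neutral": 2600}
examples = [
    {"prompt": f"{label} sample {i}", "completion": label}
    for label, n in original_counts.items()
    for i in range(n)
]

balanced = balance_by_undersampling(examples)
balanced_counts = Counter(e["completion"] for e in balanced)

print(len(examples))   # 8001
print(len(balanced))   # 7800
print(sorted(balanced_counts.items()))
```

Running the sketch shows each of the three classes ending up with 2,600 instances, matching the totals described above.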

