# Automatic Dataset Balancing For Classification Tasks

The dataset used to fine-tune a model largely determines how accurately that model classifies. When a dataset is imbalanced, meaning the number of samples varies significantly from class to class, the fine-tuned model can become biased toward the majority class and perform poorly on the minority classes.

{% hint style="info" %}
Texti.ai automatically balances your dataset during fine-tuning for classification tasks to counter this bias.
{% endhint %}

When you fine-tune a model for a classification task, Texti.ai adjusts the dataset so that every class contributes the same number of samples. Training on this balanced data helps the model classify instances accurately across all classes, not only the majority class.

As an example, consider this [dataset](https://docs.google.com/spreadsheets/d/1_0Wc1a9pZRJUTVRlcaei86xXiAHvSHwRycIHwPC8a9o/). It consists of 8,001 prompt-completion pairs, and each prompt is labeled with one of three classes: Positive, Negative, or Neutral.

The original distribution of instances is as follows:

* Negative: 2,674 instances
* Positive: 2,727 instances
* Neutral: 2,600 instances

Here the Neutral class is the smallest, with 2,600 instances. For fine-tuning, the larger classes are therefore reduced to that size, so the final dataset (training and validation combined) contains 7,800 instances: 2,600 from each class.
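The balancing step described above amounts to undersampling every class down to the size of the smallest one. The sketch below illustrates that arithmetic on synthetic rows with the same class counts as the example. It is not Texti.ai's implementation: the function name `balance_by_undersampling` is hypothetical, and because the page does not say how the retained rows are chosen, the sketch *assumes* uniform random selection with a fixed seed.

```python
import random
from collections import Counter, defaultdict


def balance_by_undersampling(rows, seed=0):
    """Hypothetical helper: downsample every class to the smallest class size.

    `rows` is a list of (prompt, label) tuples. Random selection of the
    retained rows is an assumption made for illustration only.
    """
    by_class = defaultdict(list)
    for prompt, label in rows:
        by_class[label].append((prompt, label))

    # Size of the smallest class, e.g. 2,600 for Neutral in the example.
    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the kept subset is reproducible

    balanced = []
    for items in by_class.values():
        # Keep `target` rows from this class, chosen without replacement.
        balanced.extend(rng.sample(items, target))
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced


# Synthetic rows with the class counts from the example above.
counts = {"Negative": 2674, "Positive": 2727, "Neutral": 2600}
rows = [(f"{label} prompt {i}", label) for label, n in counts.items() for i in range(n)]

balanced = balance_by_undersampling(rows)
print(len(rows), len(balanced))  # 8001 7800
print(sorted(Counter(label for _, label in balanced).items()))
# [('Negative', 2600), ('Neutral', 2600), ('Positive', 2600)]
```

Undersampling discards some majority-class rows (74 Negative and 127 Positive here) rather than duplicating minority-class rows, so no sample appears more than once in the balanced set.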
