Model Evaluation

Once fine-tuning is complete, you can use the fine-tuned model to generate text for your specific use case. The inference page lets you enter a prompt and view the completion text the model generates in response.
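
If you prefer to query the model from code instead of the inference page, the sketch below shows one possible route. It assumes the fine-tuned weights are available as a Hugging Face transformers checkpoint in a local directory (`./my-finetuned-model` is a hypothetical path); adapt the loading step to however your platform exports the model.

```python
# Minimal sketch of programmatic inference with a fine-tuned model.
# Assumes a Hugging Face transformers checkpoint in ./my-finetuned-model
# (a hypothetical path -- point it at wherever your fine-tuned weights live).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./my-finetuned-model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

prompt = "Summarize the following support ticket:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)

# Decode the generated tokens back into completion text.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```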

Evaluation Metrics

During fine-tuning, the model's performance is tracked with a number of evaluation metrics. The most commonly used are:

  • Training Loss: The value the model tries to minimize during training; it measures how well the model fits the training data. A low training loss usually means the model is learning the patterns in the training data, but it can also be a symptom of overfitting if the validation loss is high at the same time.

  • Validation Loss: The same loss computed on held-out data the model has not seen during training; it measures how well the model generalizes. A low validation loss indicates good generalization, while a validation loss that is much higher than the training loss suggests the model is overfitting to the training data (see the sketch after this list).

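To make the relationship between the two losses concrete, here is a minimal sketch of how they might be compared epoch by epoch. The per-epoch values are illustrative placeholders, not real results; in practice they come from your training logs.

```python
# Minimal sketch: watching training vs. validation loss to spot overfitting.
# The loss values below are illustrative placeholders, not real results.
train_losses = [2.10, 1.65, 1.32, 1.10, 0.95, 0.84]
val_losses = [2.15, 1.72, 1.45, 1.38, 1.41, 1.52]

best_val = float("inf")
for epoch, (tr, va) in enumerate(zip(train_losses, val_losses), start=1):
    note = ""
    if va > best_val:
        # Validation loss has risen above its best value so far while the
        # training loss keeps falling: a common sign of overfitting.
        note = "  <- possible overfitting"
    best_val = min(best_val, va)
    print(f"epoch {epoch}: train={tr:.2f}  val={va:.2f}{note}")
```
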
In addition to the training and validation losses, several other metrics can be used to assess a fine-tuned model. For text classification tasks, the following are commonly used (a computation sketch follows the list):

  • Precision: The fraction of true positive predictions out of all positive predictions. In other words, precision measures how many of the predicted positive results are actually positive.

  • Recall: The fraction of true positive predictions out of all actual positive examples. In other words, recall measures how many of the actual positive results were correctly predicted.

  • F1 score: The harmonic mean of precision and recall. It is a balanced measure that combines the two metrics into a single overall evaluation of the model's performance.

  • Accuracy: The fraction of correct predictions out of all predictions. In other words, accuracy measures how many of the predictions made by the model are correct.

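The sketch below shows how these four metrics are computed for a binary classification setting. The labels are hypothetical placeholders (1 = positive, 0 = negative); with real data you would use the model's predictions on a held-out test set, or equivalent functions from a library such as scikit-learn.

```python
# Minimal sketch of precision, recall, F1 and accuracy for binary labels.
# The label lists are illustrative placeholders, not real model output.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # ground-truth labels
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]  # model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

precision = tp / (tp + fp)   # predicted positives that are actually positive
recall = tp / (tp + fn)      # actual positives that were correctly predicted
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / len(y_true)                  # all correct predictions

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```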