Hyperparameters

Hyperparameters are the settings that determine how the model learns from the data, such as the learning rate, the batch size, and the number of epochs. These settings can greatly impact the model's performance and accuracy, so it's important to choose them carefully.

Experienced data scientists often experiment with different hyperparameters and judge each configuration using metrics such as training loss and validation loss. For example, a data scientist working on a text classification task might try several values for the learning rate and batch size, then evaluate each resulting model using precision, recall, and F1-score. By repeating this process across configurations, they can fine-tune the model to achieve the best possible performance on their specific task.
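
As a minimal sketch of this experiment loop, assuming scikit-learn and a made-up toy sentiment dataset (neither is part of Texti.ai), the grid below tries a few learning rates and epoch counts and reports precision, recall, and F1 for each combination:

```python
# A minimal sketch of the experiment loop described above, using scikit-learn.
# The toy dataset and the candidate values are illustrative assumptions,
# not Texti.ai defaults.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

texts = [
    "great product, works perfectly", "terrible, broke after a day",
    "absolutely love it", "waste of money", "highly recommended",
    "never buying again", "exceeded my expectations", "very disappointed",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = positive review, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Try each (learning rate, epoch count) pair and compare test-set metrics.
for lr in (0.01, 0.1, 1.0):
    for epochs in (5, 20):
        model = SGDClassifier(
            learning_rate="constant", eta0=lr, max_iter=epochs,
            tol=None, random_state=0,
        )
        model.fit(X_train_vec, y_train)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, model.predict(X_test_vec), average="binary", zero_division=0
        )
        print(f"lr={lr:<5} epochs={epochs:<3} "
              f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Each printed line corresponds to one candidate configuration; in practice the dataset would be far larger and the grid would also cover batch size and other settings.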

In the context of Texti.ai, users can either apply the one-click fine-tune option or manually enter hyperparameters that they have found to work well for their specific task.

  • One-click fine-tune: This option applies the recommended hyperparameters for the pre-trained model.

  • Advanced setting (Manual hyperparameter optimization): Advanced users can manually enter the hyperparameters they want to use for fine-tuning.

    Note that the explanations below are aimed at users who are unfamiliar with hyperparameters; advanced users can skip ahead and enter the values they want to use directly.

    Here are the explanations of the hyperparameters; a consolidated configuration example follows the list:

    1. n_epochs: The number of times the model sees the entire dataset during training; one epoch is a single complete pass over the training set. More epochs can lead to better performance, but also increase the risk of overfitting. Recommended values: 2-5 for quick experimentation; 10-50 for fine-tuning.

    2. Test Set (%): The percentage of the dataset reserved for testing. When the data is split for training and evaluation, this value sets the proportion allocated to the test set, which is used after fine-tuning to check that the model generalizes to unseen data. A common choice is 20-30%, though the exact value depends on the dataset and use case.

    3. batch_size: The number of examples processed in one forward/backward pass during training. A smaller batch size can lead to slower convergence but can also result in better performance. Recommended values: 4-16 for quick experimentation; 32-64 for fine-tuning.

    4. learning_rate_multiplier: A multiplier applied to the model's base learning rate during training. A smaller learning rate can slow convergence but also helps prevent the model from overshooting the optimal solution. Recommended values: 0.01 for quick experimentation; 0.1-1 for fine-tuning.

    5. prompt_loss_weight: A weighting factor that determines how much importance the prompt text receives in the training loss. For example, with prompt_loss_weight set to 0.8, the model focuses more on generating text that matches the prompt than on generating completely novel text. This hyperparameter only applies to text generation tasks. Recommended values: 0.5-0.9 for quick experimentation; 0.8-1 for fine-tuning.

    6. compute_classification_metrics: A flag that determines whether to compute classification metrics such as precision, recall, and F1 score during training. This hyperparameter only applies to text classification tasks. Recommended value: True.

    7. classification_n_classes: The number of classes to classify the text into. This hyperparameter only applies to text classification tasks. Recommended values: 2 for binary classification; 3 or more for multi-class classification.

    8. classification_positive_class: The name or index of the positive class in the classification task. This hyperparameter only applies to text classification tasks. Recommended values: "positive" for binary classification; the index of the positive class for multi-class classification.

    9. classification_betas: The beta value(s) to use when computing F-beta scores. A beta of 1 gives the standard F1 score, which balances precision and recall; values below 1 (e.g., 0.5) weight precision more heavily, and values above 1 weight recall more heavily. This hyperparameter only applies to text classification tasks. A short F-beta sketch follows this list.
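
Putting the settings above together, a full manual configuration might look like the following. This is a hypothetical sketch: the dictionary keys mirror the hyperparameter names above, the values simply follow the recommendations in the list, and the exact way Texti.ai ingests such a configuration is assumed, not documented here.

```python
# Hypothetical sketch of a complete manual configuration. The values follow
# the recommendations above; how Texti.ai accepts this dictionary is an
# assumption for illustration.
hyperparameters = {
    "n_epochs": 4,                      # full passes over the training set
    "test_set_percent": 20,             # hold out 20% of the data for testing
    "batch_size": 8,                    # examples per forward/backward pass
    "learning_rate_multiplier": 0.1,    # scales the base learning rate

    # Text generation tasks only:
    "prompt_loss_weight": 0.8,          # weight of the prompt text in the loss

    # Text classification tasks only:
    "compute_classification_metrics": True,
    "classification_n_classes": 2,      # binary classification
    "classification_positive_class": "positive",
    "classification_betas": [1.0],      # beta = 1 is the standard F1 score
}
```

To make the classification_betas setting concrete, here is how the beta value shifts the precision/recall trade-off, using scikit-learn's fbeta_score on made-up labels and predictions:

```python
from sklearn.metrics import fbeta_score

y_true = [1, 0, 1, 1, 0, 1]  # made-up ground-truth labels
y_pred = [1, 0, 0, 1, 1, 1]  # made-up model predictions

print(fbeta_score(y_true, y_pred, beta=1.0))  # standard F1: balances precision and recall
print(fbeta_score(y_true, y_pred, beta=0.5))  # weights precision more heavily
print(fbeta_score(y_true, y_pred, beta=2.0))  # weights recall more heavily
```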
