Semi-supervised Learning

In semi-supervised learning, a small portion of the training data is labeled (or categorized) while the rest of the data points are unlabeled. The labeled data is used to train the model, which then applies what it has learned to analyze and categorize the unlabeled data.
ℹ️ It is considered a hybrid of Supervised and Unsupervised Machine Learning.

💡 Semi-supervised learning is often used because labeled data is difficult or expensive to acquire, while unlabeled data is readily available.

Applications: speech recognition, natural language processing, and image classification

Types of semi-supervised learning:

  1. Self-Training:
    In self-training, the model is first trained on the labeled dataset, then applies this knowledge to the unlabeled dataset to generate predicted (pseudo-)labels. The model then retrains itself on both these predicted labels and the original labeled data, repeating the process until its performance reaches the desired level. Self-training is used when labeled data is scarce but unlabeled data is abundant.
  2. Co-Training:
    In co-training, two models are trained to work together on the same dataset. Each model is initially trained on a different, disjoint subset of the features (a "view"), chosen so that the two views are as independent of each other as possible. Both models then analyze the unlabeled dataset and generate predicted labels. Predictions made with high confidence (low entropy) are added to the labeled dataset for the other model to learn from. Co-training is used when labeled data is scarce but relevant unlabeled data is available.
    ℹ️ Both Self-Training and Co-Training are categorized under weakly supervised learning and combine supervised and unsupervised learning techniques.
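
The self-training loop described above can be sketched in a few lines of Python. The nearest-centroid classifier, the distance-margin confidence measure, and the `threshold` parameter are illustrative assumptions, not part of any standard API; a real system would substitute its own classifier and confidence score.

```python
import numpy as np

def train_centroids(X, y):
    # Fit a toy nearest-centroid classifier: one mean vector per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_with_confidence(centroids, X):
    # Predict the nearest centroid; confidence is the distance margin
    # between the two closest class centroids.
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    order = np.argsort(d, axis=0)
    idx = np.arange(X.shape[0])
    preds = np.array(classes)[order[0]]
    conf = d[order[1], idx] - d[order[0], idx]
    return preds, conf

def self_train(X_lab, y_lab, X_unlab, threshold=0.5, max_rounds=10):
    # Repeatedly pseudo-label the unlabeled pool and absorb confident points.
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        centroids = train_centroids(X_lab, y_lab)
        preds, conf = predict_with_confidence(centroids, X_unlab)
        keep = conf >= threshold
        if not keep.any():
            break
        # Move confidently pseudo-labeled points into the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, preds[keep]])
        X_unlab = X_unlab[~keep]
    return train_centroids(X_lab, y_lab)
```

The stopping conditions mirror the text: the loop ends when the unlabeled pool is exhausted or no prediction clears the confidence threshold.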
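
Co-training can be sketched the same way, with two feature views of the same examples. Again the nearest-centroid models, the margin-based confidence, and the `threshold` and `rounds` parameters are assumptions for illustration; the key idea is only that each view's confident predictions expand the shared labeled set.

```python
import numpy as np

def fit_view(X, y):
    # Nearest-centroid classifier fit on a single feature view.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_margin(model, X):
    # Prediction plus a confidence margin: the distance gap
    # between the two closest class centroids.
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    preds = np.array(classes)[d.argmin(axis=0)]
    s = np.sort(d, axis=0)
    return preds, s[1] - s[0]

def co_train(A_lab, B_lab, y_lab, A_unl, B_unl, threshold=0.5, rounds=5):
    # A_* / B_* are the two feature views of the same examples.
    for _ in range(rounds):
        if len(A_unl) == 0:
            break
        mA = fit_view(A_lab, y_lab)
        mB = fit_view(B_lab, y_lab)
        pA, cA = predict_margin(mA, A_unl)
        pB, cB = predict_margin(mB, B_unl)
        keep = (cA >= threshold) | (cB >= threshold)
        if not keep.any():
            break
        # Each kept point takes the label from whichever view is more confident.
        labels = np.where(cA >= cB, pA, pB)[keep]
        A_lab = np.vstack([A_lab, A_unl[keep]])
        B_lab = np.vstack([B_lab, B_unl[keep]])
        y_lab = np.concatenate([y_lab, labels])
        A_unl, B_unl = A_unl[~keep], B_unl[~keep]
    return fit_view(A_lab, y_lab), fit_view(B_lab, y_lab)
```

Splitting the features into independent views is the step that makes co-training work: each model sees evidence the other cannot, so its confident labels carry new information for its partner.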

Approaches to Semi-supervised Learning:

  • Standard (Inductive) Learning: Traditionally, semi-supervised learning means learning directly from a relatively small number of labeled examples together with additional unlabeled examples. It aims to produce a better classifier for any future input by using the combined labeled and unlabeled data, instead of the labeled data alone.
    • Transductive Learning: Instead of building a classifier for all future inputs, predictions are required only for a specific, given set of unlabeled data points.
  • Active Learning: The model identifies the unlabeled examples it is most uncertain about and queries an oracle (e.g., a human annotator) to label them.
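
Active learning's query loop can be sketched with the same toy nearest-centroid classifier; the uncertainty measure (distance margin), the `budget` parameter, and the `oracle` callback are illustrative assumptions, not a standard interface.

```python
import numpy as np

def fit(X, y):
    # Toy nearest-centroid classifier: one mean vector per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in sorted(model)])
    return np.array(sorted(model))[d.argmin(axis=0)]

def margin(model, X):
    # Small gap between the two nearest centroids = uncertain prediction.
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in sorted(model)])
    s = np.sort(d, axis=0)
    return s[1] - s[0]

def active_learn(X_lab, y_lab, X_pool, oracle, budget=5):
    # Spend the labeling budget on the pool points the model is least sure about.
    for _ in range(budget):
        model = fit(X_lab, y_lab)
        i = int(margin(model, X_pool).argmin())      # most uncertain pool point
        X_lab = np.vstack([X_lab, X_pool[i:i + 1]])  # query the oracle for its label
        y_lab = np.append(y_lab, oracle(X_pool[i]))
        X_pool = np.delete(X_pool, i, axis=0)
    return fit(X_lab, y_lab)
```

The contrast with self-training is the source of the new labels: here the true label comes from an oracle such as a human annotator, rather than from the model's own predictions.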