Machine Learning – Semi-supervised
Semi-supervised machine learning algorithms are neither fully supervised nor fully unsupervised; they fall between the two families of methods, combining elements of both supervised and unsupervised learning.
Semi-supervised algorithms generally combine a small supervised learning component, i.e., a small amount of pre-labeled, annotated data, with a large unsupervised learning component, i.e., a large amount of unlabeled data, during training.
We can follow any of the following approaches for implementing semi-supervised learning methods −
- The first and simpler approach is to build a supervised model on the small set of labeled, annotated data and then apply that model to the large pool of unlabeled data to obtain more labeled samples. The model is then retrained on the combined data, and the process is repeated.
- The second approach needs some extra effort. Here, we first use unsupervised methods to cluster similar data samples, annotate these clusters, and then use a combination of this information to train the model.
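The first approach above, often called self-training, can be sketched with scikit-learn's `SelfTrainingClassifier`, which wraps a base supervised model, pseudo-labels confident unlabeled samples, and retrains iteratively. This is a minimal sketch, assuming scikit-learn is available; the dataset and threshold are illustrative choices, not prescribed by the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy dataset: 200 samples, but we keep labels for only ~10% of them.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
rng = np.random.RandomState(0)
y_partial = y.copy()
hidden = rng.rand(len(y)) < 0.9   # hide ~90% of the labels
y_partial[hidden] = -1            # -1 marks "unlabeled" for scikit-learn

# The base supervised model is fit on the labeled subset, then
# confident predictions on unlabeled points (probability >= 0.8)
# are added as pseudo-labels and the model is retrained.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
model.fit(X, y_partial)

# Evaluate against the full ground truth (possible only in a toy setting).
print("accuracy:", model.score(X, y))
```

The `threshold` parameter controls how confident a prediction must be before the sample is pseudo-labeled; raising it trades fewer, cleaner pseudo-labels against slower label growth.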
The algorithm is trained on a dataset that contains both labeled and unlabeled data. Semi-supervised learning is generally used when a huge set of unlabeled data is available. In any supervised learning algorithm, the available data has to be manually labeled, which can be quite an expensive process. In contrast, the unlabeled data used in unsupervised learning has limited applications on its own. Hence, semi-supervised learning algorithms were developed to provide a balance between the two.
Semi-supervised learning finds applications in text classification, image classification, speech analysis, anomaly detection, etc., where the general goal is to classify an entity into a predefined category. Semi-supervised algorithms assume that the data can be divided into discrete clusters, and that data points close to each other are more likely to share the same output label.