Over the past few years, semi-supervised learning has attracted considerable interest and achieved success in both theory and practice. Traditional supervised machine learning algorithms can only make use of labeled data, and reasonable performance often requires a large amount of it. However, labeled data is often expensive and time-consuming to collect, while unlabeled data is usually cheaper and easier to obtain. The strength of semi-supervised learning lies in its ability to exploit a large quantity of unlabeled data to improve learning performance effectively and efficiently.
Recently, graph-based semi-supervised learning algorithms have been intensively studied, thanks to their convenient local representation and their connections with other models such as kernel machines. The graph Laplacian is the central quantity of graph-based semi-supervised learning: it captures the underlying manifold geometry of the data. Using the graph Laplacian to form a regularization problem, and further employing kernel techniques, is a promising approach to semi-supervised learning.
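To make the role of the graph Laplacian concrete, the following sketch builds an unnormalized Laplacian from a Gaussian affinity graph and evaluates the standard smoothness regularizer; the Gaussian weighting and the bandwidth `sigma` are illustrative assumptions, not the specific graph construction used in the thesis.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W from a Gaussian affinity graph."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))             # RBF edge weights
    np.fill_diagonal(W, 0.0)                         # no self-loops
    D = np.diag(W.sum(axis=1))                       # degree matrix
    return D - W

# The Laplacian quadratic form f^T L f = 0.5 * sum_ij W_ij (f_i - f_j)^2
# measures how smoothly a score vector f varies over the data graph; it is
# the usual regularizer in graph-based semi-supervised learning.
X = np.random.RandomState(0).randn(10, 2)
L = graph_laplacian(X)
f = np.random.RandomState(1).randn(10)
smoothness = f @ L @ f
```

The quadratic form is small exactly when strongly connected points receive similar scores, which is the smoothness assumption underlying the methods summarized below.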
The author first introduces the basic concepts of semi-supervised learning, together with the tools and theory it builds on: support vector machines, kernel methods, and regularization theory.
The main contributions of this thesis are presented in chapters 5 and 6. In chapter 5, the author first investigates a class of graph-based semi-supervised learning methods based on spectral transformation. He then formulates semi-supervised spectral kernel learning under the maximum margin criterion with spectral decreasing-order constraints, and argues through theoretical analysis that the maximum margin criterion is a more essential goal of semi-supervised kernel learning than kernel target alignment. By equivalently transforming the resulting intractable optimization problem into a quadratically constrained quadratic program (QCQP), the problem can be solved efficiently. Moreover, the author proposes a method to automatically tune the trade-off parameter involved. He then seeks another way to learn the spectral coefficients from a more fundamental viewpoint: since the spectral order constraints are not hard requirements but serve only to ensure the smoothness of the score function, he drops those constraints and instead includes the smoothness regularizer directly in the maximum margin objective, which coincides with the theory of manifold regularization. An efficient iterative algorithm for this formulation is also designed. Experimental results on real-world data sets demonstrate that both proposed spectral learning methods achieve promising results compared with other approaches.
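The spectral-transformation idea above can be illustrated with a minimal sketch: build a kernel from the Laplacian's eigenvectors while replacing each eigenvalue by a non-increasing function of it, so that smooth eigenvectors dominate. Note that the thesis learns the spectral coefficients via a maximum-margin QCQP; the fixed transform `1/(lam + 0.1)` here is only an illustrative assumption that satisfies the decreasing-order constraint.

```python
import numpy as np

def spectral_kernel(L, transform=lambda lam: 1.0 / (lam + 0.1)):
    """Kernel from a spectral transformation of the graph Laplacian."""
    lam, U = np.linalg.eigh(L)   # eigenvalues in ascending order
    mu = transform(lam)          # non-increasing spectral coefficients
    return (U * mu) @ U.T        # K = sum_i mu_i * u_i u_i^T

# Toy Laplacian of a path graph 0 - 1 - 2 - 3.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
K = spectral_kernel(L)
mu = 1.0 / (np.linalg.eigvalsh(L) + 0.1)   # the spectral coefficients used
```

Because small Laplacian eigenvalues correspond to eigenvectors that vary slowly over the graph, giving them the largest coefficients is what ties the learned kernel to the manifold geometry of the data.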
Motivated by the requirements of many practical problems, in chapter 6 the author turns to semi-supervised learning with structured outputs, a more general topic than standard semi-supervised learning. By extending the definition of the smoothness regularizer to the multi-class setting, he then explores multi-class semi-supervised classification. Although the resulting data-dependent kernel is similar to that of Sindhwani et al., his multi-class model genuinely extends their theory. The author then further generalizes the multi-class manifold regularization problem to the scenario with structured outputs, and the corresponding dual problems are also derived. From the dual formulations, one finds that the semi-supervised learning task can ultimately be accomplished by supervised structural prediction with a newly defined "data-dependent joint kernel matrix", which generalizes the kernel of Sindhwani et al. to structural prediction. Moreover, the proposed inductive approach can naturally predict unseen data points beyond the unlabeled set. Experiments on text categorization with hierarchies show that his approaches exploit the structural and manifold information of the data simultaneously, improving prediction performance. As a supplement, the author also proposes the concept of a joint Laplacian, which shares similar properties with the standard Laplacian matrix.
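The data-dependent kernel of Sindhwani et al. referenced above deforms a base kernel K by the graph Laplacian, in matrix form K~ = K - K (I + M K)^{-1} M K with M proportional to L. The sketch below shows this deformation on a toy point cloud; the RBF base kernel, the choice M = gamma_I * L, and the value of `gamma_I` are illustrative assumptions rather than the thesis's exact setup, and the structured-output (joint-kernel) generalization is not reproduced here.

```python
import numpy as np

def deformed_kernel(K, L, gamma_I=1.0):
    """Data-dependent kernel K~ = K - K (I + M K)^{-1} M K with M = gamma_I * L,
    following the Laplacian-based deformation of Sindhwani et al."""
    n = K.shape[0]
    M = gamma_I * L
    return K - K @ np.linalg.solve(np.eye(n) + M @ K, M @ K)

# Toy setup: RBF base kernel and graph Laplacian on a few random points.
rng = np.random.RandomState(0)
X = rng.randn(8, 2)
sq = (X ** 2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
W = np.exp(-d2)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W        # unnormalized Laplacian
K = np.exp(-d2 / 2.0)                 # base (RBF) kernel matrix
Kt = deformed_kernel(K, L, gamma_I=0.5)
```

The deformed kernel stays symmetric and positive semi-definite, so any supervised kernel machine trained with it implicitly performs manifold-regularized semi-supervised learning, which is the mechanism the chapter extends to structured outputs.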