PrePrint: Canonical Correlation Analysis for Multi-label Classification: A Least Squares Formulation, Extensions and Analysis
Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multi-dimensional variables. It projects both sets of variables onto a lower-dimensional space in which they are maximally correlated. CCA is known to admit a least squares formulation in the binary-class case, but the extension to the more general setting remains unclear. In this paper, we show that, under a mild condition, CCA in the multi-label case can also be formulated as a least squares problem. Based on this equivalence, efficient algorithms for solving least squares problems can be applied to scale CCA to very large data sets. We also propose several CCA extensions, including a sparse CCA formulation based on 1-norm regularization, and we further extend the least squares formulation to partial least squares. In addition, we show that the CCA projection for one set of variables is independent of the regularization on the other set of multi-dimensional variables, which provides new insight into the effect of regularization on CCA. Experiments on benchmark multi-label data sets confirm the established equivalence relationships and demonstrate the effectiveness and efficiency of the proposed CCA extensions.
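To make the phrase "maximally correlated" concrete, the following is a minimal sketch of classical CCA for the first pair of projection vectors. It is not the least squares formulation proposed in the paper; it uses the standard whitening-plus-SVD computation. The function name `cca_first_pair`, the small ridge term `reg`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-8):
    """Return (wx, wy, rho): the first pair of CCA projection vectors and
    their canonical correlation. X is (n, d1), Y is (n, d2); rows are samples.
    `reg` is a tiny ridge term (an assumption here) for numerical stability."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # M = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M)
    # Map the leading singular vectors back to the original coordinates.
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, s[0]

# Synthetic data (hypothetical): both views share one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Y = np.column_stack([rng.normal(size=200), z + 0.1 * rng.normal(size=200)])

wx, wy, rho = cca_first_pair(X, Y)
# Correlation of the two projected views, computed directly from the data.
empirical = np.corrcoef((X - X.mean(axis=0)) @ wx, (Y - Y.mean(axis=0)) @ wy)[0, 1]
```

In this sketch, `rho` is the leading singular value of the whitened cross-covariance, and `empirical` checks it against the correlation of the projected data. The two should agree up to the effect of the ridge term.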
Source: IEEE Transactions

