@article{2009-BregmanSurrogateLearning-PAMI
, author={Richard Nock and Frank Nielsen}
, title={Bregman divergences and surrogates for learning}
, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}
, month={November}
, year={2009}
, volume={31}
, number={11}
, pages={2048--2059}
, doi={10.1109/TPAMI.2008.225}
, abstract={
  Bartlett et al. (2006) recently proved that a ground condition for surrogates,
 classification calibration, ties up their consistent minimization to that of the
 classification risk, and left as an important problem the algorithmic questions
 about their minimization. In this paper, we address this problem for a wide set
 which lies at the intersection of classification calibrated surrogates and those
 of Murata et al. (2004). This set coincides with those satisfying three common assumptions
 about surrogates. Equivalent expressions for the members, sometimes well known, follow
 for convex and concave surrogates, frequently used in the induction of linear separators
 and decision trees. Most notably, they share remarkable algorithmic features: for
 each of these two types of classifiers, we give a minimization algorithm provably
 converging to the minimum of any such surrogate. While seemingly different, we show
 that these algorithms are offshoots of the same "master" algorithm. This provides
 a new and broad unified account of different popular algorithms, including additive
 regression with the squared loss, the logistic loss, and the top-down induction
 performed in CART, C4.5. Moreover, we show that the induction enjoys the most popular
 boosting features, regardless of the surrogate. Experiments are provided on 40 readily
 available domains.
  }
}