Error Estimation and Model Selection

Tobias Scheffer

Infix Publisher

126 pages, ISBN 3-89601-225-8

Postscript Version; PDF Version

A central problem of machine learning is to decide whether a hypothesis just
happens to match the available data well, or whether it actually has a
high generalization ability.  Closely related is the problem of
deciding which of several available learning algorithms or hypothesis
languages leads to the highest generalization performance.  This
is referred to as the model selection problem.

This book centers on an analysis of classifier error rates.  The analysis
predicts the expected generalization behavior of a learning algorithm on
a given problem, and it yields a model selection algorithm that can solve
large model selection problems (e.g., feature subset selection)
efficiently.

Similar analyses can be applied to quantify the generalization performance
of a model selection algorithm based on holdout testing, and to quantify
the optimistic bias in the error estimate that arises when several learners
are run on the same data set and the one with the lowest holdout error rate
is selected.
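
The optimistic bias mentioned above can be illustrated with a small
simulation.  This is only a sketch under stated assumptions and is not the
analysis developed in the book: every learner is assumed to have the same
true error rate, holdout mistakes are drawn independently, and the names
min_holdout_error and optimistic_bias are hypothetical.

```python
import random


def min_holdout_error(num_learners, holdout_size, true_error, rng):
    """Lowest holdout error among learners that all share the same true error."""
    errors = []
    for _ in range(num_learners):
        # Each holdout example is misclassified with probability true_error.
        mistakes = sum(rng.random() < true_error for _ in range(holdout_size))
        errors.append(mistakes / holdout_size)
    return min(errors)


def optimistic_bias(num_learners, holdout_size=50, true_error=0.5,
                    trials=500, seed=0):
    """Average gap between the true error and the selected holdout error."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min_holdout_error(num_learners, holdout_size, true_error, rng)
    return true_error - total / trials


if __name__ == "__main__":
    # Illustrative settings only; the bias grows with the number of learners.
    for k in (1, 5, 20):
        print(k, round(optimistic_bias(k), 3))
```

With a single learner the holdout estimate is unbiased on average.  As more
learners compete on the same holdout set, the minimum of their estimates
drifts further below the true error rate, even though none of them is
actually better than the others.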

Table of Contents