I. Introduction
The problem of learning an optimal model for a given data set is usually divided into two subtasks: finding optimal values for the parameters of a single model, and finding the best model among a collection of different models. Many methods exist for the former problem, ranging from ad hoc algorithms designed to mimic the assumed behavior of the human brain to others that minimize various cost functions. The second problem, model selection, is generally more difficult, and good heuristics for it are correspondingly harder to design.