A hyperparameter search is the process of finding the best hyperparameters by training models with different values of hyperparameters and evaluating their performance.
Typically, there are two types of hyperparameter search:
- Brute force (e.g. grid search) - A set of candidate values is chosen for each hyperparameter, and every combination in the resulting set is tried exhaustively. This method scales poorly when the number of possible hyperparameter combinations is large.
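As a minimal sketch of the brute-force approach, the snippet below enumerates every combination of two hypothetical hyperparameters and keeps the one with the best score. The `validation_score` function is a stand-in for what would, in practice, be training a model and evaluating it on a validation set.

```python
import itertools

# Hypothetical validation score: higher is better. In a real search this
# would train a model with the given hyperparameters and evaluate it.
def validation_score(learning_rate, max_depth):
    return -(learning_rate - 0.1) ** 2 - (max_depth - 4) ** 2

# Candidate values for each hyperparameter.
grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "max_depth": [2, 4, 8],
}

best_score, best_params = float("-inf"), None
# Try every combination in the grid -- the brute-force part.
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = validation_score(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params)  # → {'learning_rate': 0.1, 'max_depth': 4}
```

Note that the number of combinations grows multiplicatively with each hyperparameter added, which is exactly why this method breaks down for large search spaces.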
- Sampling/Bayesian search - In contrast to the brute-force approach, sampling and Bayesian hyperparameter search methods explore a smaller subset of the possible hyperparameters by making a succession of intelligent “guesses” about where better hyperparameters might lie. This often involves building a surrogate model that predicts a validation metric from the hyperparameters themselves.
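The simplest sampling-based method is random search, sketched below with the same hypothetical `validation_score` as before; full Bayesian methods (as implemented in libraries such as Optuna or scikit-optimize) go further by fitting a surrogate model to past results to guide where to sample next.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical validation score, as before: higher is better.
def validation_score(learning_rate, max_depth):
    return -(learning_rate - 0.1) ** 2 - (max_depth - 4) ** 2

best_score, best_params = float("-inf"), None
# Sample 20 random points instead of enumerating every combination.
for _ in range(20):
    params = {
        "learning_rate": 10 ** random.uniform(-3, 0),  # log-uniform in [0.001, 1]
        "max_depth": random.randint(2, 10),
    }
    score = validation_score(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

Even this naive sampler covers a continuous range of learning rates that a fixed grid would miss, which is one reason random search is a common baseline before reaching for Bayesian methods.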
What is a Hyperparameter?
All machine learning algorithms work by learning a set of parameters that lead to the most accurate prediction of the outcome or, more generally, that optimize some mathematical objective.
So how are hyperparameters different? In machine learning, a hyperparameter is a parameter that controls model selection and the learning process. The value of a hyperparameter needs to be set before training begins.
Simply put, a hyperparameter directly influences which parameters get chosen for the model. Even though a hyperparameter is not a parameter of the model per se, it sits higher in the hierarchy and is external to the model - hence the term hyper.
The optimal values of normal (i.e. non-hyper) parameters are derived via training, whereas hyperparameter values are selected based on expert knowledge or a hyperparameter search.
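To make the distinction concrete, here is a minimal sketch that fits a one-parameter model y = w * x by gradient descent. The weight `w` is a parameter, learned from the data during training; the learning rate and the number of steps are hyperparameters, fixed before training starts.

```python
# Toy dataset where the true relationship is y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

learning_rate = 0.01   # hyperparameter: set before training
num_steps = 500        # hyperparameter: set before training

w = 0.0                # parameter: learned from the data below
for _ in range(num_steps):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(round(w, 2))  # → 2.0
```

Changing `learning_rate` or `num_steps` changes how (and whether) training converges to the right `w`, which is exactly the sense in which hyperparameters sit above, and shape, the parameters the model learns.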