Hyperparameter Tuning

Understanding Hyperparameter Optimization with NumPy

Hyperparameter tuning finds the best hyperparameter values for a model by searching over a set of candidate values specified for each hyperparameter.

Outline of this post:

  • Parameter and Hyperparameter
  • Validation data
  • Scale
  • Hyperparameter Search Methods
  • Hyperparameter Optimization with Code

Parameter and Hyperparameter

  • A parameter is a configuration variable (e.g., a weight or a bias) that is internal to the model and whose value can be estimated from data.
  • A hyperparameter is a configuration variable (e.g., the learning rate or the weight decay) that is external to the model and whose value cannot be estimated from data; it must be set before training.
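
A minimal sketch of the distinction in NumPy (the shapes and values here are only illustrative):

import numpy as np

# Hyperparameters: set by hand before training starts
lr = 0.01            # learning rate
weight_decay = 1e-6  # regularization strength

# Parameters: initialized randomly, then estimated from data during training
W = 0.01 * np.random.randn(784, 10)  # weights
b = np.zeros(10)                     # biases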

Validation data

An optimization procedure is used to discover a set of hyperparameters that performs well or best on the dataset. There are three types of data:

  • Training data: trains the parameters
  • Validation data: validates the hyperparameters
  • Test data: tests the final model for general use

Cross-validation is a technique for evaluating predictive models by dividing the original sample into a training set to train the model and a validation set to evaluate it. The k-fold cross-validation procedure is an effective approach for estimating the performance of a model, as in the sketch below.
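
A minimal sketch of k-fold cross-validation in NumPy; train_and_score is a hypothetical stand-in for your own training-and-evaluation routine:

import numpy as np

def cross_validate(x, t, train_and_score, k=5, seed=0):
    # Shuffle the indices, then split them into k roughly equal folds
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for i in range(k):
        # Fold i is the validation set; the remaining folds form the training set
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(x[train_idx], t[train_idx],
                                      x[val_idx], t[val_idx]))
    return np.mean(scores)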

Scale

Linear scale

Hyperparameter tuning searches the values in the hyperparameter range by using a linear scale. Typically, this is useful when the range of all values from the lowest to the highest is relatively small (within one order of magnitude). This is because uniformly searching values from the range will give you a reasonable exploration of the entire range.

Logarithmic scale (log scale)

Hyperparameter tuning searches the values in the hyperparameter range by using a logarithmic scale. Logarithmic scaling works only for ranges whose values are all greater than 0. It is useful when the search range spans several orders of magnitude.

For example, suppose the exponent range is from -4 to 0. You can pick a number x uniformly at random from that range and apply 10^x, which yields values between 0.0001 and 1.
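
A minimal sketch contrasting the two sampling scales (the ranges are only illustrative):

import numpy as np

rng = np.random.default_rng(0)

# Linear scale: fine for a narrow range (within one order of magnitude)
momentum = rng.uniform(0.85, 0.99)

# Log scale: sample the exponent uniformly, then exponentiate, so every
# order of magnitude in [1e-4, 1e0] gets an equal share of the samples
lr = 10 ** rng.uniform(-4, 0)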

Hyperparameter Search Methods

Grid search

Define a search space as a grid of hyperparameter values and evaluate every position in the grid. This method is best suited when you already have a good idea of the plausible hyperparameter ranges for your case.
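
A minimal grid-search sketch, assuming a hypothetical evaluate(lr, wd) function that trains a model and returns its validation score:

import itertools

def grid_search(evaluate, learning_rates, weight_decays):
    # Try every combination in the grid and keep the best one
    best_score, best_params = -float("inf"), None
    for lr, wd in itertools.product(learning_rates, weight_decays):
        score = evaluate(lr, wd)
        if score > best_score:
            best_score, best_params = score, (lr, wd)
    return best_params, best_score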

Random search

Define a search space as a bounded domain of hyperparameter values and randomly sample points in that domain. Random search is great for discovery and for finding hyperparameter combinations that you would not have guessed intuitively; the full example later in this post uses this method.

Coarse to fine

This method first samples the hyperparameter space coarsely (with grid or random search), then concentrates later samples in the regions where the validation score is higher, as in the sketch below.
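
A minimal coarse-to-fine sketch, again assuming a hypothetical evaluate function (here it takes a single learning rate and returns a validation score):

import numpy as np

def coarse_to_fine(evaluate, n_coarse=50, n_fine=50, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: coarse random search over a wide log-scale range
    coarse = 10 ** rng.uniform(-6, 0, size=n_coarse)
    scores = [evaluate(lr) for lr in coarse]
    # Stage 2: shrink to one decade around the best coarse point and resample
    center = np.log10(coarse[int(np.argmax(scores))])
    fine = 10 ** rng.uniform(center - 0.5, center + 0.5, size=n_fine)
    return fine[int(np.argmax([evaluate(lr) for lr in fine]))]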

Hyperparameter Optimization with Code

Import

You can find the imported helper modules (dataset, common) in this GitHub repository.

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # make the repository's parent directory importable
import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist
from common.multi_layer_net import MultiLayerNet
from common.util import shuffle_dataset
from common.trainer import Trainer

Data Preparation

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)
# Downsize training data  
x_train = x_train[:500] 
t_train = t_train[:500]
# Split 20% of data for validation  
validation_rate = 0.20 
validation_num = int(x_train.shape[0] * validation_rate) 
x_train, t_train = shuffle_dataset(x_train, t_train)
x_val = x_train[:validation_num] 
t_val = t_train[:validation_num] 
x_train = x_train[validation_num:] 
t_train = t_train[validation_num:]

Trainer

def __train(lr, weight_decay, epochs=50):
    network = MultiLayerNet(input_size=784, hidden_size_list=[100, 100, 100, 100, 100, 100],
                            output_size=10, weight_decay_lambda=weight_decay)
    # x_val/t_val are passed in the test-set slot, so test_acc_list below
    # actually holds the validation accuracy per epoch
    trainer = Trainer(network, x_train, t_train, x_val, t_val,
                      epochs=epochs, mini_batch_size=100,
                      optimizer='sgd', optimizer_param={'lr': lr}, verbose=False)
    trainer.train()
    return trainer.test_acc_list, trainer.train_acc_list

Hyperparameter Search and Training

np.random.uniform(low, high) returns a value uniformly distributed between low and high. We will search over two hyperparameters: the weight decay and the learning rate.

Weight decay

Weight decay is a form of regularization; it plays an important role in training, so its value needs to be set properly.

Weight decay adds a penalty of (λ/2)||W||^2 to the loss, so each gradient-descent update shrinks every weight by a factor (1 - ηλ) before applying the usual gradient step, where η is the learning rate, λ is the decay strength, and 0 < ηλ < 1.
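
A minimal sketch of one SGD update with L2 weight decay (the names are illustrative):

import numpy as np

def sgd_step(W, grad, lr, weight_decay):
    # The L2 penalty contributes weight_decay * W to the gradient, so the
    # update shrinks W by a factor of (1 - lr * weight_decay) each step
    return W - lr * (grad + weight_decay * W)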

Learning rate (lr)

The learning rate is a tuning parameter of an optimization algorithm that determines the step size at each iteration while moving toward a minimum of the loss function.
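
As a tiny illustration on f(w) = w^2 (gradient 2w, minimum at w = 0), a small learning rate converges slowly, a moderate one converges quickly, and one that is too large overshoots and diverges:

w0 = 3.0
for lr in (0.01, 0.1, 1.1):
    w = w0
    for _ in range(10):
        w -= lr * 2 * w  # one gradient-descent step on f(w) = w**2
    print(lr, w)         # with lr = 1.1, |w| grows: the steps diverge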

# Random search ==============================================
optimization_trial = 100
results_val = {}
results_train = {}
for _ in range(optimization_trial):
    # Set the range of the hyperparameter search (log scale) ==
    weight_decay = 10 ** np.random.uniform(-8, -4)
    lr = 10 ** np.random.uniform(-6, -2)
    # ==========================================================
    val_acc_list, train_acc_list = __train(lr, weight_decay)
    print("val acc:" + str(val_acc_list[-1]) + " | lr:" + str(lr) + ", weight decay:" + str(weight_decay))
    key = "lr:" + str(lr) + ", weight decay:" + str(weight_decay)
    results_val[key] = val_acc_list
    results_train[key] = train_acc_list

The best validation accuracy was 0.82, obtained with lr = 0.0099 and weight decay = 2.7352e-08.

Graph

The following code draws the accuracy curves for the 20 best results: the solid line is validation accuracy and the dashed line is training accuracy.

print("=========== Hyper-Parameter Optimization Result ===========") 
graph_draw_num = 20 
col_num = 5 
row_num = int(np.ceil(graph_draw_num / col_num)) 
i = 0
plt.figure(figsize=(10,10))
for key, val_acc_list in sorted(results_val.items(), key=lambda x:x[1][-1], reverse=True): 
    print("Best-" + str(i+1) + "(val acc:" + str(val_acc_list[-1]) + ") | " + key)
plt.subplot(row_num, col_num, i+1) 
    plt.title("Best-" + str(i+1)) 
    plt.ylim(0.0, 1.0) 
    if i % 5: plt.yticks([]) 
    plt.xticks([]) 
    x = np.arange(len(val_acc_list)) 
    plt.plot(x, val_acc_list) 
    plt.plot(x, results_train[key], "--") 
    i += 1
if i >= graph_draw_num: 
        break
plt.show()

Thank you for reading.