Activation functions play an important role in the convergence of neural-network-based learning algorithms. They give neural networks the ability to model nonlinear relationships and thus to fit complex data. However, the literature lacks an in-depth study of how activation functions behave in modern architectures. Therefore, in this work we compare the 18 most widely used activation functions on multiple datasets (CIFAR-10, CIFAR-100, CALTECH-256) using 4 different models (EfficientNet, ResNet, a variation of ResNet using the bag of tricks, and MobileNet V3). Furthermore, we explore the shape of the loss landscape of these architectures under various activation functions. Lastly, based on the results of our experiments,
we introduce a new locally quadratic activation function, named Hytana, alongside a variation, Parametric Hytana, both of which
outperform common activation functions and address the dying ReLU problem.

Published in: Inteligencia Artificial (December 2022)
DOI: 10.4114/intartif.vol25iss70pp95-109