Impact of changes in the Mini-batch size on CNN Training Epoch Time
Time: Tuesday, June 23rd, 3:50pm - 3:55pm
Description: Convolutional Neural Networks (CNNs) drive successful machine learning applications in a growing number of areas. However, training a CNN may take a massive amount of time and require expensive high-end GPU resources. CNN training time can vary greatly depending on the GPU type and training parameters. In this work, we focus on one training parameter with a particularly high impact on training time, the mini-batch size, and clarify how and why changes in it affect CNN training epoch time.
To understand how epoch time changes with the mini-batch size, we conducted an experiment measuring the epoch time of a sample CNN: VGG16 trained on the CIFAR100 dataset in Chainer. We observed extremely high variability of epoch times on several GPU types. Moreover, on some GPU types we observed abrupt changes: even a slight variation of the mini-batch size makes the epoch time increase or decrease almost twofold.
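As a rough illustration of this kind of measurement, the sketch below times one epoch for several mini-batch sizes. It is not the harness used in this work: `train_step`, `measure_epoch_time`, and `sweep_batch_sizes` are hypothetical names, and `train_step` stands in for whatever forward/backward/update call the framework provides.

```python
import math
import time


def iterations_per_epoch(num_samples, batch_size):
    """Number of mini-batches needed to cover the dataset once."""
    return math.ceil(num_samples / batch_size)


def measure_epoch_time(train_step, num_samples, batch_size):
    """Run train_step once per mini-batch and return elapsed seconds.

    train_step is a hypothetical callable taking the current batch size.
    On a GPU it must block until the device has finished its work;
    otherwise the timer only captures asynchronous kernel launches.
    """
    start = time.perf_counter()
    remaining = num_samples
    while remaining > 0:
        current = min(batch_size, remaining)  # the last batch may be smaller
        train_step(current)
        remaining -= current
    return time.perf_counter() - start


def sweep_batch_sizes(train_step, num_samples, batch_sizes):
    """Map each mini-batch size to its measured epoch time in seconds."""
    return {b: measure_epoch_time(train_step, num_samples, b) for b in batch_sizes}
```

Sweeping every mini-batch size in a range, rather than only powers of two, is what would expose abrupt changes between neighboring sizes.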
To understand why the abrupt changes occur, we investigated the underlying cuDNN library.
cuDNN provides several different algorithms for each convolution operation. Chainer uses cuDNN heuristics to choose which algorithm to use.
We simulated convolutional layers with a benchmark tool and examined how their execution time changes with the mini-batch size. We found that cuDNN heuristics may choose convolution algorithms that differ greatly in execution time for different mini-batch sizes.
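A toy model can show how an algorithm switch alone would produce an almost twofold jump in epoch time. Everything in it is an assumption made for illustration: the algorithm names, the per-sample costs, and the threshold rule in `pick_algorithm` are hypothetical, and they do not describe the real cuDNN heuristics or any measured data.

```python
import math

# Hypothetical per-sample costs in arbitrary time units.
# Illustrative assumptions only, not measurements.
COST_PER_SAMPLE = {"algo_fast": 1.0, "algo_slow": 2.0}


def pick_algorithm(batch_size, threshold=64):
    """Stand-in for a selection heuristic: switch algorithm above a threshold."""
    return "algo_fast" if batch_size <= threshold else "algo_slow"


def modeled_epoch_time(num_samples, batch_size, threshold=64):
    """Epoch time = iterations per epoch * time per iteration."""
    algo = pick_algorithm(batch_size, threshold)
    iterations = math.ceil(num_samples / batch_size)
    per_iteration = COST_PER_SAMPLE[algo] * batch_size
    return iterations * per_iteration


if __name__ == "__main__":
    for b in (63, 64, 65, 66):
        print(b, pick_algorithm(b), modeled_epoch_time(6400, b))
```

In this model, moving from a mini-batch size of 64 to 65 roughly doubles the modeled epoch time, even though the work per sample in the network itself is unchanged.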
Understanding how CNN training time changes with the mini-batch size is essential for making CNN training faster. It can also help in designing a performance model for predicting CNN training time.