Speeding up the training of deep learning networks for AI through chunking
At the International Conference on Learning Representations on May 6, IBM Research shared a look at how chunk-based accumulation can speed up the training of deep learning networks used for artificial intelligence (AI).
The company first shared the concept and its vast potential at last year’s NeurIPS conference, when it demonstrated the ability to train deep learning models with 8-bit precision while fully preserving model accuracy across all major AI data set categories: image, speech and text. The result? This technique could accelerate training time for deep neural networks by two to four times over today’s 16-bit systems.
In IBM Research’s new paper, titled 'Accumulation Bit-Width Scaling For Ultralow Precision Training of Deep Networks', researchers explain in greater depth exactly how chunk-based accumulation works to lower the precision of accumulation from 32 bits down to 16 bits. 'Chunking' takes a long sum of products and divides it into smaller groups, accumulates each group separately, and then adds the results of these smaller groups together. At low precision, this gives a significantly more accurate result than accumulating every term into a single running sum, because small terms are no longer added to a total that has grown too large to register them. This allows researchers to study new networks and improve the overall efficiency of deep learning hardware.
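To make the idea concrete, here is a minimal sketch of chunk-based accumulation in Python. It is an illustration only, not IBM's implementation: the function names (`to_fp16`, `naive_accumulate`, `chunked_accumulate`), the chunk size of 64, and the use of software-emulated 16-bit floats are all assumptions chosen for the example.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def naive_accumulate(values):
    """Add every term into one 16-bit running sum."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))  # round after every addition
    return acc

def chunked_accumulate(values, chunk_size=64):
    """Sum each chunk in 16 bits, then sum the per-chunk partial sums in 16 bits.

    chunk_size=64 is an illustrative choice, not a value taken from the paper.
    """
    total = 0.0
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial = to_fp16(partial + to_fp16(v))
        total = to_fp16(total + partial)
    return total

# 4,096 terms of 1.0: the exact sum is 4096.
terms = [1.0] * 4096
print(naive_accumulate(terms))    # 2048.0
print(chunked_accumulate(terms))  # 4096.0
```

In this toy case the naive 16-bit sum stalls at 2048, because adding 1 to 2048 rounds back to 2048 in half precision. The chunked version keeps every partial sum small enough that each addition is still exact, so it reaches the correct total.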
Although further reducing precision for training was previously considered infeasible, IBM expects this 8-bit training platform to become a widely adopted industry standard in the coming years.
Author: Daniel Gutierrez