qMTS: Fixed-point quantization for multiple-timescale spiking neural networks

Spiking Neural Networks (SNNs) represent a promising solution for streaming applications at the edge with strict performance requirements. However, implementing SNNs efficiently at the edge requires model quantization to reduce memory footprint and compute cost. In this paper, we provide methods to quantize a prominent neuron model for temporally rich problems, the parameterized Adaptive Leaky-Integrate-and-Fire (p-ALIF). p-ALIF neurons combine the computational simplicity of Integrate-and-Fire neurons with accurate learning at multiple timescales, activation sparsity, and increased dynamic range, owing to threshold adaptation and neuron-level heterogeneity. p-ALIF neurons have shown state-of-the-art (SoTA) performance on temporal tasks such as speech recognition and health monitoring. Our method separates SNN quantization into two stages, allowing efficient exploration of different quantization levels. We evaluate our method on several temporal benchmarks, demonstrating up to 40x memory reduction and 4x fewer synaptic operations with little to no accuracy loss.
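As background for readers unfamiliar with the neuron model, the sketch below shows one discrete-time p-ALIF update next to a generic fixed-point rounding helper. It follows the standard adaptive-LIF formulation from the SNN literature; the function names, default constants (b0, beta), and the Qm.n quantizer are illustrative assumptions, not the paper's two-stage method.

```python
# Minimal sketch of p-ALIF dynamics under fixed-point state, assuming the
# standard adaptive-LIF formulation; all names and constants are illustrative.
import numpy as np

def palif_step(i_t, u, b, s_prev, alpha, rho, b0=1.0, beta=1.8):
    """One discrete-time update of parameterized adaptive LIF neurons.

    i_t:    input current at this step, shape (n,)
    u:      membrane potential, shape (n,)
    b:      threshold-adaptation variable, shape (n,)
    s_prev: spikes emitted at the previous step, shape (n,)
    alpha:  per-neuron membrane decay exp(-dt/tau_m), shape (n,)
    rho:    per-neuron adaptation decay exp(-dt/tau_adp), shape (n,)
    """
    # Adaptation: each spike raises the threshold, which decays on its own
    # (slower) timescale -- the source of the second, longer memory.
    b = rho * b + (1.0 - rho) * s_prev
    theta = b0 + beta * b                 # dynamic firing threshold
    # Leaky integration with a soft reset driven by the previous spike
    # (reset uses the current threshold; an approximation for brevity).
    u = alpha * u + (1.0 - alpha) * i_t - s_prev * theta
    s = (u >= theta).astype(u.dtype)      # fire on threshold crossing
    return s, u, b

def to_fixed_point(x, frac_bits=8, word_bits=16):
    """Round x onto a signed Qm.n fixed-point grid (generic, illustrative)."""
    scale = 2 ** frac_bits
    lo, hi = -2 ** (word_bits - 1), 2 ** (word_bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

# Usage: heterogeneous (per-neuron) decays, with parameters and state
# kept on the fixed-point grid after every step.
n = 4
rng = np.random.default_rng(0)
u, b, s = np.zeros(n), np.zeros(n), np.zeros(n)
alpha = to_fixed_point(rng.uniform(0.7, 0.99, n))   # fast membrane timescale
rho = to_fixed_point(rng.uniform(0.9, 0.999, n))    # slow adaptation timescale
for t in range(100):
    s, u, b = palif_step(rng.normal(size=n), u, b, s, alpha, rho)
    u, b = to_fixed_point(u), to_fixed_point(b)     # re-quantize the state
```

The two decay variables (alpha for the membrane, rho for adaptation) are what give the neuron its multiple timescales; quantizing them and the state variables to fixed point is the general setting the paper's method addresses.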