Is AIMET able to quantize non-computer-vision models such as BERT and other Transformers, which consist of dense (fully connected) layers rather than conv layers?
Are you using PyTorch models?
If yes, then we need to add some support. For example, we recently added a custom quantized implementation for RNN/LSTM layers, which can serve as the template for supporting Transformer layers.
Any chance you are interested in contributing and adding this support to AIMET? We would love it.
Hi @akhobare,
Thanks for your response. I am using TF. BERT and Transformers are not sequential models: they take the word embeddings simultaneously and process them mainly with FC layers (no conv layers or RNN blocks). Can AIMET (TF) quantize the inputs and the FC layers?
In general, AIMET TensorFlow should detect the FC layers and add quantization sim ops in the right places, but we have not tried this with a BERT architecture.
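For readers unfamiliar with what a quantization sim op does around an FC layer, here is a minimal pure-Python sketch. It is not AIMET's implementation: it assumes a generic 8-bit asymmetric quantize-dequantize scheme, and the names `fake_quantize`, `inputs`, `weights` and `bias` are hypothetical, chosen only for illustration.

```python
def fake_quantize(values, bitwidth=8):
    """Quantize-dequantize a list of floats (generic asymmetric scheme, an assumption)."""
    levels = 2 ** bitwidth - 1
    # Make sure 0.0 is inside the representable range.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / levels
    if scale == 0.0:
        # All values are zero, nothing to quantize.
        return list(values), scale
    offset = round(lo / scale)
    dequantized = []
    for v in values:
        q = round(v / scale) - offset      # integer grid index
        q = max(0, min(levels, q))         # clamp to [0, levels]
        dequantized.append((q + offset) * scale)
    return dequantized, scale


# A single-output FC layer: y = sum(x_i * w_i) + bias (hypothetical numbers).
inputs = [0.5, -1.25, 2.0, 0.75]
weights = [-0.42, 0.07, 0.31, 0.88]
bias = 0.1

q_inputs, input_scale = fake_quantize(inputs)
q_weights, weight_scale = fake_quantize(weights)

float_out = sum(x * w for x, w in zip(inputs, weights)) + bias
sim_out = sum(x * w for x, w in zip(q_inputs, q_weights)) + bias

print("float output:    ", float_out)
print("simulated output:", sim_out)
```

The gap between the two printed values is the kind of quantization noise that a simulation model exposes before deployment.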
Give this a shot and please report back your findings. You can visualize the graph of the QuantizationSimModel's session (sim.session) in TensorBoard to see where the quantization sim ops were added.
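A rough sketch of the TensorBoard step, assuming `sim.session` is a TF1-style `tf.compat.v1.Session`. The helper `dump_graph` is hypothetical, and a tiny stand-in graph with one matmul takes the place of the real session so that the snippet runs without AIMET installed.

```python
import collections
import tempfile

import tensorflow as tf


def dump_graph(session, logdir):
    """Write session.graph to logdir for TensorBoard and count op types."""
    # FileWriter must be created in graph mode, hence the as_default() scope.
    with session.graph.as_default():
        writer = tf.compat.v1.summary.FileWriter(logdir, session.graph)
        writer.close()
    return collections.Counter(op.type for op in session.graph.get_operations())


# Stand-in for sim.session: a single FC-like matmul (illustrative only).
graph = tf.Graph()
with graph.as_default():
    x = tf.compat.v1.placeholder(tf.float32, [None, 4], name="input")
    w = tf.constant([[0.1, 0.2]] * 4, name="fc_weight")
    y = tf.matmul(x, w, name="fc")

session = tf.compat.v1.Session(graph=graph)
logdir = tempfile.mkdtemp()
op_counts = dump_graph(session, logdir)
session.close()

print(op_counts)
print("Run: tensorboard --logdir", logdir)
```

With the real model, passing `sim.session` in place of the stand-in and comparing the op-type counts before and after creating the sim model should show which op types were inserted.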