I am building a Flutter app where I have to execute three separate TensorFlow Lite models on-device:
- An embedding model
- An action video detection model
- A DistilGPT2 RAG model
Currently, I bundle all .tflite models inside the assets/ folder and load them using tflite_flutter.
Because all three models are packed into the app, the APK/IPA has become very large, and runtime performance is suffering as well.
What I've tried so far:
- Applied quantization (`int8`, `float16`) to reduce model size.
- Loaded the models with `tflite_flutter` in separate isolates.
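For reference, this is roughly how I load each model today with `tflite_flutter`'s `IsolateInterpreter`, so inference runs off the UI thread (the asset paths and tensor shapes below are placeholders, not my real models):

```dart
import 'package:tflite_flutter/tflite_flutter.dart';

// Load a bundled .tflite asset and wrap it in an IsolateInterpreter,
// which forwards run() calls to a background isolate.
Future<IsolateInterpreter> loadModel(String assetPath) async {
  final interpreter = await Interpreter.fromAsset(assetPath);
  return IsolateInterpreter.create(address: interpreter.address);
}

Future<void> main() async {
  // Placeholder asset path and shapes; the real ones depend on the model.
  final embedder = await loadModel('assets/embedding.tflite');
  final input = List.filled(1 * 128, 0).reshape([1, 128]);
  final output = List.filled(1 * 512, 0.0).reshape([1, 512]);
  await embedder.run(input, output); // runs in the background isolate
}
```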
However, the app is still very large, and running the models (particularly the video detection model and DistilGPT2) causes noticeable lag.
My questions:
- What are the best practices for running multiple TFLite models in a Flutter app without making the app too heavy?
- For video models and a language model such as DistilGPT2, how do I best optimize performance on-device?
Environment:
- Flutter 3.x
- TensorFlow Lite
- Target: Android
Any advice, optimization suggestions, or example strategies would be highly appreciated.