
I am building a Flutter app that needs to run three separate TensorFlow Lite models on-device:

  1. An embedding model
  2. An action video detection model
  3. A DistilGPT2 RAG model

Currently, I bundle all .tflite models inside the assets/ folder and load them using tflite_flutter.
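This is roughly how I load each model (simplified; the model file names and thread count are illustrative, and the asset path convention may differ between tflite_flutter versions):

```dart
import 'package:tflite_flutter/tflite_flutter.dart';

late Interpreter embeddingInterpreter;
late Interpreter videoInterpreter;
late Interpreter gpt2Interpreter;

Future<void> loadModels() async {
  // Multi-threaded CPU inference; 4 threads is a guess, not tuned.
  final options = InterpreterOptions()..threads = 4;

  embeddingInterpreter =
      await Interpreter.fromAsset('assets/embedding.tflite', options: options);
  videoInterpreter =
      await Interpreter.fromAsset('assets/video_detect.tflite', options: options);
  gpt2Interpreter =
      await Interpreter.fromAsset('assets/distilgpt2.tflite', options: options);
}
```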

Because all three models are packed into the app, the APK/IPA size has become very large and runtime performance is suffering.

What I've tried so far:

  • Applied post-training quantization (int8 and float16) to reduce model size.
  • Loaded the models with tflite_flutter in separate isolates.
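My isolate attempt looks roughly like this (a simplified sketch; the input/output shapes are placeholders, and I'm assuming the `IsolateInterpreter` API available in recent tflite_flutter releases):

```dart
import 'package:tflite_flutter/tflite_flutter.dart';

Future<void> runVideoModelOffUiThread() async {
  final interpreter =
      await Interpreter.fromAsset('assets/video_detect.tflite');

  // IsolateInterpreter runs inference in a background isolate so the
  // UI thread doesn't jank (tflite_flutter >= 0.10.0).
  final isolateInterpreter =
      await IsolateInterpreter.create(address: interpreter.address);

  // Dummy buffers; real shapes depend on the model's signature.
  final input = List.filled(1 * 224 * 224 * 3, 0.0);
  final output = List.filled(1 * 1000, 0.0);
  await isolateInterpreter.run(input, output);
}
```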

However, the app size is still huge, and running the models (particularly the video detection model and DistilGPT2) causes noticeable lag.

My questions:

  1. What are the best practices for running multiple TFLite models in a Flutter app without making the app too heavy?
  2. For video models and a language model such as DistilGPT2, how do I best optimize performance on-device?

Environment:

  • Flutter 3.x
  • TensorFlow Lite
  • Target: Android

Any advice, optimization suggestions, or example strategies would be highly appreciated.
