A Streamlit-based Multimodal AI Generator using Google's Gemini API for text and image generation.
Updated Jun 28, 2025 - Python
A real-time image captioning and visual question answering (VQA) system. This project uses computer vision and NLP to generate descriptive captions for images and answer user questions about them.