Original Article

Agentic AI Based Smart Assistant: A Multimodal Visual Question Answering System Using Fast API and GROQ Vision-Language Models

Gaurav Arya¹, Shuchi Sharma²

¹ Student, Department of AIML, ADGIPS, FC-26 Shastri Park, Shahdara, New Delhi, India.
² Assistant Professor, Department of AIML, ADGIPS, FC-26 Shastri Park, Shahdara, New Delhi, India.

Published Online: May-August 2026

Pages: 237-240

Abstract

This paper presents the design and implementation of an Agentic AI Based Smart Assistant, a multimodal web application that analyzes images and answers natural-language queries in real time. The system integrates FastAPI as the backend web framework with the GROQ API, leveraging LLaMA-based Vision-Language Models (VLMs) to interpret visual and textual data jointly. An additional Retrieval-Augmented Generation (RAG) pipeline using TF-IDF vectorization enables document-aware question answering from uploaded PDFs. The system was tested across ten functional scenarios, including valid and invalid inputs, large images, concurrent requests, and API fault conditions, all of which passed. Results demonstrate strong contextual accuracy and low-latency performance suitable for real-world applications in medical imaging, smart education, and automated inspection.
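The retrieval step of a TF-IDF based RAG pipeline, as mentioned in the abstract, can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything below is an assumption for illustration: the function names (`tokenize`, `build_index`, `vectorize`, `cosine`, `retrieve`), the smoothed IDF formula, and the use of cosine similarity are hypothetical choices, and the text chunks are assumed to have already been extracted from an uploaded PDF. Only the Python standard library is used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector; terms unseen at index time are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def build_index(chunks):
    """Return smoothed IDF weights and one TF-IDF vector per chunk."""
    tokenized = [tokenize(c) for c in chunks]
    n = len(chunks)
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return idf, [vectorize(toks, idf) for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, chunks, top_k=1):
    """Return up to top_k chunks with non-zero similarity to the query."""
    idf, vectors = build_index(chunks)
    q = vectorize(tokenize(query), idf)
    scored = sorted(
        ((cosine(q, v), i) for i, v in enumerate(vectors)),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, stable on ties
    )
    return [chunks[i] for score, i in scored[:top_k] if score > 0.0]


# Hypothetical chunks standing in for text extracted from an uploaded PDF.
chunks = [
    "FastAPI serves the upload endpoint.",
    "The vision model describes the image.",
    "TF-IDF ranks PDF chunks against the question.",
]
context = retrieve("Which model describes an image?", chunks)
```

In a full RAG flow, the retrieved `context` would be placed into the prompt sent to the language model alongside the user's question; that call is omitted here because the abstract does not describe it.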

Journal: INDJCST