Document Type

Paper

Abstract

Retrieval-Augmented Generation (RAG) has become the standard approach for integrating domain knowledge into Large Language Models (LLMs). However, fair comparison of RAG pipelines remains difficult: data preparation is often ad hoc, subsampling methods are opaque, parameters vary across implementations, and evaluation is fragmented. We present In-Situ Eval, a unified and reproducible framework that operationalizes the full RAG pipeline with configurable subsampling strategies and both RAG-specific and generic evaluation metrics. The platform supports two execution modes: an offline Dataset mode for evaluating precomputed outputs, and a live Retrieval mode for benchmarking RAG variants with state-of-the-art LLMs. Users can flexibly select datasets, retrieval techniques, models, and metrics, enabling side-by-side comparisons, ablations, and targeted analyses. This holistic approach reduces computational costs, clarifies the impact of subsampling techniques, and provides actionable insights for real-world deployments. By facilitating transparent, customizable, and interactive benchmarking, In-Situ Eval empowers both researchers and practitioners to make informed decisions in adapting RAG pipelines to domain-specific needs.
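
To make the two execution modes and the configurable choices described above concrete, the following is a minimal, purely illustrative sketch of how such a benchmark configuration might be expressed. All class, field, and value names here are hypothetical and do not reflect the actual In-Situ Eval API.

# Illustrative sketch only: names are hypothetical, not the In-Situ Eval API.
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ExecutionMode(Enum):
    DATASET = "dataset"      # offline: evaluate precomputed pipeline outputs
    RETRIEVAL = "retrieval"  # live: run retrieval and generation end to end


@dataclass
class BenchmarkConfig:
    mode: ExecutionMode
    dataset: str
    subsampling: str = "uniform"   # hypothetical subsampling strategy name
    retriever: str = "bm25"        # only used in RETRIEVAL mode
    llm: str = "example-llm"       # only used in RETRIEVAL mode
    metrics: List[str] = field(
        default_factory=lambda: ["faithfulness", "answer_relevance", "rouge_l"]
    )


def evaluate(cfg: BenchmarkConfig) -> Dict[str, float]:
    """Toy driver: dispatch on execution mode and return placeholder scores."""
    if cfg.mode is ExecutionMode.DATASET:
        # Stand-in for loading precomputed (question, context, answer) records.
        records = [{"answer": "precomputed output"}]
    else:
        # Stand-in for live retrieval with the chosen retriever, then generation.
        records = [{"answer": f"generated by {cfg.llm} over {cfg.retriever} hits"}]
    # Placeholder scoring: a real framework would compute each metric properly.
    return {metric: 0.0 for metric in cfg.metrics}


if __name__ == "__main__":
    cfg = BenchmarkConfig(mode=ExecutionMode.RETRIEVAL, dataset="domain-qa")
    print(evaluate(cfg))

In this sketch, switching between the offline Dataset mode and the live Retrieval mode is a single configuration change, which is the kind of side-by-side comparison and ablation the abstract describes.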

APA Citation

Garimella, R., Roy, K., Shyalika, C., & Sheth, A. (2026). In-Situ Eval: A modular framework for custom and real-time RAG benchmarking (Demonstration). In AAAI-26 Demonstrations Program (scheduled Jan 24, 2026). Association for the Advancement of Artificial Intelligence.
