NVIDIA has introduced RankRAG, a new RAG framework that instruction-tunes a single LLM to perform two tasks: top-k context ranking and answer generation.

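RankRAG uses a three-step retrieve-rerank-generate pipeline: it retrieves candidate contexts, has the same LLM rerank them to keep the top k, and then generates the answer from those contexts. The aim is to improve both relevance assessment and answer generation. The gains are most noticeable on more challenging datasets such as PopQA and 2WikimQA.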
According to the benchmarks reported by the researchers, RankRAG outperforms ChatQA-1.5 and is competitive with larger models on retrieval-augmented generation tasks. The code and weights have not been published.
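
Since the code has not been released, the following is only a minimal sketch of the general shape of a retrieve-rerank-generate loop, not RankRAG's implementation. Everything in it is a hypothetical stand-in: `ToyLLM` replaces the instruction-tuned model with simple word-overlap heuristics, and `retrieve`, `rerank`, and `rank_rag` are illustrative names. The one idea it tries to show is that a single model object serves both the ranking step and the generation step.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set, used by the toy retriever and the toy model."""
    return set(re.findall(r"\w+", text.lower()))


class ToyLLM:
    """Hypothetical stand-in for one LLM that both ranks and generates.

    A real system would prompt the model for a relevance judgement and for
    the final answer; here both are replaced by trivial heuristics so the
    example runs without any model.
    """

    def relevance(self, question: str, passage: str) -> float:
        # Assumption: relevance is a score in [0, 1]; Jaccard overlap here.
        q, p = tokens(question), tokens(passage)
        return len(q & p) / len(q | p) if q | p else 0.0

    def generate(self, question: str, contexts: List[str]) -> str:
        # Assumption: "answering" just returns the best-ranked context.
        return contexts[0] if contexts else ""


def retrieve(question: str, corpus: List[str], n: int) -> List[str]:
    """Step 1: a cheap first-pass retriever returning n candidates."""
    q = tokens(question)
    ranked = sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:n]


def rerank(llm: ToyLLM, question: str, passages: List[str], k: int) -> List[str]:
    """Step 2: the LLM scores each candidate; keep the top k."""
    ranked = sorted(passages, key=lambda p: llm.relevance(question, p), reverse=True)
    return ranked[:k]


def rank_rag(llm: ToyLLM, question: str, corpus: List[str], n: int = 3, k: int = 2) -> str:
    """Steps 1-3: retrieve n candidates, rerank to k, generate with the same LLM."""
    candidates = retrieve(question, corpus, n)
    top_k = rerank(llm, question, candidates, k)
    return llm.generate(question, top_k)


CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
    "Bananas are rich in potassium.",
]

if __name__ == "__main__":
    print(rank_rag(ToyLLM(), "What is the capital of France?", CORPUS))
```

In a real setup, the retriever would typically be a dense or sparse search index, and both `relevance` and `generate` would be calls to the same fine-tuned model with different prompts. The values of `n` and `k` here are arbitrary.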