Chunking and Retrieval in Production-Grade RAG Systems
Key Speaker
In this session, Pranjalya Tiwari, Senior Data Science Engineer at Ksolves India Limited, goes beyond the basics of retrieval-augmented generation (RAG), past embeddings and vector search, to address the real challenges that engineering teams face when deploying retrieval systems at production scale.
Drawing from real-world AI engineering experience, Pranjalya explains why most RAG systems underperform in production: not because of the models, but because of how data is chunked, retrieved, ranked, and evaluated. Poor chunking strategies, mismatched retrieval techniques, and the absence of a proper evaluation loop are where most pipelines silently break down.
This webinar demonstrates how to architect a retrieval pipeline that holds up in production, covering chunking tradeoffs, vector stores such as Qdrant, retrieval technique tradeoffs, cross-encoder re-ranking, and end-to-end RAG evaluation.
If your organization is building enterprise AI applications with RAG, this session delivers practical, implementation-focused insights into improving retrieval quality, efficiency, and scalability.
Key Takeaways
- Comparison of different chunking strategies and when to use each
- How to work with vector stores like Qdrant in production environments
- Different retrieval techniques and the tradeoffs between them
- When and how to apply cross-encoder re-ranking for better precision
- Evaluation approaches for tuning and improving RAG pipelines at scale
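
To make the chunking tradeoff in the first takeaway concrete, here is a minimal sketch that contrasts two common strategies: fixed-size windows with overlap, and sentence-aware packing. It is an illustration only, not the approach presented in the session; the function names, the character-based sizing, and the regex sentence splitter are all assumptions made for this example.

```python
import re


def fixed_size_chunks(text, size=200, overlap=50):
    """Split text into character windows of `size`; consecutive windows share `overlap` characters.

    Hypothetical helper: production pipelines often size chunks in tokens, not characters.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def sentence_chunks(text, max_chars=200):
    """Pack whole sentences into chunks of at most `max_chars` characters.

    Assumption: a naive regex split on ., !, ? is good enough for illustration.
    A single sentence longer than `max_chars` is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Alpha beta. Gamma delta. Epsilon zeta."
    print(fixed_size_chunks(sample, size=16, overlap=4))
    print(sentence_chunks(sample, max_chars=25))
```

Fixed-size windows are predictable and cheap but can cut a sentence in half, which is why the overlap exists. Sentence-aware packing keeps each chunk readable on its own at the cost of uneven chunk lengths.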