<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>RAG on PG Blog</title><link>https://pg-blogs.netlify.app/tags/rag/</link><description>Recent content in RAG on PG Blog</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 04 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://pg-blogs.netlify.app/tags/rag/index.xml" rel="self" type="application/rss+xml"/><item><title>Making RAG Accurate in Java</title><link>https://pg-blogs.netlify.app/posts/24-making-rag-accurate-in-java/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://pg-blogs.netlify.app/posts/24-making-rag-accurate-in-java/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://pg-blogs.netlify.app/posts/20-rag-from-scratch-in-java/"&gt;RAG From Scratch in Java&lt;/a&gt; built a retrieval pipeline out of cosine similarity and a reranking pass, and &lt;a href="https://pg-blogs.netlify.app/posts/22-vector-databases-in-practice-for-java/"&gt;Vector Databases in Practice for Java&lt;/a&gt; moved that same index into pgvector so it can hold millions of chunks. Neither post asked the question that actually decides whether a RAG system is any good: &lt;strong&gt;does it retrieve the right chunks, and how would you know?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This post answers that question in two halves. First, three techniques that improve what gets retrieved in the first place: hybrid search that catches what pure vector similarity misses, metadata filtering that narrows the search space before ranking even starts, and chunking choices that shape recall long before a query is ever run. Second, the metrics that turn &amp;ldquo;this feels better&amp;rdquo; into a number you can track from one change to the next: recall@k, precision@k, MRR, and nDCG. All code below is illustrative and has not been executed; it follows the pipeline built in post 20.&lt;/p&gt;</description></item><item><title>Making RAG Accurate in Python</title><link>https://pg-blogs.netlify.app/posts/25-making-rag-accurate-in-python/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://pg-blogs.netlify.app/posts/25-making-rag-accurate-in-python/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://pg-blogs.netlify.app/posts/21-rag-from-scratch-in-python/"&gt;RAG From Scratch in Python&lt;/a&gt; built a retrieval pipeline out of cosine similarity and a reranking pass, and &lt;a href="https://pg-blogs.netlify.app/posts/23-vector-databases-in-practice-for-python/"&gt;Vector Databases in Practice for Python&lt;/a&gt; moved that same index into pgvector so it can hold millions of chunks. Neither post asked the question that actually decides whether a RAG system is any good: &lt;strong&gt;does it retrieve the right chunks, and how would you know?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This post answers that question in two halves. First, three techniques that improve what gets retrieved in the first place: hybrid search that catches what pure vector similarity misses, metadata filtering that narrows the search space before ranking even starts, and chunking choices that shape recall long before a query is ever run. Second, the metrics that turn &amp;ldquo;this feels better&amp;rdquo; into a number you can track from one change to the next: recall@k, precision@k, MRR, and nDCG. All code below is illustrative and has not been executed; it follows the pipeline built in post 21.&lt;/p&gt;</description></item><item><title>RAG From Scratch in Java</title><link>https://pg-blogs.netlify.app/posts/20-rag-from-scratch-in-java/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://pg-blogs.netlify.app/posts/20-rag-from-scratch-in-java/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Retrieval-augmented generation (RAG) is the pattern behind almost every &amp;ldquo;chat with your documents&amp;rdquo; feature: instead of hoping a model already knows the answer, you find the passages most likely to contain it and hand them to the model as context. Done well, it&amp;rsquo;s one of the highest-leverage techniques for keeping an LLM application grounded in facts it didn&amp;rsquo;t memorize.&lt;/p&gt;
&lt;p&gt;This post builds RAG &lt;strong&gt;from scratch&lt;/strong&gt; — no vector database, no framework — so the mechanics are visible end to end: splitting documents into chunks, turning them into embeddings, searching them with nothing more than array math, reranking the results, and generating a grounded answer. It builds directly on the grounding discipline from &lt;a href="https://pg-blogs.netlify.app/posts/11-building-reliable-llm-apps-in-java/"&gt;Building Reliable LLM Applications in Java&lt;/a&gt; — &amp;ldquo;give the model the source material and instruct it to answer only from that material&amp;rdquo; — by showing where that source material actually comes from. We&amp;rsquo;ll close with the question every RAG design eventually has to answer honestly: when does retrieval beat simply pasting more into the context window?&lt;/p&gt;</description></item><item><title>RAG From Scratch in Python</title><link>https://pg-blogs.netlify.app/posts/21-rag-from-scratch-in-python/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://pg-blogs.netlify.app/posts/21-rag-from-scratch-in-python/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Retrieval-augmented generation (RAG) is the pattern behind almost every &amp;ldquo;chat with your documents&amp;rdquo; feature: instead of hoping a model already knows the answer, you find the passages most likely to contain it and hand them to the model as context. Done well, it&amp;rsquo;s one of the highest-leverage techniques for keeping an LLM application grounded in facts it didn&amp;rsquo;t memorize.&lt;/p&gt;
&lt;p&gt;This post builds RAG &lt;strong&gt;from scratch&lt;/strong&gt; — no vector database, no framework — so the mechanics are visible end to end: splitting documents into chunks, turning them into embeddings, searching them with nothing more than array math, reranking the results, and generating a grounded answer. It builds directly on the grounding discipline from &lt;a href="https://pg-blogs.netlify.app/posts/10-building-reliable-llm-apps-in-python/"&gt;Building Reliable LLM Applications in Python&lt;/a&gt; — &amp;ldquo;give the model the source material and instruct it to answer only from that material&amp;rdquo; — by showing where that source material actually comes from. We&amp;rsquo;ll close with the question every RAG design eventually has to answer honestly: when does retrieval beat simply pasting more into the context window?&lt;/p&gt;</description></item><item><title>Vector Databases in Practice for Java</title><link>https://pg-blogs.netlify.app/posts/22-vector-databases-in-practice-for-java/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://pg-blogs.netlify.app/posts/22-vector-databases-in-practice-for-java/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://pg-blogs.netlify.app/posts/20-rag-from-scratch-in-java/"&gt;RAG From Scratch in Java&lt;/a&gt; built retrieval with nothing but an array of doubles and a &lt;code&gt;Comparator&lt;/code&gt;: cosine similarity computed in a loop, top-k picked with a stream sort. That post said outright that this is a brute-force &lt;code&gt;O(n)&lt;/code&gt; scan — fine for a few thousand chunks, the wrong tool once a corpus reaches millions. This post picks up exactly there: how do you store and search vectors at that scale, using Postgres, and when do you need something else entirely?&lt;/p&gt;</description></item><item><title>Vector Databases in Practice for Python</title><link>https://pg-blogs.netlify.app/posts/23-vector-databases-in-practice-for-python/</link><pubDate>Sat, 04 Jul 2026 00:00:00 +0000</pubDate><guid>https://pg-blogs.netlify.app/posts/23-vector-databases-in-practice-for-python/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://pg-blogs.netlify.app/posts/21-rag-from-scratch-in-python/"&gt;RAG From Scratch in Python&lt;/a&gt; built retrieval with nothing but a list of floats and &lt;code&gt;sorted()&lt;/code&gt;: cosine similarity computed in a loop, top-k picked with a slice. That post said outright that this is a brute-force &lt;code&gt;O(n)&lt;/code&gt; scan — fine for a few thousand chunks, the wrong tool once a corpus reaches millions. This post picks up exactly there: how do you store and search vectors at that scale, using Postgres, and when do you need something else entirely?&lt;/p&gt;</description></item></channel></rss>