Applied AI Software Engineering: RAG
“There’s one engineering project I see almost every startup building on Large Language Models (LLMs) put in place: its own Retrieval-Augmented Generation (RAG) pipeline. RAG is a common pattern for anyone building an LLM application, because it provides a layer of ‘clean prompts’ and fine-tuning. There are some existing open-source solutions, but almost everyone just builds their own anyway.”
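The most obvious way to give a model information it was not trained on is to pass that information in the prompt; for example, by prompting “Using the following information: [input a bunch of data] please answer the question of [ask your question].” This is a pretty good approach. The biggest problem is that it may not scale: all of the additional information has to fit inside the model’s limited context window, and the prompt grows with every piece of data you add.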
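A minimal sketch of this prompt-stuffing approach, assuming a plain string template that mirrors the example prompt. The helper names (`stuff_prompt`, `approx_tokens`, `fits_context`), the sample documents, and the 4,096-token budget are hypothetical, and counting whitespace-separated words is only a rough stand-in for a real tokenizer.

```python
# Hypothetical sketch: paste all extra information directly into the prompt.
PROMPT_TEMPLATE = (
    "Using the following information: {context} "
    "please answer the question of {question}"
)


def stuff_prompt(documents: list[str], question: str) -> str:
    """Join every document into the context section of a single prompt."""
    context = "\n".join(documents)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def approx_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (real tokenizers differ).
    return len(text.split())


def fits_context(prompt: str, max_tokens: int = 4096) -> bool:
    # Hypothetical context budget; real limits depend on the model.
    return approx_tokens(prompt) <= max_tokens


docs = ["Refunds are issued within 14 days.", "Support is open 9am-5pm UTC."]
question = "When are refunds issued?"

print(fits_context(stuff_prompt(docs, question)))         # True: 24 "tokens"
print(fits_context(stuff_prompt(docs * 1000, question)))  # False: data outgrew the budget
```

Because every document is pasted into every request, the prompt length grows linearly with the amount of data, which is where the scaling problem comes from.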
Another option is fine-tuning: updating the model’s weight matrices based on additional information we’d like the model to know. This can be a good option, but it has a much higher upfront cost in time, money, and computing resources. It also requires access to the model’s weights, which you do not have when using “closed-source” models such as those behind ChatGPT or Anthropic’s Claude.
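To make “updating the weight matrices” concrete, here is a toy sketch (not how LLMs are fine-tuned in practice) that applies plain gradient descent to a hypothetical 2x2 “pretrained” weight matrix so that it fits one new example. The matrix, data, learning rate, and function names are all illustrative assumptions. Note that the update needs direct read and write access to `W`, which is exactly what closed-source models do not expose.

```python
# Toy example, not a real LLM fine-tuning procedure.
# W stands in for a pretrained layer; x, y are a hypothetical new training example.


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loss(W, x, y):
    """Squared error 0.5 * ||W x - y||^2."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(matvec(W, x), y))


def fine_tune_step(W, x, y, lr=0.1):
    """One gradient-descent step on the squared error; returns a new matrix."""
    err = [p - t for p, t in zip(matvec(W, x), y)]
    # Gradient of the loss with respect to W[i][j] is err[i] * x[j].
    return [[W[i][j] - lr * err[i] * x[j] for j in range(len(x))]
            for i in range(len(W))]


W = [[1.0, 0.0], [0.0, 1.0]]   # "pretrained" weights (identity)
x, y = [1.0, 2.0], [2.0, 1.0]  # new example we want the model to learn

before = loss(W, x, y)
for _ in range(50):
    W = fine_tune_step(W, x, y)  # the weights themselves change
after = loss(W, x, y)
print(before, after)
```

Real fine-tuning repeats this kind of update over billions of parameters and many examples, which is where the time, money, and compute cost comes from.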
The steps to building a RAG pipeline:
Step 1: Take an inbound query and deconstruct it into relevant concepts.
Step 2: Collect similar concepts from your data store.
Step 3: Recombine these concepts with your original query to build a more relevant, authoritative answer.
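The three steps can be sketched end to end in a few lines. This is a minimal illustration under loud assumptions: the documents and stopword list are hypothetical, and it swaps in a simple bag-of-words cosine similarity where a production pipeline would more typically use learned embeddings and a vector database. `extract_concepts`, `retrieve`, and `build_prompt` (all hypothetical names) correspond to Steps 1, 2, and 3.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory "data store". Word counts stand in for embeddings here.
DOCUMENTS = [
    "Refunds are issued within 14 days of a return.",
    "Our support team is available 9am to 5pm UTC on weekdays.",
    "Shipping to Europe takes 3 to 5 business days.",
]

# Hypothetical stopword list used to keep only "concept" words.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "on", "in", "our",
             "how", "what", "when", "do", "does", "i", "my", "for"}


def extract_concepts(text: str) -> Counter:
    """Step 1: deconstruct text into lowercase keyword counts."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Step 2: collect the k documents most similar to the query."""
    q = extract_concepts(query)
    ranked = sorted(docs, key=lambda d: cosine(q, extract_concepts(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 3: recombine the retrieved context with the original query."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Using the following information:\n{joined}\nplease answer the question: {query}"


question = "How long do refunds take?"
print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

Only the retrieved refund document is placed in the prompt, so the prompt size depends on `k` rather than on the total size of `DOCUMENTS`; that is the key difference from pasting all of the data into every request.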