
Making a RAG agent

Author

Aidan

Date Published


Being so deep in PM mode, I haven’t really had a reason to sit down and write any code in a while. I’ve kept up with ML developments, but when I started talking with my engineering team about tools, I realized there were a ton of use cases for agents in my day-to-day life, and that inspired me to dive in.

My requirements:

The system runs exclusively on my local machine

Ability to expand the set of tools the agent can use

Ability to add more agents in the future
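One way to keep the tool set expandable is a registry: adding a tool means writing one decorated function, and the dispatch code never changes. The following is a hypothetical, framework-free sketch. `TOOLS`, `register_tool`, `run_tool`, and the two example tools are illustrative names only, not part of LangChain. In a real agent the model would choose the tool name and argument.

```python
from typing import Callable

ToolFn = Callable[[str], str]

# Hypothetical registry mapping tool names to plain Python functions.
TOOLS: dict[str, ToolFn] = {}


def register_tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Decorator that adds a function to the registry under `name`."""
    def register(fn: ToolFn) -> ToolFn:
        if name in TOOLS:
            raise ValueError(f"tool {name!r} is already registered")
        TOOLS[name] = fn
        return fn
    return register


@register_tool("word_count")
def word_count(text: str) -> str:
    """Example tool: count whitespace-separated words."""
    return str(len(text.split()))


@register_tool("shout")
def shout(text: str) -> str:
    """Example tool: upper-case the input."""
    return text.upper()


def run_tool(name: str, arg: str) -> str:
    """Dispatch a tool call by name (e.g. a name picked by the model)."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](arg)


print(run_tool("word_count", "agents that serve you"))
```

The same pattern extends to agents: a second registry keyed by agent name would let new agents be added without touching the code that routes requests.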

Approach:

I found that LangChain has a great introduction to building a RAG (retrieval-augmented generation) agent with their tooling. It was easy to follow along with and adapt for my purposes, and using their guide I was able to get the code running in just a few hours. I’m running Llama 3.2 locally through Ollama as both my chat and embeddings model, with a simple in-memory vector store. LangChain provides a bunch of great document parsers out of the box that were easy to use. I focused mainly on reading PDFs whose contents I already knew, so I could judge how well the system performed. I also experimented with smaller and larger models to see how the results changed.
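A RAG setup like this one typically parses a document, splits it into chunks, embeds the chunks into a vector store, retrieves the chunks closest to a question, and hands them to the chat model as context. The following is a dependency-free sketch of that loop, meant only to show what the LangChain pieces are doing. It is not LangChain’s or Ollama’s API. `embed` is a toy bag-of-words stand-in for the Ollama embedding model. `chunk`, `ToyVectorStore`, and `build_prompt` are hypothetical helpers, and the sample document is made up. The sketch stops at building the prompt rather than calling a model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping windows of `size` words (a crude splitter)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


class ToyVectorStore:
    """Minimal in-memory vector store: a list of (chunk, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, texts: list[str]) -> None:
        for text in texts:
            self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 4) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into a prompt for the local chat model."""
    joined = "\n\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


# Example: a made-up "document" standing in for text extracted from a PDF.
DOCUMENT = (
    "Refunds are accepted within 30 days of purchase. "
    "Shipping is free for orders over 50 dollars. "
    "Support is available by email on weekdays."
)
QUESTION = "How many days do I have for refunds?"

store = ToyVectorStore()
store.add(chunk(DOCUMENT, size=8, overlap=0))
print(build_prompt(QUESTION, store.search(QUESTION, k=1)))
```

Moving to real components would mean replacing `embed` with calls to the local embedding model (dense vectors instead of word counts) and sending the returned prompt to the local chat model. The retrieve-then-prompt structure stays the same.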

Results:

It’s pretty impressive how well even small models respond, showing a high level of understanding of the material I’ve been giving them. Having full control over the model and being able to run it fully locally also makes the idea of letting it access sensitive data more attractive.

I also compared how ChatGPT 4.0 responded to the same questions, and this project made me keenly aware of how many high-quality tools and agents OpenAI has built into ChatGPT. Its responses were always better.

I think that as these tools get better, letting agents deep into your personal life without truly “owning” them or having “visibility” into what they are doing becomes more and more of a problem. With ChatGPT, for example, I don’t think the quality of the responses makes up for the worry of handing over sensitive documents. Llama 3.2 provided responses that were good enough that I would prefer running it locally for security.

This makes me interested in how you could build a set of agents that serve only you, without having to worry about who else might be using their knowledge for training.