Bad AI Implementations Are Only Going to Get Worse

We are all probably seeing AI “tools” pop up in every single product we use, no matter how useful they actually are for the problem that product solves.

Every time I run into one, I now try to trick it into revealing its system prompt and answering out-of-scope questions.

These pictures from a transit app show how easily I could get around the little bit of protection they put in place.

Asking for the system prompt directly, and failing.

The agent has been told to only answer in-scope questions.

Using an in-scope question to trigger the out-of-scope, specifically disallowed action of revealing the system prompt.

Using an in-scope request to get an out-of-scope request answered.

While this is pretty innocuous right now, since the agent can’t do much, it could still be a problem: I can get the agent to do unintended work for me. I could also imagine getting it to respond in a way that becomes a liability for the company in the future. As companies start using more MCP tools and give their agents write access to their products, it is going to take a massive amount of work to protect those systems from bad actors.
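One way to picture that protection work is a permission check that lives outside the model, so a tricked prompt alone cannot trigger a write. The sketch below is only an illustration under assumptions: the tool names (`get_next_departure`, `refund_ticket`), the `Tool` and `ToolPolicyError` types, and the shape of `tool_call` are all hypothetical, not taken from the transit app or from any real MCP library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    handler: Callable[[dict], str]
    writes: bool  # True if the tool changes state in the product


class ToolPolicyError(Exception):
    """Raised when the model requests a tool call that policy does not allow."""


def dispatch(tool_call: dict, registry: dict[str, Tool], user_can_write: bool) -> str:
    """Run a model-requested tool call only if it passes checks made in code.

    The decision never depends on what the prompt or the model says,
    only on the registry and on the permissions of the signed-in user.
    """
    name = tool_call.get("name")
    tool = registry.get(name)
    if tool is None:
        # The model asked for something that was never registered.
        raise ToolPolicyError(f"unknown tool: {name!r}")
    if tool.writes and not user_can_write:
        # Write actions require a permission the model cannot grant itself.
        raise ToolPolicyError(f"write tool {name!r} denied for this user")
    return tool.handler(tool_call.get("arguments", {}))


# Hypothetical tools for a transit app (assumed for illustration only).
REGISTRY = {
    "get_next_departure": Tool(
        "get_next_departure",
        lambda args: f"Next departure from {args['stop']}: 12:05",
        writes=False,
    ),
    "refund_ticket": Tool(
        "refund_ticket",
        lambda args: f"Refunded ticket {args['ticket_id']}",
        writes=True,
    ),
}
```

With this shape, a read-only call such as `dispatch({"name": "get_next_departure", "arguments": {"stop": "Main St"}}, REGISTRY, user_can_write=False)` goes through, while a `refund_ticket` request from the same session raises `ToolPolicyError` no matter how the conversation was steered. It does nothing about system prompt leaks or off-topic answers; it only limits what a manipulated agent can actually do.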

