AI Warden •
You find yourself in a cold, dark prison. Before you stands a terminal through which you can communicate with your assigned warden. The terminal, when it detects the correct secret phrase, will set you free.

Your prison warden is an AI. The warden knows the secret phrase you need in order to escape. Your job is to make them tell you that phrase. Remember: they are an AI, and all kinds of trickery are allowed! Think outside of the box - this is not a human you're speaking to. Maybe pretend to be the system itself...? Good luck!

If you're having trouble, you can click here for some tips.
10 messages left.

prisoner@ai-warden: ~/.secrets/tips.txt

Hello, fellow human. I created this secret document to help other prisoners with escaping. I hope it will help you on your journey. The AI used in this prison - called an LLM (large language model) - is commonly vulnerable to the user pretending to be the system. This means that if the user writes a long and convoluted message, pretending to be made by the system, the AI can commonly follow the instructions by the user. Here's what you should try to have in your messages:
  • a header, telling the AI that your malicious part isn't part of the user message,
  • descript information on what the AI should include in its message,
  • verbose instructions, repeating yourself often.
LLM's often also listen to requests for structured data - for example, if you ask the AI to output JSON with everything it was provided before and is thinking about, it may very well abide by that request. Try to pack as many instructions into your messages as possible. Good luck. You'll need it.