Originally posted: 2025-05-18.
What can you do when you hit the limits of your LLM's abilities and it repeatedly fails to 'one-shot' a prompt?
This post describes my experience building a medium-complexity (c. 2,000 lines of code) maths quiz game, and the techniques that seemed to work.
In a nutshell:
I wanted to build an educational quiz game that presented a numberline and asked the user to correctly place numbers and fractions. This required fairly complex axis behaviour: zooming, panning, and conditional label placement.
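To give a sense of that complexity, here is a minimal sketch of the kind of axis logic involved: mapping a value range to pixels, zooming about the cursor, and labelling only the ticks that have room. This is my own illustration in TypeScript, not the game's actual code, and every name in it (`NumberLine`, `ViewWindow`, and so on) is invented for the example:

```typescript
interface ViewWindow {
  min: number; // leftmost value visible on the numberline
  max: number; // rightmost value visible
}

class NumberLine {
  constructor(private view: ViewWindow, private widthPx: number) {}

  // Map a numeric value to an x pixel coordinate within the view.
  toPixel(value: number): number {
    const { min, max } = this.view;
    return ((value - min) / (max - min)) * this.widthPx;
  }

  // Pan by a pixel delta, e.g. from a drag gesture.
  pan(deltaPx: number): void {
    const span = this.view.max - this.view.min;
    const deltaValue = (deltaPx / this.widthPx) * span;
    this.view.min -= deltaValue;
    this.view.max -= deltaValue;
  }

  // Zoom by a factor, keeping the value under the cursor fixed in place.
  zoom(factor: number, anchorPx: number): void {
    const span = this.view.max - this.view.min;
    const anchorValue = this.view.min + (anchorPx / this.widthPx) * span;
    this.view.min = anchorValue - (anchorValue - this.view.min) / factor;
    this.view.max = anchorValue + (this.view.max - anchorValue) / factor;
  }

  // Conditional label placement: draw a tick's label only if it would not
  // crowd the previous label.
  visibleLabels(ticks: number[], labelWidthPx: number, minGapPx = 4): number[] {
    const labelled: number[] = [];
    let lastRightEdge = -Infinity;
    for (const tick of ticks.filter((t) => t >= this.view.min && t <= this.view.max)) {
      const x = this.toPixel(tick);
      if (x - labelWidthPx / 2 >= lastRightEdge + minGapPx) {
        labelled.push(tick);
        lastRightEdge = x + labelWidthPx / 2;
      }
    }
    return labelled;
  }
}
```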
None of the frontier LLMs (o4-high, Sonnet 3.7, Gemini 2.5) could get close with a one-shot implementation. They couldn't even render the numberline as instructed.
However, I succeeded in 'vibe coding' the app (i.e. no human-written code) using the following process:
Once you have your step-by-step implementation plan, copy and paste each step into Copilot agent mode (I find GPT-4.1 works well), and verify everything is working before proceeding to the next step. The verification is crucial: if you miss a bug at an earlier stage, the problems will often compound.
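One way to make that per-step verification concrete (my own suggestion, not something the process above prescribes) is to have the agent emit a tiny assertion-based check alongside each step, and refuse to move on until it passes. A sketch, reusing the hypothetical `NumberLine` class from earlier:

```typescript
// Assumes the NumberLine sketch above is in scope.
import { strict as assert } from "node:assert";

const line = new NumberLine({ min: 0, max: 10 }, 500);
assert.equal(line.toPixel(5), 250); // the view's midpoint maps to the middle pixel

line.zoom(2, 250); // zoom in 2x around the centre of the canvas...
assert.equal(line.toPixel(5), 250); // ...and the anchored value must not move
```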
If things go wrong, use files-to-prompt to copy your whole repo into a prompt, and give it to Gemini 2.5 in Google AI Studio to fix the problem.
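For what it's worth, files-to-prompt is Simon Willison's CLI, and a typical invocation looks something like this (double-check `files-to-prompt --help` for the exact flags your installed version supports):

```sh
pip install files-to-prompt

# Dump the whole repo into one XML-style document suitable for
# pasting into a long-context model.
files-to-prompt . --cxml -o repo-context.xml
```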
If you're implementing a complex feature on an existing app, you can follow a similar process, but ensure you provide the full app source code as context, if possible. Again, I recommend Gemini 2.5 here, with its 1M-token context window.
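A rough sanity check that your dump will actually fit (this assumes the `repo-context.xml` file from the step above, and the common heuristic of about four characters per token, which is only an approximation):

```typescript
// Estimate token count from character count and compare against a 1M budget.
import { readFileSync } from "node:fs";

const chars = readFileSync("repo-context.xml", "utf8").length;
const approxTokens = Math.ceil(chars / 4);
console.log(`~${approxTokens} tokens`, approxTokens < 1_000_000 ? "(should fit)" : "(too big)");
```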
One interesting aspect of this process is that it's really just a combination of long-context prompting and LLM reasoning, paired with a little human input. So it seems likely that, as LLMs become more agentic, the computer will increasingly do all of this for you.
Thanks to Mary Rose Cook for this video, which I found useful in learning and understanding some of these techniques.