Prompt engineering is over. Context engineering is the new craft.
Models are smart enough now that wording the prompt is the small part. What you load into the context window is the part that actually decides the outcome.
For about three years, prompt engineering was a real skill. You learned which phrases worked, which ones made the model sandbag, how to use "think step by step", how to wrap instructions in XML tags, how to put the important stuff at the end.
That work mattered because the models were sensitive to phrasing. A prompt that said "respond concisely" would produce a paragraph. A prompt that said "respond in exactly one sentence" would produce a sentence. The difference between the two was the difference between a product that worked and a product that did not.
That world is mostly gone.
The frontier models released in 2025 and 2026 are forgiving about wording. They infer intent. They ignore filler. They handle vague instructions about as well as precise ones, as long as the goal is clear. You can still get a tiny quality bump from careful phrasing, but you cannot build a product around that bump anymore.
What matters now is what you load into the context window before the model runs. We call this context engineering, and it is the actual job.
The context window as a workspace
Think of the context window as a desk. The model is an assistant that shows up, looks at the desk, and does whatever the task says. The quality of the output is determined almost entirely by what you put on the desk before the assistant arrives.
Bad context engineering looks like this:
- A prompt that says "write a landing page for this company"
- The name of the company
Good context engineering looks like this:
- A prompt that says "write a landing page for this company"
- The name of the company
- Its three real services, pulled from a crawl of the existing site
- Two testimonials written by actual customers
- The brand colors and the logo URL
- A reference page with the exact visual style the client approved
- A memory file with everything we learned from the last five homepage builds
- The audit log from the last time we tried this and failed
Same prompt. Same model. Vastly different outcomes. The phrasing of the prompt is the smallest part of the picture.
The four sources of context
In practice, every context window you load is assembled from four sources:
Files. Source code, reference documents, crawl output, brand assets. Static material that lives on disk and gets read in at run time.
Memory. Notes the agent wrote to itself on past runs. Lessons learned. Things that broke last time. Coding conventions. Client preferences. The good ones compound: a note written in week one is still paying rent in week fifty.
Tools. The tool definitions themselves are context. A tool called create_invoice with a two-line description tells the model a lot about what it is allowed to do. A tool with ten parameters and a thirty-line description tells the model even more. Shipping the right tools with the right descriptions is a form of context engineering.
History. What the agent just did in the current run. Which steps succeeded, which failed, what the last tool call returned. The immediate past is context for the immediate future.
An agent is only as good as those four sources combined. You can give a frontier model a perfect prompt and zero context and get slop. You can give a mediocre model excellent context and get work you would ship.
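The four sources above can be made concrete as a small assembly step. This is a minimal sketch, not a real API: the class, field names, and rendering format are all illustrative assumptions about how a run might load its desk before the model arrives.

```python
# Illustrative sketch: assembling a context window from the four sources.
# Everything here (ContextWindow, render, the section headers) is assumed,
# not a real framework.
from dataclasses import dataclass, field

@dataclass
class ContextWindow:
    files: list[str] = field(default_factory=list)    # static material read at run time
    memory: list[str] = field(default_factory=list)   # notes from past runs
    tools: list[dict] = field(default_factory=list)   # tool definitions the model sees
    history: list[str] = field(default_factory=list)  # steps from the current run

    def render(self) -> str:
        """Concatenate the four sources into the text the model receives."""
        sections = [
            ("FILES", self.files),
            ("MEMORY", self.memory),
            ("TOOLS", [t["name"] + ": " + t["description"] for t in self.tools]),
            ("HISTORY", self.history),
        ]
        return "\n\n".join(
            f"## {title}\n" + "\n".join(items)
            for title, items in sections
            if items
        )

ctx = ContextWindow(
    files=["brand colors: #0A2540, #FFB300"],
    memory=["Client prefers testimonials above the fold."],
    tools=[{"name": "create_invoice", "description": "Create a draft invoice."}],
    history=["Step 1: crawled existing site, found 3 services."],
)
prompt = "Write a landing page for this company.\n\n" + ctx.render()
```

The prompt stays a single sentence at the top; everything that decides the outcome is what render() loads beneath it.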
What it looks like at Company Agents
This is not a theoretical position for us. It is how the product is built.
Every agent has memory at four scopes. Agent-level notes. Project-level notes. Client-level notes. Company-level notes. Before every run, the agent reads the relevant notes. After every run, the agent writes new notes. Lessons that prove themselves over multiple runs promote to broader scopes, so a hard-won insight at the agent level eventually benefits every future run in the same client, and then the company.
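The scope-promotion idea can be sketched in a few lines. This is an assumed simplification, not the product's actual store: the scope names match the four levels above, but the promotion threshold and the note representation are illustrative.

```python
# Sketch of four-scope memory with promotion. The threshold of 3 and the
# plain-string notes are assumptions for illustration.
SCOPES = ["agent", "project", "client", "company"]  # narrowest to broadest

class MemoryStore:
    def __init__(self):
        self.notes = {scope: [] for scope in SCOPES}
        self.confirmations = {}  # note -> times it proved useful on a run

    def read_relevant(self):
        """Before a run: read notes from every scope, broadest first."""
        return [note for scope in reversed(SCOPES) for note in self.notes[scope]]

    def write(self, note, scope="agent"):
        """After a run: new lessons land at the narrowest scope."""
        self.notes[scope].append(note)

    def confirm(self, note, threshold=3):
        """A note that proves itself over multiple runs promotes one scope up."""
        self.confirmations[note] = self.confirmations.get(note, 0) + 1
        for i, scope in enumerate(SCOPES[:-1]):
            if note in self.notes[scope] and self.confirmations[note] >= threshold:
                self.notes[scope].remove(note)
                self.notes[SCOPES[i + 1]].append(note)
                break
```

A lesson written at the agent level in week one climbs to project, client, and eventually company scope as it keeps proving itself, which is the compounding the section above describes.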
Every workflow has a checkpoint format. When an agent runs a multi-step workflow, the orchestrator writes a structured checkpoint after every step, not a conversation dump. The next run gets a clean slate with the structured facts, not 200k tokens of replayed conversation that the model has to re-parse.
Every tool carries its trust tier in its description. A programmatic check runs first. A human review is required for risky operations. The agent sees the full policy, in context, at run time.
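The gating order described above, programmatic check first, then human review for risky operations, can be sketched like this. The tier names, the validate hook, and the pending_review status are all assumptions, not the actual policy.

```python
# Sketch of trust-tier gating before a tool call. Tier names and the
# dispatch logic are illustrative assumptions.
TIERS = {"read_only": 0, "reversible": 1, "destructive": 2}

def dispatch(tool, args, approved_by_human=False):
    # Programmatic check runs first, for every tier.
    if not tool["validate"](args):
        raise ValueError("failed programmatic check")
    # Risky operations wait for a human before they execute.
    if TIERS[tool["trust_tier"]] >= TIERS["destructive"] and not approved_by_human:
        return {"status": "pending_review"}
    return tool["run"](args)
```

Because the tier lives in the tool description the model reads, the agent sees the same policy the dispatcher enforces, in context, at run time.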
None of that is new. None of that is surprising. It is just the real work of running agents that do what you want them to do. The prompt is a paragraph at the top. The context is everything below it.
If you are still writing prompts
The best use of your time in 2026 is not polishing the wording. It is asking:
- What does the model need on its desk to do this well?
- Where does that content live?
- How do I load the right subset for this specific run?
- What do I want the agent to write down for next time?
If you can answer those four questions for your use case, your agent will get noticeably better, likely more than any prompt rewrite ever improved it.
The craft moved. The work did not.