Building for the recent Notion Custom Agent Buildathon was a fun but challenging experience. It forced me to rewire my brain around some fundamentals. Working inside the Notion environment is a distinct shift for someone used to having full control over the execution environment (like when building agents from scratch in an IDE).

I spent three days building Meeting to Action, an agent that transforms meeting transcripts into structured tasks and Slack notifications. The agent is live, and it works. On a real test run, it correctly created seven tasks across two assignees with accurate dates, processing the entire meeting in under 90 seconds.

This newsletter focuses on what I learned about the platform's constraints, the architecture decisions that made it work, and the ones that broke it.

The Problem: Two Days of Drift

Every team has a version of this: a meeting ends, people scatter, and the action items live only in a Notion document that nobody will open again until the next standup.

The real cost isn't just the 20 minutes someone eventually spends manually creating tasks. It is the two days of drift before they do it. "Meeting to Action" closes that loop deterministically. For a team running three meetings per week, automating this step eliminates roughly 45 minutes of manual task creation, 2–3 clarifying Slack threads, and one status-check meeting per sprint.

The Engineering Constraints: Credits, Models, and Black Boxes

The first thing I had to understand was how Notion handles the agent's "brain." Notion routes requests to frontier models from both OpenAI and Anthropic, so I didn't have to worry about selecting the absolute best base model for the task.

However, cost efficiency is governed by a strict Action/Task/Credit system. During the build period, quotas were unlimited (my testing used around 260 credits), but Notion is moving to a business plan with a 1,000-credit quota. My target was to have each run use as few actions as possible—ideally under 20.

Wiring everything together required some trial and error. Notion's logging is essentially a black box, which makes sense for non-technical users but is frustrating for developers. However, Notion includes a brilliant self-healing tool: whenever the agent hits an error (visible in a tracing area), it surfaces a "Fix this agent" button that attempts to resolve the issue itself.

Architecture: The Three Gates

To protect the credit quota, the agent relies on a strict sequence of gates before any AI extraction happens. Each gate is significantly cheaper than a full run.

  1. Gate 0A (Status Check): The trigger fires when a page's status changes. The agent immediately checks if the status is exactly Needs Processing. If not, it halts.

  2. Gate 0B (Idempotency): In distributed systems, operations that write to a database must be idempotent. If someone accidentally re-triggers the agent, we don't want duplicate tasks. The agent reads an Agent Ran checkbox on the page and queries the Tasks database for existing rows linked to this meeting. If it finds either, it halts at essentially zero cost.

  3. Gate 1 (Content Check): It counts the words in the page body. If the transcript is under 100 words, it sets the status to Too Short and halts.

Only if all three gates pass does the agent proceed to read the transcript.
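The agent itself is configured in Notion rather than written in code, but the gate logic maps cleanly onto a plain guard chain. Here is a minimal Python sketch of the three gates, assuming hypothetical field names (`status`, `agent_ran`, `body`) standing in for the Notion properties described above:

```python
# Sketch of the three-gate guard chain as plain Python.
# Field names are hypothetical stand-ins for the Notion page properties.

def should_process(page, existing_task_count):
    """Return (proceed, reason). Each gate halts before any AI call."""
    # Gate 0A: only fire on the exact trigger status.
    if page["status"] != "Needs Processing":
        return False, "wrong status"
    # Gate 0B: idempotency -- skip if we've already run, or if tasks
    # linked to this meeting already exist in the Tasks database.
    if page["agent_ran"] or existing_task_count > 0:
        return False, "already processed"
    # Gate 1: content check -- transcripts under 100 words are rejected.
    if len(page["body"].split()) < 100:
        page["status"] = "Too Short"
        return False, "too short"
    return True, "ok"
```

The ordering matters: the cheapest checks run first, so a re-trigger or an empty page never reaches the expensive extraction step.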

Data Scoping as a Cost Lever

The agent has access to exactly three databases: Meeting Notes, Tasks, and a filtered view of the Team Directory.

A real Team Directory might have 20+ columns (salary band, manager, home address). The agent only needs three: Full Name, Notion User, and Slack Handle.

I created a locked, filtered view—the Agent View—and granted the agent access only to that. Hiding irrelevant columns cuts token usage per lookup by ~85% in regular agentic systems. It makes the whole process cheaper but also more accurate. It maps to the least-privilege principle: give the model access to exactly what it needs, and no more. Irrelevant data creates surface area for the model to hallucinate associations that do not serve the task.
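In code terms, the Agent View is just a column whitelist applied before anything reaches the model. A minimal sketch, with a made-up directory row (only the three column names come from the setup above):

```python
# Least-privilege projection: keep only the columns the agent needs.
# NEEDED_COLUMNS matches the three fields named above; the sample row
# is an illustrative assumption.

NEEDED_COLUMNS = ("Full Name", "Notion User", "Slack Handle")

def agent_view(row):
    """Return only the whitelisted columns, dropping everything else."""
    return {col: row[col] for col in NEEDED_COLUMNS}

full_row = {
    "Full Name": "Ada Example",
    "Notion User": "@ada",
    "Slack Handle": "@ada.example",
    "Salary Band": "L5",         # never reaches the model
    "Home Address": "redacted",  # never reaches the model
}
```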

Furthermore, I implemented an "Attendees-first" name resolution. The agent checks the meeting's Attendees property (already in memory) before querying the Team Directory. If everyone in the meeting is tagged, zero additional database queries are needed to assign the tasks.
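The attendees-first lookup can be sketched the same way. Here `directory_lookup` is a hypothetical stand-in for a Team Directory query, which is the expensive path the agent tries to avoid:

```python
# Sketch of "attendees-first" name resolution. The Attendees list is
# already in memory; the directory query only runs on a miss.

def resolve_assignee(name, attendees, directory_lookup):
    """Return (person, extra_queries). Prefer the in-memory attendee
    list; fall back to a Team Directory query only when no one matches."""
    for person in attendees:
        if person["Full Name"].lower() == name.lower():
            return person, 0  # zero additional database queries
    return directory_lookup(name), 1  # one directory query
```

When every assignee is tagged as an attendee, the second return path never fires, which is exactly the zero-query case described above.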

The "Typed SOP" Prompting Strategy

Because everything regarding the agent lives in a single Notion page, organizing the system prompt correctly was quite a task. The most consequential choice in the entire build was replacing open-ended instructions with a Typed SOP (Standard Operating Procedure).

An early version of the prompt was open-ended: "Read the meeting notes and find all the action items." This pushes the model into unguided generation. It costs reasoning cycles, introduces variance, and produces outputs that are hard to debug.

Instead, the prompt acts as a rigid decision tree. Rather than "find action items," the prompt says: Scan for these specific verb phrases (will, needs to, should, agreed to...). For each match, extract exactly three fields. This converts reasoning into pattern matching. Pattern matching is deterministic, cheap, and auditable. We did the same thing for date resolution. Instead of letting the model guess what "end of next week" means, we hardcoded the rules:

  • "next week" → Friday of the following week

  • "end of next week" → Friday of the following week

In agent prompts, apparent redundancy is real specificity. Closing failure modes explicitly beats relying on the model's generalization. Current frontier models could probably infer these rules on their own, but there is a catch: if we don't control the platform, we don't control the models. Notion may route to frontier models today and swap in cheaper, less capable ones tomorrow. That is the scenario we are building for, and it builds good habits in harness engineering besides.

Two Modes of Execution

The resulting pipeline supports two workflows:

  • Mode A (Zero-Touch): A Google Meet call ends, Notion Calendar’s AI automatically transcribes it via /note, the page lands in the database, a Notion Automation sets the status to Needs Processing, and the agent fires. Zero human intervention.

  • Mode B (Manual Quality Gate): A user takes manual notes and clicks a pre-built "Process Meeting" button when finished. This sets the status and fires the agent, acting as a human quality gate.

Both modes end the same way: Tasks are populated, the page is marked Done, and a formatted summary is pushed to Slack. A typical run costs roughly 15 actions, compared to the 25–40+ actions an unoptimized agent would burn.

Where to find more about this agent?

I’ve described everything here: https://manticorevault.github.io/meeting-to-action/

The page works as a hub for everything related to the Meeting-To-Action agent; it is a general overview rather than a deep technical write-up.

What's Next

Looking forward, I have a hypothesis I want to test: Can I create sub-pages inside the agent page to separate instructions? This could allow me to give the agent different sets of tools related to different triggers, streamlining the organization even further.

The agents that work consistently in production are the ones where the builder has done the work the model shouldn't have to do: defining explicit rules, limiting context, and building idempotency guards. Good architecture isn't complex. It's just specific.
