Architecture
System architecture: Mastra framework, 4-layer memory, agent orchestration, and infrastructure.
HeyCMO is built on the Mastra AI framework with a layered architecture designed for reliability, memory persistence, and multi-agent orchestration.
Tech Stack
| Layer | Technology |
|---|---|
| HTTP Server | Hono (lightweight, edge-compatible) |
| AI Framework | Mastra (agents, tools, workflows, RAG) |
| LLM Providers | OpenAI GPT-4o, GPT-4o-mini (via Mastra model routing) |
| Database | PostgreSQL (via Prisma ORM + @prisma/adapter-pg) |
| Memory | @mastra/memory + @mastra/rag with pgvector |
| Embeddings | OpenAI text-embedding-3-small |
| Task Queue | Inngest (durable workflow execution, cron scheduling) |
| Billing | Stripe (checkout sessions, webhook handling) |
| Integrations | Composio (100+ platform connectors via MCP) |
| Observability | Langfuse (@mastra/langfuse) + Pino structured logging |
| Visual Rendering | Playwright (carousel, static, OG image generation) |
| Voice | ElevenLabs HTTP API (text-to-speech for audio content) |
System Diagram
┌───────────────────────────────────────────────────────────┐
│                     Hono HTTP Server                      │
│  ┌──────────┐  ┌───────────┐  ┌────────────┐  ┌────────┐  │
│  │ /mcp/sse │  │/api/health│  │/api/inngest│  │ /api/* │  │
│  │ /mcp/msg │  │/api/h/rdy │  │            │  │billing │  │
│  └────┬─────┘  └───────────┘  └─────┬──────┘  └───┬────┘  │
│       │                             │             │       │
│  ┌────┼─────────────────────────────┼─────────────┼────┐  │
│  │                 Mastra AI Framework                 │  │
│  │  ┌───────────────────────────────────────────────┐  │  │
│  │  │            Agent Layer (17 agents)            │  │  │
│  │  │  CMO → SEO Writer, Social, Email, Analyst,    │  │  │
│  │  │  Engagement, Researcher, CRO, Growth, Sales   │  │  │
│  │  └───────────────┬───────────────────────────────┘  │  │
│  │                  │                                  │  │
│  │  ┌───────┐  ┌────┼────┐  ┌───────────────────────┐  │  │
│  │  │ Tools │  │Workflows│  │   Memory (4-layer)    │  │  │
│  │  │ (15+) │  │  (12)   │  │  Working │ Semantic   │  │  │
│  │  └───────┘  └─────────┘  │ Episodic │ Procedural │  │  │
│  │                          └───────────────────────┘  │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│ ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐ │
│ │PostgreSQL│   │ Inngest  │   │  Stripe  │   │ Composio │ │
│ │+ pgvector│   │Cron/Queue│   │ Billing  │   │   MCP    │ │
│ └──────────┘   └──────────┘   └──────────┘   └──────────┘ │
└───────────────────────────────────────────────────────────┘
4-Layer Memory System
HeyCMO agents have persistent, contextual memory across all interactions:
Working Memory
Short-term context for the current conversation. Stores brand profile, recent instructions, and active task context. Every agent reads brand context from working memory before making decisions.
Semantic Memory (RAG)
Long-term knowledge stored as vector embeddings in PostgreSQL with pgvector. Past content, research findings, and brand guidelines are chunked, embedded with OpenAI's text-embedding-3-small, and retrievable via similarity search.
- Content Query Tool: Agents search past content for reference and inspiration
- Performance Query Tool: Agents query historical performance data for data-driven decisions
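The similarity search behind these tools can be illustrated with a minimal in-memory sketch. In HeyCMO the comparison runs inside PostgreSQL via pgvector, so this is not the actual implementation. The `MemoryChunk` type and the `cosineSimilarity` and `topK` functions are hypothetical names, and the short vectors stand in for real text-embedding-3-small embeddings.

```typescript
// Hypothetical shape of a stored chunk; real embeddings come from the embedding model.
interface MemoryChunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best match first.
function topK(query: number[], chunks: MemoryChunk[], k: number): MemoryChunk[] {
  return chunks
    .slice()
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

A production setup would typically delegate the ranking to a pgvector index rather than sorting in application code.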
Episodic Memory
Records of past agent interactions and outcomes. Enables agents to learn from previous successes and failures, for example: "last time we wrote about this topic, it scored 0.85 on brand voice."
Procedural Memory
Self-optimization loop that adjusts agent behavior based on patterns. The Self-Optimization Workflow analyzes telemetry data, detects patterns (declining quality, improving engagement), and proposes weight adjustments to scoring criteria.
Human-in-the-Loop Approval
HeyCMO uses a suspend/resume workflow pattern for human approval. When the content creation workflow produces content:
- Content is scored by the eval system (brand voice, quality, engagement prediction)
- The workflow suspends and waits for human approval
- You review the content via list_pending_approvals in your MCP client
- You call resume_workflow with an approve or reject decision
- Approved content proceeds to publishing; rejected content goes back for revision
This ensures nothing publishes without your explicit approval while keeping the automation pipeline fully intact.
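The suspend/resume pattern can be sketched as a small in-memory state machine. This is an illustration only: the `ApprovalQueue` class, its method names, and the status values are assumptions, and the real workflow state is held by Mastra rather than a `Map`.

```typescript
// Assumed status values for illustration.
type RunStatus = "suspended" | "publishing" | "revision";

interface ContentRun {
  runId: string;
  draft: string;
  status: RunStatus;
}

class ApprovalQueue {
  private runs = new Map<string, ContentRun>();

  // Called when the workflow reaches the approval step: park the run.
  suspend(runId: string, draft: string): void {
    this.runs.set(runId, { runId, draft, status: "suspended" });
  }

  // Plays the role of the list_pending_approvals tool.
  listPendingApprovals(): ContentRun[] {
    return Array.from(this.runs.values()).filter((r) => r.status === "suspended");
  }

  // Plays the role of the resume_workflow tool.
  resumeWorkflow(runId: string, decision: "approve" | "reject"): ContentRun {
    const run = this.runs.get(runId);
    if (!run || run.status !== "suspended") {
      throw new Error(`No suspended run with id ${runId}`);
    }
    run.status = decision === "approve" ? "publishing" : "revision";
    return run;
  }
}
```

Because a run can only leave the suspended state through `resumeWorkflow`, nothing in this sketch reaches publishing without an explicit decision.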
Inngest Cron Jobs
HeyCMO uses Inngest for durable, scheduled workflow execution:
- Daily Research: Automated content research runs on a schedule
- Engagement Monitoring: Periodic checks for comments and DMs across platforms
- Self-Optimization: Regular performance analysis and strategy adjustment
- Events Cleanup: Periodic cleanup of stale event data
Inngest provides automatic retries, crash recovery, and observability for all scheduled functions. The /api/inngest endpoint serves as the function handler.
Real-Time Progress Events
Long-running workflows (like the research pipeline) emit real-time progress notifications via the MCP protocol:
Step 1/6: Analyzing brand context...
Step 2/6: Searching web sources via Exa...
Step 3/6: Extracting social signals from X/Twitter...
Step 4/6: Scoring and ranking ideas...
Step 5/6: Identifying SEO quick wins...
Step 6/6: Storing results in semantic memory...
Progress events are forwarded to your MCP client as notifications/progress messages, giving you live visibility into what HeyCMO is doing.
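As a rough sketch, each step above could be wrapped in a JSON-RPC notification shaped like an MCP progress message. The `buildProgress` helper is hypothetical, and how HeyCMO obtains the progress token from the client request is not shown here.

```typescript
// JSON-RPC notification shape for MCP progress updates.
interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: {
    progressToken: string | number;
    progress: number;
    total?: number;
    message?: string;
  };
}

// Hypothetical helper: turn a workflow step into a progress notification.
function buildProgress(
  token: string | number,
  step: number,
  total: number,
  description: string,
): ProgressNotification {
  return {
    jsonrpc: "2.0",
    method: "notifications/progress",
    params: {
      progressToken: token,
      progress: step,
      total,
      message: `Step ${step}/${total}: ${description}`,
    },
  };
}
```

For example, `buildProgress("research-1", 2, 6, "Searching web sources via Exa...")` yields a message string matching the second line of the output above.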
Workflow Recovery
On server restart, HeyCMO automatically recovers interrupted workflows from PostgresStore. The recovery system checks for running or suspended workflow runs and restarts them, ensuring research pipelines and other long-running workflows survive server restarts without data loss.
Recoverable workflows: researchPipeline, contentCreation, crossChannelPublish, brandInterview, engagementResponse, analyticsReport, selfOptimization.
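The selection step of recovery might look like the sketch below. The `StoredRun` shape and the `success` and `failed` status names are assumptions made for illustration; only the workflow names and the running/suspended criteria come from the description above.

```typescript
// Workflow names listed as recoverable in the text above.
const RECOVERABLE_WORKFLOWS = new Set<string>([
  "researchPipeline",
  "contentCreation",
  "crossChannelPublish",
  "brandInterview",
  "engagementResponse",
  "analyticsReport",
  "selfOptimization",
]);

// Assumed record shape; the real schema lives in the workflow store.
interface StoredRun {
  runId: string;
  workflowName: string;
  status: "running" | "suspended" | "success" | "failed";
}

// Pick the runs that were interrupted and belong to a recoverable workflow.
function selectRunsToRecover(runs: StoredRun[]): StoredRun[] {
  return runs.filter(
    (run) =>
      RECOVERABLE_WORKFLOWS.has(run.workflowName) &&
      (run.status === "running" || run.status === "suspended"),
  );
}
```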
Agent Safety Processors
Every agent has input and output processors for safety:
| Processor | Purpose | Agents |
|---|---|---|
| UnicodeNormalizer | Prevents Unicode-based prompt injection | All agents |
| PromptInjectionDetector | Detects and warns on injection attempts | CMO |
| TokenLimiterProcessor | Caps input tokens to prevent abuse | CMO, Email, Researcher |
| ModerationProcessor | Blocks harmful content in outputs | Social Manager |
| PIIDetector | Redacts personal information from outputs | Engagement |
| LanguageDetector | Detects input language for proper routing | Engagement |
| ToolCallFilter | Validates tool call parameters | Social Manager |
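To give a sense of what Unicode normalization protects against, here is a minimal sketch using the standard `String.prototype.normalize` method. It is not the UnicodeNormalizer processor itself: the `normalizeInput` function and the exact set of stripped characters are assumptions.

```typescript
// Zero-width and bidirectional control characters that can hide text from reviewers.
const INVISIBLE_CHARS = /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\uFEFF]/g;

// Fold compatibility forms (for example fullwidth letters) and drop invisible characters.
function normalizeInput(text: string): string {
  return text.normalize("NFKC").replace(INVISIBLE_CHARS, "");
}
```

After this step, later checks such as injection detection see a single canonical spelling instead of visually identical variants.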
Eval System
Before any content is presented for approval, it passes through three evaluation dimensions:
- Brand Voice Match: Compares generated content against the brand profile for tone, vocabulary, and style consistency (0–1 score)
- Content Quality: Evaluates structure, depth, readability, and SEO optimization (0–1 score)
- Engagement Prediction: Predicts likely engagement based on historical performance patterns (0–1 score)
Content scoring below configurable thresholds is flagged for revision before human review.
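The threshold check can be sketched as follows. The threshold values, the `EvalScores` shape, and the function name are illustrative assumptions, since the actual defaults are configurable and not stated here.

```typescript
interface EvalScores {
  brandVoice: number;
  quality: number;
  engagement: number;
}

// Assumed defaults for illustration only.
const DEFAULT_THRESHOLDS: EvalScores = { brandVoice: 0.7, quality: 0.7, engagement: 0.5 };

// Return the dimensions whose score falls below its threshold.
function dimensionsBelowThreshold(
  scores: EvalScores,
  thresholds: EvalScores = DEFAULT_THRESHOLDS,
): (keyof EvalScores)[] {
  const dimensions: (keyof EvalScores)[] = ["brandVoice", "quality", "engagement"];
  return dimensions.filter((d) => scores[d] < thresholds[d]);
}
```

An empty result means the content can go straight to human review; otherwise the listed dimensions indicate what the revision should address.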
Infrastructure
Rate Limiting
Per-key and per-IP rate limiting with configurable burst rates per endpoint category (API, MCP, checkout, webhooks, admin).
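One common way to implement burst-tolerant limits is a token bucket, sketched below. Whether HeyCMO uses a token bucket is an assumption; the class name and parameters are hypothetical. The `key` would be an API key or an IP address, and each endpoint category would get its own limiter instance.

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private capacity: number;
  private refillPerSecond: number;

  // capacity is the burst size; refillPerSecond is the sustained rate.
  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // Returns true if the request identified by key may proceed at time nowMs.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    const elapsedSec = (nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSecond);
    bucket.lastRefillMs = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```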
Authentication
API keys use the hcmo_live_ prefix, are hashed with SHA-256 for storage, and validated with timing-safe comparison. MCP endpoints use token-based auth via query parameter.
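A minimal sketch of this scheme using Node's built-in `crypto` module is shown below. The function names and the choice to store the hash as a hex string are assumptions; only the prefix, SHA-256 hashing, and timing-safe comparison come from the description above.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

const KEY_PREFIX = "hcmo_live_";

// Hash the raw API key; only this digest is stored, never the key itself.
function hashApiKey(apiKey: string): Buffer {
  return createHash("sha256").update(apiKey, "utf8").digest();
}

// Compare the presented key against a stored hex digest in constant time.
function verifyApiKey(presented: string, storedHashHex: string): boolean {
  if (!presented.startsWith(KEY_PREFIX)) return false;
  const presentedHash = hashApiKey(presented);
  const storedHash = Buffer.from(storedHashHex, "hex");
  // timingSafeEqual throws on length mismatch, so guard first.
  if (storedHash.length !== presentedHash.length) return false;
  return timingSafeEqual(presentedHash, storedHash);
}
```

Comparing fixed-length digests rather than raw keys keeps the comparison time independent of how many leading characters match.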
CORS & Security Headers
Strict CORS policy (only heycmo.ai and app.heycmo.ai), plus X-Content-Type-Options, X-Frame-Options, Referrer-Policy, Permissions-Policy, and Strict-Transport-Security headers on all responses.
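The header logic can be sketched as a pure function. The header values and the `https://` scheme on the allowed origins are assumptions, since the text above lists only the header names and host names; `responseHeaders` is a hypothetical helper, not the actual middleware.

```typescript
// Allowed origins; the https scheme is assumed.
const ALLOWED_ORIGINS = new Set<string>(["https://heycmo.ai", "https://app.heycmo.ai"]);

// Header values are illustrative assumptions.
const SECURITY_HEADERS: Record<string, string> = {
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
  "Referrer-Policy": "no-referrer",
  "Permissions-Policy": "camera=(), microphone=(), geolocation=()",
  "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
};

// Build response headers: security headers always, CORS headers only for allowed origins.
function responseHeaders(origin: string | undefined): Record<string, string> {
  const headers: Record<string, string> = { ...SECURITY_HEADERS };
  if (origin !== undefined && ALLOWED_ORIGINS.has(origin)) {
    headers["Access-Control-Allow-Origin"] = origin;
    headers["Vary"] = "Origin";
  }
  return headers;
}
```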
Body Size Limits
Request body size is enforced per endpoint: 100KB for API routes, 1MB for Stripe webhooks, 10KB for admin routes.
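These limits amount to a lookup table plus a size check, as in the sketch below. The category names, the 1024-byte interpretation of KB, and the 413 response status are assumptions.

```typescript
type RouteCategory = "api" | "stripeWebhook" | "admin";

// Limits from the text above; treating 1KB as 1024 bytes is an assumption.
const BODY_LIMIT_BYTES: Record<RouteCategory, number> = {
  api: 100 * 1024,
  stripeWebhook: 1024 * 1024,
  admin: 10 * 1024,
};

// Returns the HTTP status a request body of this size would receive.
function checkBodySize(category: RouteCategory, contentLength: number): 200 | 413 {
  return contentLength <= BODY_LIMIT_BYTES[category] ? 200 : 413;
}
```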
SSRF Protection
All URL-accepting tools validate against private/internal IP ranges (localhost, 10.x, 172.16-31.x, 192.168.x, link-local, AWS metadata endpoint) to prevent server-side request forgery.
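A simplified version of this validation is sketched below for literal IPv4 hosts. The function names are hypothetical. The sketch deliberately leaves out IPv6 addresses and the case where a public hostname resolves to a private address, both of which a complete check would also need to cover.

```typescript
// Parse a dotted-quad IPv4 string into four octets, or null if it is not one.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// True if the host is localhost or a literal address in a blocked range.
function isBlockedHost(host: string): boolean {
  const h = host.toLowerCase();
  if (h === "localhost" || h.endsWith(".localhost")) return true;
  const ip = parseIPv4(h);
  if (ip === null) return false; // not a literal IPv4 address
  const [a, b] = ip;
  return (
    a === 127 ||                         // loopback
    a === 10 ||                          // 10.x
    (a === 172 && b >= 16 && b <= 31) || // 172.16-31.x
    (a === 192 && b === 168) ||          // 192.168.x
    (a === 169 && b === 254)             // link-local, includes 169.254.169.254 (AWS metadata)
  );
}

// Accept only http(s) URLs whose host is not blocked.
function isUrlAllowed(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return !isBlockedHost(url.hostname);
}
```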