
SEO + GEO Scanner

One-click dual-score audit for any URL — SEO + Generative Engine Optimization, with per-engine breakdowns, a prioritized checklist, and "Fix with AI" routing.

The Scanner is a one-click audit at /scan. The customer enters a URL, waits 5–15 seconds, and gets back two scores (SEO + GEO), a per-engine breakdown (ChatGPT / Perplexity / Gemini / Google AI), a prioritized checklist (Urgent / Recommended / Done), and copy-pasteable artifacts (llms.txt, JSON-LD).

It is HeyCMO's response to Base44's SEO+GEO scanner, with three deliberate improvements:

  1. Dual scores stay separate — Base44 markets dual scores but ships a unified number. We don't collapse them.
  2. Works on any URL — Base44 is locked to Base44-built apps.
  3. Per-engine GEO breakdown surfaced — fills the gap @HubLensOfficial flagged in the Base44 launch thread.

Configuration

| Property | Value |
| --- | --- |
| Phase | 6 |
| Schema | `ScanResult`, `ScanFix` |
| Orchestrator | `apps/api/agent/lib/seo-geo-scan.ts` |
| SEO scorer | `apps/api/agent/lib/seo-score.ts` |
| Checklist composer | `apps/api/agent/lib/checklist-composer.ts` |
| UI | `/scan` |
| Run mode | inline (5–15s typical, 30s HTTP timeout) |

Dual-score model

```ts
{
  seoScore: number,        // 0-100, how Google ranks you
  geoScore: number,        // 0-100, how AI engines cite you
  citabilityGrade: 'A' | 'B' | 'C' | 'D' | 'F',
  perEngineScores: {
    chatgpt: number,       // 0-100
    perplexity: number,
    gemini: number,
    googleAi: number,
  }
}
```

Score semantics: 80+ is green, 60–79 amber, below 60 red. The citability grade is derived from the GEO score.
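Both mappings are small pure functions. A sketch, where the 90/80/70/60 grade cut-offs are an assumption (the source only says the grade is derived from the GEO score):

```ts
type Grade = "A" | "B" | "C" | "D" | "F";

// Assumed thresholds: the doc only states that the grade maps from geoScore.
function citabilityGrade(geoScore: number): Grade {
  if (geoScore >= 90) return "A";
  if (geoScore >= 80) return "B";
  if (geoScore >= 70) return "C";
  if (geoScore >= 60) return "D";
  return "F";
}

// Traffic-light color used by the score gauges: 80+ green, 60-79 amber, <60 red.
function scoreColor(score: number): "green" | "amber" | "red" {
  if (score >= 80) return "green";
  if (score >= 60) return "amber";
  return "red";
}
```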

SEO scoring (8 dimensions)

computeSeoScore(html, url, ctx) is a pure function. No LLM calls, no I/O — designed to run inline in the orchestrator.

| Dimension | Weight | What's checked |
| --- | --- | --- |
| Meta tags | 25% | `<title>` length 10–60 chars, `<meta name="description">` present and ≤160 chars, charset, `<html lang>` |
| Open Graph | 15% | `og:title`, `og:description`, `og:image`, `og:type` |
| Structured data | 15% | At least one JSON-LD `<script type="application/ld+json">` |
| Canonical | 10% | Present, https, no parameterized loop |
| Crawlability | 15% | `/robots.txt` reachable, `/sitemap.xml` or `/sitemap_index.xml` reachable |
| Content quality | 10% | H1 exactly once, H2+ structure, word count |
| Image optimization | 5% | Alt text on ≥80% of `<img>` tags |
| Mobile-friendly | 5% | `<meta name="viewport" content="width=device-width">` |

Weights sum to 100%. The score is rounded to the nearest integer.

Each failed check emits a SeoIssue with: stable kebab-case id, area, severity (critical | warning | info | pass), title, detail, fix, optional fixAgentId, and fixType.
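The aggregation step can be sketched as a weighted sum over normalized dimension scores. The names below are illustrative, not the actual exports of seo-score.ts:

```ts
// Weights mirror the table above; each dimension is scored 0-1 by its checks.
const SEO_WEIGHTS: Record<string, number> = {
  metaTags: 0.25,
  openGraph: 0.15,
  structuredData: 0.15,
  canonical: 0.1,
  crawlability: 0.15,
  contentQuality: 0.1,
  imageOptimization: 0.05,
  mobileFriendly: 0.05,
};

// Weighted sum scaled to 0-100 and rounded to the nearest integer.
function aggregateSeoScore(dimensions: Record<string, number>): number {
  let total = 0;
  for (const [dim, weight] of Object.entries(SEO_WEIGHTS)) {
    total += (dimensions[dim] ?? 0) * weight;
  }
  return Math.round(total * 100);
}
```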

GEO scoring (9 dimensions)

GEO scoring is the AI-citability audit (auditAiCitability). The 9 dimensions:

| Dimension | What it measures |
| --- | --- |
| `blufPresence` | Bottom-Line-Up-Front answers — direct, citable statements at the top of sections |
| `citationDensity` | Inline links to external authoritative sources |
| `comparisonStructure` | Tables, vs-comparisons, side-by-side breakdowns |
| `statisticsDensity` | Concrete numbers, percentages, dated data points |
| `schemaMarkup` | JSON-LD entity coverage (Organization, Product, Article, FAQ) |
| `contentLength` | Token count in the meaningful body |
| `headingDepth` | H1/H2/H3 hierarchy depth and balance |
| `referenceTone` | Encyclopedic, factual phrasing vs marketing copy |
| `freshness` | datePublished / dateModified recency signals |

Per-engine breakdown

Each engine weights the 9 GEO dimensions differently. Weights from public research (Princeton GEO 2024, Foglift category weighting, Base44 launch-thread framing).

```ts
chatgpt: {                 // Bottom-Line-Up-Front + citations dominate
  blufPresence: 0.25,
  citationDensity: 0.20,
  comparisonStructure: 0.15,
  statisticsDensity: 0.15,
  schemaMarkup: 0.10,
  contentLength: 0.10,
  headingDepth: 0.05,
}

perplexity: {              // Citation-first, freshness matters
  citationDensity: 0.30,
  freshness: 0.20,
  blufPresence: 0.15,
  statisticsDensity: 0.15,
  referenceTone: 0.10,
  schemaMarkup: 0.10,
}

gemini: {                  // Entity-graph leaning
  schemaMarkup: 0.30,
  headingDepth: 0.20,
  contentLength: 0.15,
  comparisonStructure: 0.15,
  blufPresence: 0.10,
  statisticsDensity: 0.10,
}

googleAi: {                // AI Overviews still bias to traditional SEO
  schemaMarkup: 0.25,
  contentLength: 0.20,
  citationDensity: 0.15,
  headingDepth: 0.15,
  blufPresence: 0.10,
  statisticsDensity: 0.10,
  freshness: 0.05,
}
```

Each per-engine score is a weighted sum of the 9-dimension scores, normalized to 0–100.
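A sketch of that weighted sum (the engineScore helper name is assumed). Each engine's weights sum to 1.0, so dividing by the weight total is only a guard against floating-point drift:

```ts
type GeoDimensions = Record<string, number>; // each dimension scored 0-100

// Weighted sum of the GEO dimension scores for one engine, normalized to 0-100.
function engineScore(dimensions: GeoDimensions, weights: Record<string, number>): number {
  let sum = 0;
  let weightTotal = 0;
  for (const [dim, w] of Object.entries(weights)) {
    sum += (dimensions[dim] ?? 0) * w;
    weightTotal += w;
  }
  return Math.round(sum / weightTotal);
}

// Weights copied from the ChatGPT profile above.
const chatgptWeights = {
  blufPresence: 0.25, citationDensity: 0.2, comparisonStructure: 0.15,
  statisticsDensity: 0.15, schemaMarkup: 0.1, contentLength: 0.1, headingDepth: 0.05,
};
```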

Orchestrator flow

runSeoGeoScan(rawUrl) in apps/api/agent/lib/seo-geo-scan.ts:

  1. validateOutboundUrl(url) — SSRF guard.
  2. Fetch HTML once (15s timeout, follow redirects).
  3. Probe /robots.txt + /sitemap.xml in parallel — context for the SEO scorer's crawlability dimension.
  4. Parallel:
    • auditAiCitability(url) — GEO score (re-fetches internally; ~7 KB extra is negligible).
    • computeSeoScore(html, url, ctx) — SEO score using the already-fetched HTML.
    • croAuditTool(url) — CRO heuristic audit, optional. Wrapped with a 20s hard timeout (SCAN_CRO_AUDIT_TIMEOUT_MS env override). On any failure (timeout, fetch error, malformed payload), croScore returns null and the SEO+GEO scan still completes — the CRO panel is silently hidden in the UI.
  5. Compute per-engine scores from the 9 GEO dimensions.
  6. composeChecklist({ seo, geo, cro }) — bucket Urgent / Recommended / Done. CRO issues are merged in only when the optional audit succeeded.
  7. generateLlmsTxt(brandInfo) — auto-extracted brand name, description, contact email.
  8. Generate JSON-LD — Organization (homepage only) + WebPage (always) + Article (when og:type=article).
  9. Return ScanReport.

Total runtime: 5–15s for typical pages. Designed to run inline inside an HTTP handler — fast enough that the UI can poll-or-await rather than need a job queue.
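Step 1's SSRF guard might look like the sketch below. This is an illustrative literal-hostname check, not the real validateOutboundUrl; a production guard must also resolve DNS and re-validate the resolved IP, since a public hostname can point at a private address:

```ts
// Reject non-HTTP(S) schemes and obviously private/internal hosts.
function validateOutboundUrl(raw: string): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${url.protocol}`);
  }
  const privatePatterns = [
    /^localhost$/i,
    /^127\./, /^10\./, /^192\.168\./,
    /^172\.(1[6-9]|2\d|3[01])\./,  // 172.16.0.0/12
    /^169\.254\./,                 // link-local / cloud metadata endpoint
    /^\[?::1\]?$/,                 // IPv6 loopback
  ];
  if (privatePatterns.some((re) => re.test(url.hostname))) {
    throw new Error(`blocked private host: ${url.hostname}`);
  }
  return url;
}
```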

CRO recommendations

The /scan orchestrator runs croAuditTool alongside SEO and GEO. The CRO tool fetches the page and scores 8 conversion-rate-optimization dimensions (headline clarity, CTA visibility, social proof, urgency, trust signals, form friction, mobile readiness, above-the-fold content) plus an overall 0–100 score that surfaces as report.croScore.

CRO issues flow into the unified checklist with source: 'cro' and route to the cro-specialist agent for "Fix with AI". A CRO score ≥80 emits a Done item ("CRO score 88/100"). The full CRO payload is persisted to ScanResult.rawCroAudit for re-rendering past scans.

CRO is optional: if the audit times out, errors, or returns a malformed payload, the orchestrator logs a warn and returns a scan with croScore: null and rawCroAudit: null. The UI hides the CRO score gauge and CRO checklist items in that case — the SEO + GEO experience is unchanged. This keeps the /scan endpoint a tight 30s SLA even when the CRO upstream misbehaves.

Override the timeout for tests via SCAN_CRO_AUDIT_TIMEOUT_MS (milliseconds; defaults to 20000).
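The fail-open pattern can be sketched as a Promise.race against a hard timeout, with every failure mode collapsed to null (helper names assumed; the real orchestrator reads the timeout from SCAN_CRO_AUDIT_TIMEOUT_MS):

```ts
// Race a promise against a hard timeout; clear the timer either way.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Timeout, fetch error, malformed payload: all collapse to null so the
// SEO + GEO scan still completes and the UI simply hides the CRO panel.
async function runOptionalCroAudit<T>(
  audit: () => Promise<T>,
  timeoutMs = 20_000,
): Promise<T | null> {
  try {
    return await withTimeout(audit(), timeoutMs);
  } catch {
    return null;
  }
}
```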

Visual snapshot

Every successful scan captures a 1280×800 above-the-fold screenshot of the URL using Playwright + Chromium and surfaces it on the report card.

| Property | Value |
| --- | --- |
| Viewport | 1280×800 (above-the-fold only — full-page is wasteful and the hero is what matters) |
| Format | JPEG quality 80 (typical 50–200 KB) |
| Storage | Local filesystem under `SCAN_SCREENSHOT_DIR` (default `/tmp/scan-screenshots`) |
| Served from | `GET /api/scan-screenshots/:filename` |
| Filename | UUID v4 — acts as a capability token (unguessable, no auth required) |
| Lifetime | Single Fly VM — ephemeral. Re-scan to regenerate. |
| Persisted on ScanResult | `screenshot_url` column (TEXT, nullable) |

Why local FS and not S3: there is no S3/R2 SDK in this codebase, no bucket configured, and screenshots are trivially regenerated by re-scanning. Screenshots are typically read minutes-to-hours after creation, well within a single VM lifetime. If you need permanent archival, subscribe to webhooks and pull the URL while it's hot.

Failure mode: screenshot capture is fired in parallel with SEO/GEO/CRO and wrapped in .catch(() => null). A Playwright crash, missing Chromium binary, full disk, or a 20s timeout all degrade silently — screenshotUrl is null and the rest of the report ships unchanged. The UI hides the snapshot card when screenshotUrl is null.

Override the capture timeout via SCAN_SCREENSHOT_TIMEOUT_MS (milliseconds; defaults to 20000).

The prioritized checklist

composeChecklist({ seo, geo, cro? }) merges issues from up to three sources into a single list with three buckets:

```ts
{
  urgent: ChecklistItem[],
  recommended: ChecklistItem[],
  done: ChecklistItem[],
  all: ChecklistItem[],     // sorted urgent → recommended → done
  counts: { urgent, recommended, done, total }
}
```

Each ChecklistItem has:

```ts
{
  id: string,
  severity: 'urgent' | 'recommended' | 'done',
  area: string,
  source: 'seo' | 'geo' | 'cro',
  title: string,
  detail: string,
  fix: string,
  fixAgentId?: 'seo-writer' | 'site-qa' | 'cro-specialist',
  fixType: 'auto' | 'manual' | 'ai-prompt',
}
```

Surfacing Done items is what makes the dashboard feel like progress — Base44 explicitly shows a Done count, and we mirror it. Any SEO dimension scoring ≥0.95 contributes a Done item ("Your page passes the meta-tags check").
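A sketch of the bucketing rule, assuming the severity-to-bucket mapping critical → urgent, warning/info → recommended, pass → done (the real composeChecklist may differ):

```ts
type Severity = "critical" | "warning" | "info" | "pass";
type Bucket = "urgent" | "recommended" | "done";

// Assumed mapping from issue severity to checklist bucket.
function bucketFor(severity: Severity): Bucket {
  if (severity === "critical") return "urgent";
  if (severity === "pass") return "done";
  return "recommended";
}

interface Issue { id: string; severity: Severity; title: string; }

// Merge issues into the three buckets and the flat sorted list.
function compose(issues: Issue[]) {
  const buckets = { urgent: [] as Issue[], recommended: [] as Issue[], done: [] as Issue[] };
  for (const issue of issues) buckets[bucketFor(issue.severity)].push(issue);
  return {
    ...buckets,
    all: [...buckets.urgent, ...buckets.recommended, ...buckets.done],
    counts: {
      urgent: buckets.urgent.length,
      recommended: buckets.recommended.length,
      done: buckets.done.length,
      total: issues.length,
    },
  };
}
```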

Fix tiers

fixType controls how the UI presents the fix button:

| Tier | UI behavior | Example |
| --- | --- | --- |
| `auto` | One-click fix; runs without an LLM round-trip. | "Add `<meta charset>`" — copy-pasteable code snippet. |
| `manual` | Inline instructions only; nothing to delegate. | "Compress your hero image to under 200KB." |
| `ai-prompt` | "Fix with AI" stages a chat thread on the routed agent. | "Rewrite title to be 30–60 chars" → spawns a chat with `seo-writer`. |

Agent routing

fixAgentId decides where an ai-prompt fix goes:

| Agent | Handles |
| --- | --- |
| `seo-writer` | Title, meta description, body content, GEO citability rewrites |
| `site-qa` | robots.txt, sitemap, canonical, structured data, technical |
| `cro-specialist` | CRO issues (when the CRO audit is included) |

POST /api/scan/:customerId/:scanId/fix { checklistItemId } looks up the item in the persisted checklist, creates a ScanFix row pointing at the agent, and stages a chat thread. The customer reviews + confirms before any change ships — same prompt-staged pattern Base44 uses, which preserves user control and routes the audit funnel into billable AI messages.

Auto-generated artifacts

Two copy-pasteable outputs are generated on every scan and stored on the ScanResult row.

llms.txt

Generated via generateLlmsTxt({ websiteUrl, companyName, description, contactEmail }). Brand info is auto-extracted from the HTML: the company name comes from og:site_name → og:title → <title> (with " | Brand" and " - Brand" suffixes stripped); the description from <meta name="description"> → og:description → the first 200 chars of body text; the contact email from the first email-shaped string on the page.
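The suffix-stripping rule can be sketched as below. The split-on-first-separator behavior is an assumption, since the doc does not say which side of the " | " / " - " separator is kept:

```ts
// Keep the leading segment of a title like "Brand | tagline" or "Brand - tagline".
// Also accepts en/em dashes as separators (assumption).
function brandFromTitle(title: string): string {
  return title.split(/\s+[|\-\u2013\u2014]\s+/)[0].trim();
}

// Assumed fallback chain for the company name: og:site_name, then og:title,
// then <title>, with suffix stripping applied to the title-shaped sources.
function extractCompanyName(meta: { ogSiteName?: string; ogTitle?: string; title?: string }): string | null {
  if (meta.ogSiteName) return meta.ogSiteName;
  if (meta.ogTitle) return brandFromTitle(meta.ogTitle);
  if (meta.title) return brandFromTitle(meta.title);
  return null;
}
```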

JSON-LD schemas

generateAutoJsonLd(html, url, brand) returns an array of { type, jsonLd }:

  • Organization — emitted only when scanning the homepage (pathname === '/').
  • WebPage — emitted always.
  • Article — emitted when og:type === 'article', with auto-extracted datePublished and author.

All schemas use real data extracted from the page — og:title, og:image, og:description — so the customer can paste them straight into the page's <head>.
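A sketch of the WebPage generator and the homepage check (function names assumed; field names follow schema.org):

```ts
// Build a minimal WebPage JSON-LD object; optional fields are omitted when absent.
function webPageJsonLd(url: string, title?: string, description?: string): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "WebPage",
    url,
    ...(title ? { name: title } : {}),
    ...(description ? { description } : {}),
  };
}

// Organization is only emitted when scanning the homepage.
function shouldEmitOrganization(url: string): boolean {
  return new URL(url).pathname === "/";
}
```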

Schema

```prisma
model ScanResult {
  id                 String    @id @default(uuid()) @db.Uuid
  customerId         String    @db.Uuid
  url                String
  status             String    @default("pending") // pending | running | done | failed
  seoScore           Int?
  geoScore           Int?
  croScore           Int?       // optional, for landing pages
  citabilityGrade    String?    // A-F
  perEngineScores    Json?      // { chatgpt, perplexity, gemini, googleAi }
  dimensions         Json?
  checklist          Json?
  llmsTxt            String?    @db.Text
  jsonLdSchemas      Json?
  rawCitabilityAudit Json?
  rawCroAudit        Json?
  screenshotUrl      String?
  errorMessage       String?
  startedAt          DateTime?
  completedAt        DateTime?
}

model ScanFix {
  id              String  @id @default(uuid()) @db.Uuid
  scanId          String  @db.Uuid
  customerId      String  @db.Uuid
  checklistItemId String   // matches ScanResult.checklist[].id
  agentId         String   // 'seo-writer' | 'site-qa' | 'cro-specialist'
  fixType         String   // 'auto' | 'manual' | 'ai-prompt'
  status          String  @default("queued")
  prompt          String?  @db.Text
  result          Json?
}
```

API endpoints

All four endpoints are tenant-scoped under /api/scan/:customerId/...:

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/scan/:customerId` | Start a new scan. Creates the row in `running` state, runs the orchestrator inline, returns the full ScanReport. |
| GET | `/api/scan/:customerId` | List recent scans (50 most recent, summary fields only). |
| GET | `/api/scan/:customerId/:scanId` | Fetch a specific scan with full payload. |
| POST | `/api/scan/:customerId/:scanId/fix` | Spawn a "Fix with AI" delegation. Body: `{ checklistItemId }`. Creates a ScanFix row and stages a chat thread on the routed agent. |

The scan row is created up-front in running status so the UI can poll on a 1.5s interval until status === 'done' or 'failed'. On unhandled error, the row is marked 'failed' with the error message preserved.
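The client-side poll loop can be sketched with an injected status getter; in the real UI this would be a fetch against GET /api/scan/:customerId/:scanId:

```ts
// Poll until the scan reaches a terminal state. The getter is injected so the
// loop is testable without a network; maxAttempts bounds the total wait.
async function pollScan(
  getStatus: () => Promise<string>,
  intervalMs = 1500,
  maxAttempts = 40,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "done" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("scan polling gave up");
}
```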

UI at /scan

  1. URL input + "Run Scan" button.
  2. 5–15s wait, then results render:
    • Two gauges side by side — SEO score and GEO score, with the citability grade.
    • Per-engine GEO breakdown — four mini-cards: ChatGPT, Perplexity, Gemini, Google AI Overviews.
    • Filter pills — All / Urgent / Recommended / Done.
    • Checklist — each item has a per-row action button. auto = one-click. manual = inline copy. ai-prompt = "Fix with AI" → routed agent.
    • Copy-pasteable artifacts — llms.txt block + JSON-LD blocks, each with a copy-to-clipboard button.

PDF export

Every completed scan can be downloaded as a branded PDF report:

GET /api/scan/:customerId/:scanId/pdf

The endpoint renders the persisted ScanResult row through Playwright/Chromium (already a dep — same browser used for screenshots). Returns Content-Type: application/pdf with Content-Disposition: attachment; filename="heycmo-scan-<host>-<date>.pdf".

What's in the PDF:

  • Branded header (heycmo logo + scan timestamp)
  • Scanned URL
  • Three circular score gauges — SEO + GEO + CRO
  • Per-engine GEO mini-cards — ChatGPT, Perplexity, Gemini, Google AI
  • Above-the-fold screenshot thumbnail (when present)
  • Top 10 checklist items with severity pills (Urgent / Recommended / Done)
  • CRO recommendations (when the CRO audit ran)
  • Footer with timestamp + heycmo branding link

Design choices:

  • No storage. PDFs are rendered on-demand from the canonical ScanResult row each time. Saves storage and lets template tweaks propagate to old scans automatically.
  • Pure HTML generator. renderScanReportHtml() is a pure function with full unit-test coverage. Only renderScanReportPdf() touches Playwright.
  • A4 portrait, 14mm × 12mm margins. Tuned for both screen and print.
  • Graceful degradation. Missing CRO/screenshot/per-engine fields are simply omitted — the rest of the report renders normally.

UI: the /scan page surfaces a Download PDF report button below the score cards. Same-origin GET means the existing httpOnly session cookie authenticates the request — no extra token plumbing.

Source files:

  • apps/api/agent/lib/scan-pdf.ts — HTML template + Playwright wrapper
  • apps/api/infra/server.ts — endpoint (search for /scan/:customerId/:scanId/pdf)

Source files

  • apps/api/agent/lib/seo-geo-scan.ts — the orchestrator (and per-engine score model)
  • apps/api/agent/lib/seo-score.ts — SEO scorer (pure)
  • apps/api/agent/lib/checklist-composer.ts — issue → bucketed checklist
  • apps/api/agent/tools/ai-visibility.ts — auditAiCitability (GEO scorer)
  • apps/api/agent/tools/llms-txt-generator.ts — llms.txt generator
  • apps/api/infra/server.ts — endpoints (search for /api/scan/)
  • apps/web/src/pages/SeoGeoScanner.tsx — the /scan UI
