SEO + GEO Scanner
One-click dual-score audit for any URL — SEO + Generative Engine Optimization, with per-engine breakdowns, a prioritized checklist, and "Fix with AI" routing.
The Scanner is a one-click audit at /scan. A customer enters a URL, waits 5–15 seconds, and gets back two scores (SEO + GEO), a per-engine breakdown (ChatGPT / Perplexity / Gemini / Google AI), a prioritized checklist (Urgent / Recommended / Done), and copy-pasteable artifacts (llms.txt, JSON-LD).
It is HeyCMO's response to Base44's SEO+GEO scanner, with three deliberate improvements:
- Dual scores stay separate — Base44 markets dual scores but ships a unified number. We don't collapse them.
- Works on any URL — Base44 is locked to Base44-built apps.
- Per-engine GEO breakdown surfaced — fills the gap @HubLensOfficial flagged in the Base44 launch thread.
Configuration
| Property | Value |
|---|---|
| Phase | 6 |
| Schema | ScanResult, ScanFix |
| Orchestrator | apps/api/agent/lib/seo-geo-scan.ts |
| SEO scorer | apps/api/agent/lib/seo-score.ts |
| Checklist composer | apps/api/agent/lib/checklist-composer.ts |
| UI | /scan |
| Run mode | inline (5–15s typical, 30s HTTP timeout) |
Dual-score model
```ts
{
  seoScore: number,        // 0-100, how Google ranks you
  geoScore: number,        // 0-100, how AI engines cite you
  citabilityGrade: 'A' | 'B' | 'C' | 'D' | 'F',
  perEngineScores: {
    chatgpt: number,       // 0-100
    perplexity: number,
    gemini: number,
    googleAi: number,
  }
}
```

Score semantics: 80+ green, 60–79 amber, <60 red. Grade is mapped from the GEO score.
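As a sketch of how those semantics might translate to code: the color banding below is documented above, but the A–F grade cut-offs are illustrative assumptions, since the exact grade mapping is not specified here.

```typescript
type Band = 'green' | 'amber' | 'red';
type Grade = 'A' | 'B' | 'C' | 'D' | 'F';

// Documented banding: 80+ green, 60-79 amber, <60 red.
function bandFor(score: number): Band {
  if (score >= 80) return 'green';
  if (score >= 60) return 'amber';
  return 'red';
}

// HYPOTHETICAL grade cut-offs. The real mapping lives in the orchestrator
// and may use different thresholds.
function gradeForGeoScore(geoScore: number): Grade {
  if (geoScore >= 90) return 'A';
  if (geoScore >= 80) return 'B';
  if (geoScore >= 70) return 'C';
  if (geoScore >= 60) return 'D';
  return 'F';
}
```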
SEO scoring (8 dimensions)
computeSeoScore(html, url, ctx) is a pure function. No LLM calls, no I/O — designed to run inline in the orchestrator.
| Dimension | Weight | What's checked |
|---|---|---|
| Meta tags | 25% | <title> length 10–60 chars, <meta name="description"> present and ≤160 chars, charset, <html lang> |
| Open Graph | 15% | og:title, og:description, og:image, og:type |
| Structured data | 15% | At least one JSON-LD <script type="application/ld+json"> |
| Canonical | 10% | Present, https, no parameterized loop |
| Crawlability | 15% | /robots.txt reachable, /sitemap.xml or /sitemap_index.xml reachable |
| Content quality | 10% | H1 exactly once, H2+ structure, word count |
| Image optimization | 5% | Alt text on ≥80% of <img> tags |
| Mobile-friendly | 5% | <meta name="viewport" content="width=device-width"> |
Total = 100. Score is rounded to nearest int.
Each failed check emits a SeoIssue with: stable kebab-case id, area, severity (critical | warning | info | pass), title, detail, fix, optional fixAgentId, and fixType.
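For illustration, here is what a failing title-length check might emit. The interface matches the fields listed above; the concrete issue values (id, detail text) are hypothetical.

```typescript
interface SeoIssue {
  id: string;                  // stable kebab-case id
  area: string;
  severity: 'critical' | 'warning' | 'info' | 'pass';
  title: string;
  detail: string;
  fix: string;
  fixAgentId?: 'seo-writer' | 'site-qa' | 'cro-specialist';
  fixType: 'auto' | 'manual' | 'ai-prompt';
}

// Hypothetical issue for a <title> outside the 10-60 char window.
const titleTooLong: SeoIssue = {
  id: 'meta-title-length',
  area: 'Meta tags',
  severity: 'warning',
  title: 'Title tag is too long',
  detail: 'Your <title> is 74 characters; search engines truncate around 60.',
  fix: 'Rewrite the title to 30-60 characters, front-loading the primary keyword.',
  fixAgentId: 'seo-writer',
  fixType: 'ai-prompt',
};
```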
GEO scoring (9 dimensions)
GEO scoring is the AI-citability audit (auditAiCitability). The 9 dimensions:
| Dimension | What it measures |
|---|---|
| blufPresence | Bottom-Line-Up-Front answers — direct, citable statements at the top of sections |
| citationDensity | Inline links to external authoritative sources |
| comparisonStructure | Tables, vs-comparisons, side-by-side breakdowns |
| statisticsDensity | Concrete numbers, percentages, dated data points |
| schemaMarkup | JSON-LD entity coverage (Organization, Product, Article, FAQ) |
| contentLength | Token count in the meaningful body |
| headingDepth | H1/H2/H3 hierarchy depth and balance |
| referenceTone | Encyclopedic, factual phrasing vs marketing copy |
| freshness | datePublished / dateModified recency signals |
Per-engine breakdown
Each engine weights the 9 GEO dimensions differently. Weights from public research (Princeton GEO 2024, Foglift category weighting, Base44 launch-thread framing).
```ts
chatgpt: {      // Bottom-Line-Up-Front + citations dominate
  blufPresence: 0.25,
  citationDensity: 0.20,
  comparisonStructure: 0.15,
  statisticsDensity: 0.15,
  schemaMarkup: 0.10,
  contentLength: 0.10,
  headingDepth: 0.05,
}

perplexity: {   // Citation-first, freshness matters
  citationDensity: 0.30,
  freshness: 0.20,
  blufPresence: 0.15,
  statisticsDensity: 0.15,
  referenceTone: 0.10,
  schemaMarkup: 0.10,
}

gemini: {       // Entity-graph leaning
  schemaMarkup: 0.30,
  headingDepth: 0.20,
  contentLength: 0.15,
  comparisonStructure: 0.15,
  blufPresence: 0.10,
  statisticsDensity: 0.10,
}

googleAi: {     // AI Overviews still bias to traditional SEO
  schemaMarkup: 0.25,
  contentLength: 0.20,
  citationDensity: 0.15,
  headingDepth: 0.15,
  blufPresence: 0.10,
  statisticsDensity: 0.10,
  freshness: 0.05,
}
```

Each per-engine score is a weighted sum of the 9-dimension scores, normalized to 0–100.
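The weighted-sum step can be sketched as follows, assuming each dimension is scored 0–1 before weighting (the real scorer in seo-geo-scan.ts may normalize differently). The perplexity weight table is copied from above.

```typescript
type DimensionScores = Record<string, number>; // each dimension scored 0-1

// Weighted sum over whichever dimensions the engine's weight table names,
// normalized by the total weight so partial tables still land on 0-100.
function engineScore(
  dims: DimensionScores,
  weights: Record<string, number>,
): number {
  let sum = 0;
  let totalWeight = 0;
  for (const [dim, w] of Object.entries(weights)) {
    sum += (dims[dim] ?? 0) * w;
    totalWeight += w;
  }
  return Math.round((sum / totalWeight) * 100);
}

const perplexityWeights = {
  citationDensity: 0.30, freshness: 0.20, blufPresence: 0.15,
  statisticsDensity: 0.15, referenceTone: 0.10, schemaMarkup: 0.10,
};
```

A page scoring 0.5 on every dimension lands at 50 for every engine; the engines diverge only when dimension scores are uneven.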
Orchestrator flow
runSeoGeoScan(rawUrl) in apps/api/agent/lib/seo-geo-scan.ts:
1. `validateOutboundUrl(url)` — SSRF guard.
2. Fetch HTML once (15s timeout, follow redirects).
3. Probe `/robots.txt` + `/sitemap.xml` in parallel — context for the SEO scorer's crawlability dimension.
4. In parallel:
   - `auditAiCitability(url)` — GEO score (re-fetches internally; ~7 KB extra is negligible).
   - `computeSeoScore(html, url, ctx)` — SEO score using the already-fetched HTML.
   - `croAuditTool(url)` — optional CRO heuristic audit, wrapped in a 20s hard timeout (`SCAN_CRO_AUDIT_TIMEOUT_MS` env override). On any failure (timeout, fetch error, malformed payload), `croScore` is `null` and the SEO+GEO scan still completes — the CRO panel is silently hidden in the UI.
5. Compute per-engine scores from the 9 GEO dimensions.
6. `composeChecklist({ seo, geo, cro })` — bucket Urgent / Recommended / Done. CRO issues are merged in only when the optional audit succeeded.
7. `generateLlmsTxt(brandInfo)` — auto-extracted brand name, description, contact email.
8. Generate JSON-LD — Organization (homepage only) + WebPage (always) + Article (when `og:type=article`).
9. Return `ScanReport`.
Total runtime: 5–15s for typical pages. Designed to run inline inside an HTTP handler — fast enough that the UI can poll-or-await rather than need a job queue.
CRO recommendations
The /scan orchestrator runs croAuditTool alongside SEO and GEO. The CRO tool fetches the page and scores 8 conversion-rate-optimization dimensions (headline clarity, CTA visibility, social proof, urgency, trust signals, form friction, mobile readiness, above-the-fold content) plus an overall 0–100 score that surfaces as report.croScore.
CRO issues flow into the unified checklist with source: 'cro' and route to the cro-specialist agent for "Fix with AI". A CRO score ≥80 emits a Done item ("CRO score 88/100"). The full CRO payload is persisted to ScanResult.rawCroAudit for re-rendering past scans.
CRO is optional: if the audit times out, errors, or returns a malformed payload, the orchestrator logs a warn and returns a scan with croScore: null and rawCroAudit: null. The UI hides the CRO score gauge and CRO checklist items in that case — the SEO + GEO experience is unchanged. This keeps the /scan endpoint a tight 30s SLA even when the CRO upstream misbehaves.
Override the timeout for tests via SCAN_CRO_AUDIT_TIMEOUT_MS (milliseconds; defaults to 20000).
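The degrade-to-null pattern can be sketched like this. `orNullWithin` is a hypothetical helper name; the real orchestrator may structure the timeout differently.

```typescript
// Resolve to null instead of throwing when the work is slow or fails,
// so the scan report can ship without the optional section.
async function orNullWithin<T>(work: Promise<T>, ms: number): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), ms);
  });
  try {
    return await Promise.race([work.catch(() => null), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical call site; the 20s default mirrors SCAN_CRO_AUDIT_TIMEOUT_MS:
// const cro = await orNullWithin(croAuditTool(url), 20_000);
```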
Visual snapshot
Every successful scan captures a 1280×800 above-the-fold screenshot of the URL using Playwright + Chromium and surfaces it on the report card.
| Property | Value |
|---|---|
| Viewport | 1280×800 (above-the-fold only — full-page is wasteful and the hero is what matters) |
| Format | JPEG quality 80 (typical 50–200 KB) |
| Storage | Local filesystem under SCAN_SCREENSHOT_DIR (default /tmp/scan-screenshots) |
| Served from | GET /api/scan-screenshots/:filename |
| Filename | UUID v4 — acts as a capability token (unguessable, no auth required) |
| Lifetime | Single Fly VM — ephemeral. Re-scan to regenerate. |
| Persisted on ScanResult | screenshot_url column (TEXT, nullable) |
Why local FS and not S3: there is no S3/R2 SDK in this codebase, no bucket configured, and screenshots are trivially regenerated by re-scanning. Screenshots are typically read minutes-to-hours after creation, well within a single VM lifetime. If you need permanent archival, subscribe to webhooks and pull the URL while it's hot.
Failure mode: screenshot capture is fired in parallel with SEO/GEO/CRO and wrapped in .catch(() => null). A Playwright crash, missing Chromium binary, full disk, or a 20s timeout all degrade silently — screenshotUrl is null and the rest of the report ships unchanged. The UI hides the snapshot card when screenshotUrl is null.
Override the capture timeout via SCAN_SCREENSHOT_TIMEOUT_MS (milliseconds; defaults to 20000).
The prioritized checklist
composeChecklist({ seo, geo, cro? }) merges issues from up to three sources into a single list with three buckets:
```ts
{
  urgent: ChecklistItem[],
  recommended: ChecklistItem[],
  done: ChecklistItem[],
  all: ChecklistItem[],   // sorted urgent → recommended → done
  counts: { urgent, recommended, done, total }
}
```

Each ChecklistItem has:
```ts
{
  id: string,
  severity: 'urgent' | 'recommended' | 'done',
  area: string,
  source: 'seo' | 'geo' | 'cro',
  title: string,
  detail: string,
  fix: string,
  fixAgentId?: 'seo-writer' | 'site-qa' | 'cro-specialist',
  fixType: 'auto' | 'manual' | 'ai-prompt',
}
```

Surfacing Done items is what makes the dashboard feel like progress — Base44 explicitly shows a Done count, and we mirror it. Any SEO dimension scoring ≥0.95 contributes a Done item ("Your page passes the meta-tags check").
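A minimal sketch of the bucketing. The severity-to-bucket mapping shown is an assumption (critical → urgent, warning/info → recommended, pass → done); the real composer also merges GEO and CRO sources.

```typescript
type Bucket = 'urgent' | 'recommended' | 'done';

// HYPOTHETICAL mapping from scorer severities to checklist buckets.
function bucketFor(severity: 'critical' | 'warning' | 'info' | 'pass'): Bucket {
  if (severity === 'critical') return 'urgent';
  if (severity === 'pass') return 'done';
  return 'recommended';
}

// Partition items and preserve the documented urgent → recommended → done order.
function composeBuckets(items: { id: string; bucket: Bucket }[]) {
  const urgent = items.filter((i) => i.bucket === 'urgent');
  const recommended = items.filter((i) => i.bucket === 'recommended');
  const done = items.filter((i) => i.bucket === 'done');
  return {
    urgent,
    recommended,
    done,
    all: [...urgent, ...recommended, ...done],
    counts: {
      urgent: urgent.length,
      recommended: recommended.length,
      done: done.length,
      total: items.length,
    },
  };
}
```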
Fix tiers
fixType controls how the UI presents the fix button:
| Tier | UI behavior | Example |
|---|---|---|
| auto | One-click fix; runs without an LLM round-trip. | "Add <meta charset>" — copy-pasteable code snippet. |
| manual | Inline instructions only; nothing to delegate. | "Compress your hero image to under 200KB." |
| ai-prompt | "Fix with AI" stages a chat thread on the routed agent. | "Rewrite title to be 30–60 chars" → spawns a chat with seo-writer. |
Agent routing
fixAgentId decides where an ai-prompt fix goes:
| Agent | Handles |
|---|---|
| seo-writer | Title, meta description, body content, GEO citability rewrites |
| site-qa | Robots.txt, sitemap, canonical, structured data, technical |
| cro-specialist | CRO issues (when CRO audit is included) |
POST /api/scan/:customerId/:scanId/fix { checklistItemId } looks up the item in the persisted checklist, creates a ScanFix row pointing at the agent, and stages a chat thread. The customer reviews + confirms before any change ships — same prompt-staged pattern Base44 uses, which preserves user control and routes the audit funnel into billable AI messages.
Auto-generated artifacts
Two copy-pasteable outputs are generated on every scan and stored on the ScanResult row.
llms.txt
Generated via generateLlmsTxt({ websiteUrl, companyName, description, contactEmail }). Brand info is auto-extracted from the HTML: company name comes from og:site_name → og:title → <title> (with " | Brand" and " - Brand" suffixes stripped); description from <meta name="description"> → og:description → first 200 chars of body text; contact email from the first email-shaped string on the page.
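The fallback chain for the company name could look like this sketch. Regex-based extraction is shown for brevity; the real extractor presumably uses a proper HTML parser, and the exact suffix-stripping rules are an assumption.

```typescript
// og:site_name → og:title → <title>, with trailing " | Brand" / " - Brand"
// style suffixes stripped (we keep the part before the separator).
function extractCompanyName(html: string): string | null {
  const meta = (prop: string): string | null => {
    const m = html.match(
      new RegExp(`<meta[^>]+property=["']${prop}["'][^>]+content=["']([^"']+)["']`, 'i'),
    );
    return m ? m[1] : null;
  };
  const title = html.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1] ?? null;
  const raw = meta('og:site_name') ?? meta('og:title') ?? title;
  if (!raw) return null;
  // Split on " | " or " - " and keep the leading segment.
  return raw.split(/\s[|-]\s/)[0].trim() || null;
}
```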
JSON-LD schemas
generateAutoJsonLd(html, url, brand) returns an array of { type, jsonLd }:
- Organization — emitted only when scanning the homepage (pathname === '/').
- WebPage — emitted always.
- Article — emitted when og:type === 'article', with auto-extracted datePublished and author.
All schemas use real data extracted from the page — og:title, og:image, og:description — so the customer can paste them straight into the page's <head>.
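A minimal sketch of the always-emitted WebPage entry in that `{ type, jsonLd }` array. The property selection beyond url is an assumption; generateAutoJsonLd's real output may include more fields.

```typescript
interface BrandInfo {
  companyName: string | null;
  description: string | null;
}

// WebPage JSON-LD, emitted for every scanned URL. Optional fields are
// omitted entirely rather than emitted as null.
function webPageJsonLd(url: string, title: string | null, brand: BrandInfo) {
  return {
    type: 'WebPage',
    jsonLd: {
      '@context': 'https://schema.org',
      '@type': 'WebPage',
      url,
      ...(title ? { name: title } : {}),
      ...(brand.description ? { description: brand.description } : {}),
    },
  };
}
```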
Schema
```prisma
model ScanResult {
  id                 String    @id @default(uuid()) @db.Uuid
  customerId         String    @db.Uuid
  url                String
  status             String    @default("pending") // pending | running | done | failed
  seoScore           Int?
  geoScore           Int?
  croScore           Int?      // optional, for landing pages
  citabilityGrade    String?   // A-F
  perEngineScores    Json?     // { chatgpt, perplexity, gemini, googleAi }
  dimensions         Json?
  checklist          Json?
  llmsTxt            String?   @db.Text
  jsonLdSchemas      Json?
  rawCitabilityAudit Json?
  rawCroAudit        Json?
  screenshotUrl      String?
  errorMessage       String?
  startedAt          DateTime?
  completedAt        DateTime?
}

model ScanFix {
  id              String  @id @default(uuid()) @db.Uuid
  scanId          String  @db.Uuid
  customerId      String  @db.Uuid
  checklistItemId String  // matches ScanResult.checklist[].id
  agentId         String  // 'seo-writer' | 'site-qa' | 'cro-specialist'
  fixType         String  // 'auto' | 'manual' | 'ai-prompt'
  status          String  @default("queued")
  prompt          String? @db.Text
  result          Json?
}
```

API endpoints
All four endpoints are tenant-scoped under /api/scan/:customerId/...:
| Method | Path | Purpose |
|---|---|---|
| POST | /api/scan/:customerId | Start a new scan. Creates the row in running state, runs the orchestrator inline, returns the full ScanReport. |
| GET | /api/scan/:customerId | List recent scans (50 most recent, summary fields only). |
| GET | /api/scan/:customerId/:scanId | Fetch a specific scan with full payload. |
| POST | /api/scan/:customerId/:scanId/fix | Spawn a "Fix with AI" delegation. Body: { checklistItemId }. Creates a ScanFix row and stages a chat thread on the routed agent. |
The scan row is created up-front in running status so the UI can poll on a 1.5s interval until status === 'done' or 'failed'. On unhandled error, the row is marked 'failed' with the error message preserved.
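The client-side poll-until-terminal loop can be sketched like this. `fetchStatus` is injected so the sketch stays transport-agnostic; the 1.5s interval matches the doc, and the attempt cap is an assumption.

```typescript
type ScanStatus = 'pending' | 'running' | 'done' | 'failed';

// Poll every `intervalMs` until the scan reaches a terminal state,
// or give up after `maxAttempts` checks.
async function pollUntilDone(
  fetchStatus: () => Promise<ScanStatus>,
  intervalMs = 1500,
  maxAttempts = 20, // ~30s at 1.5s per attempt
): Promise<ScanStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === 'done' || status === 'failed') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return 'failed'; // treat a stuck scan as failed client-side
}
```

In the real UI, `fetchStatus` would wrap a GET to /api/scan/:customerId/:scanId and read the status field.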
UI at /scan
- URL input + "Run Scan" button.
- 5–15s wait, then results render:
  - Two gauges side by side — SEO score and GEO score, with the citability grade.
  - Per-engine GEO breakdown — four mini-cards: ChatGPT, Perplexity, Gemini, Google AI Overviews.
  - Filter pills — All / Urgent / Recommended / Done.
  - Checklist — each item has a per-row action button: auto = one-click, manual = inline copy, ai-prompt = "Fix with AI" → routed agent.
  - Copy-pasteable artifacts — llms.txt block + JSON-LD blocks, each with a copy-to-clipboard button.
PDF export
Every completed scan can be downloaded as a branded PDF report:
GET /api/scan/:customerId/:scanId/pdf

The endpoint renders the persisted ScanResult row through Playwright/Chromium (already a dependency — the same browser used for screenshots). Returns Content-Type: application/pdf with Content-Disposition: attachment; filename="heycmo-scan-<host>-<date>.pdf".
What's in the PDF:
- Branded header (heycmo logo + scan timestamp)
- Scanned URL
- Three circular score gauges — SEO + GEO + CRO
- Per-engine GEO mini-cards — ChatGPT, Perplexity, Gemini, Google AI
- Above-the-fold screenshot thumbnail (when present)
- Top 10 checklist items with severity pills (Urgent / Recommended / Done)
- CRO recommendations (when the CRO audit ran)
- Footer with timestamp + heycmo branding link
Design choices:
- No storage. PDFs are rendered on-demand from the canonical ScanResult row each time. Saves storage and lets template tweaks propagate to old scans automatically.
- Pure HTML generator. renderScanReportHtml() is a pure function with full unit-test coverage. Only renderScanReportPdf() touches Playwright.
- A4 portrait, 14mm × 12mm margins. Tuned for both screen and print.
- Graceful degradation. Missing CRO/screenshot/per-engine fields are simply omitted — the rest of the report renders normally.
UI: the /scan page surfaces a Download PDF report button below the score cards. Same-origin GET means the existing httpOnly session cookie authenticates the request — no extra token plumbing.
Source files:
- apps/api/agent/lib/scan-pdf.ts — HTML template + Playwright wrapper
- apps/api/infra/server.ts — endpoint (search for /scan/:customerId/:scanId/pdf)
Source files
- apps/api/agent/lib/seo-geo-scan.ts — the orchestrator (and per-engine score model)
- apps/api/agent/lib/seo-score.ts — SEO scorer (pure)
- apps/api/agent/lib/checklist-composer.ts — issue → bucketed checklist
- apps/api/agent/tools/ai-visibility.ts — auditAiCitability (GEO scorer)
- apps/api/agent/tools/llms-txt-generator.ts — llms.txt generator
- apps/api/infra/server.ts — endpoints (search for /api/scan/)
- apps/web/src/pages/SeoGeoScanner.tsx — the /scan UI