SEO + GEO Scanner
One-click dual-score audit for any URL — SEO + Generative Engine Optimization, with per-engine breakdowns, a prioritized checklist, and "Fix with AI" routing.
The Scanner is a one-click audit at /scan. A customer enters a URL, waits 5–15 seconds, and gets back two scores (SEO + GEO), a per-engine breakdown (ChatGPT / Perplexity / Gemini / Google AI), a prioritized checklist (Urgent / Recommended / Done), and copy-pasteable artifacts (llms.txt, JSON-LD).
It is HeyCMO's response to Base44's SEO+GEO scanner, with three deliberate improvements:
- Dual scores stay separate — Base44 markets dual scores but ships a unified number. We don't collapse them.
- Works on any URL — Base44 is locked to Base44-built apps.
- Per-engine GEO breakdown surfaced — fills the gap @HubLensOfficial flagged in the Base44 launch thread.
Configuration
| Property | Value |
|---|---|
| Phase | 6 |
| Schema | ScanResult, ScanFix |
| Orchestrator | apps/api/agent/lib/seo-geo-scan.ts |
| SEO scorer | apps/api/agent/lib/seo-score.ts |
| Checklist composer | apps/api/agent/lib/checklist-composer.ts |
| UI | /scan |
| Run mode | inline (5–15s typical, 30s HTTP timeout) |
Dual-score model
```ts
{
  seoScore: number,        // 0-100, how Google ranks you
  geoScore: number,        // 0-100, how AI engines cite you
  citabilityGrade: 'A' | 'B' | 'C' | 'D' | 'F',
  perEngineScores: {
    chatgpt: number,       // 0-100
    perplexity: number,
    gemini: number,
    googleAi: number,
  }
}
```

Score semantics: 80+ green, 60–79 amber, <60 red. Grade is mapped from the GEO score.
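As a sketch of how those semantics might translate to code: the color banding below is documented above, but the A–F grade cut-offs are illustrative assumptions, since the exact grade mapping is not specified here.

```typescript
type Band = 'green' | 'amber' | 'red';
type Grade = 'A' | 'B' | 'C' | 'D' | 'F';

// Documented banding: 80+ green, 60-79 amber, <60 red.
function bandFor(score: number): Band {
  if (score >= 80) return 'green';
  if (score >= 60) return 'amber';
  return 'red';
}

// HYPOTHETICAL grade cut-offs. The real mapping lives in the orchestrator
// and may use different thresholds.
function gradeForGeoScore(geoScore: number): Grade {
  if (geoScore >= 90) return 'A';
  if (geoScore >= 80) return 'B';
  if (geoScore >= 70) return 'C';
  if (geoScore >= 60) return 'D';
  return 'F';
}
```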
SEO scoring (8 dimensions)
computeSeoScore(html, url, ctx) is a pure function. No LLM calls, no I/O — designed to run inline in the orchestrator.
| Dimension | Weight | What's checked |
|---|---|---|
| Meta tags | 25% | <title> length 10–60 chars, <meta name="description"> present and ≤160 chars, charset, <html lang> |
| Open Graph | 15% | og:title, og:description, og:image, og:type |
| Structured data | 15% | At least one JSON-LD <script type="application/ld+json"> |
| Canonical | 10% | Present, https, no parameterized loop |
| Crawlability | 15% | /robots.txt reachable, /sitemap.xml or /sitemap_index.xml reachable |
| Content quality | 10% | H1 exactly once, H2+ structure, word count |
| Image optimization | 5% | Alt text on ≥80% of <img> tags |
| Mobile-friendly | 5% | <meta name="viewport" content="width=device-width"> |
Total = 100. Score is rounded to nearest int.
Each failed check emits a SeoIssue with: stable kebab-case id, area, severity (critical | warning | info | pass), title, detail, fix, optional fixAgentId, and fixType.
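For illustration, here is what a failing title-length check might emit. The interface matches the fields listed above; the concrete issue values (id, detail text) are hypothetical.

```typescript
interface SeoIssue {
  id: string;                  // stable kebab-case id
  area: string;
  severity: 'critical' | 'warning' | 'info' | 'pass';
  title: string;
  detail: string;
  fix: string;
  fixAgentId?: 'seo-writer' | 'site-qa' | 'cro-specialist';
  fixType: 'auto' | 'manual' | 'ai-prompt';
}

// Hypothetical issue for a <title> outside the 10-60 char window.
const titleTooLong: SeoIssue = {
  id: 'meta-title-length',
  area: 'Meta tags',
  severity: 'warning',
  title: 'Title tag is too long',
  detail: 'Your <title> is 74 characters; search engines truncate around 60.',
  fix: 'Rewrite the title to 30-60 characters, front-loading the primary keyword.',
  fixAgentId: 'seo-writer',
  fixType: 'ai-prompt',
};
```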
GEO scoring (9 dimensions)
GEO scoring is the AI-citability audit (auditAiCitability). The 9 dimensions:
| Dimension | What it measures |
|---|---|
| blufPresence | Bottom-Line-Up-Front answers — direct, citable statements at the top of sections |
| citationDensity | Inline links to external authoritative sources |
| comparisonStructure | Tables, vs-comparisons, side-by-side breakdowns |
| statisticsDensity | Concrete numbers, percentages, dated data points |
| schemaMarkup | JSON-LD entity coverage (Organization, Product, Article, FAQ) |
| contentLength | Token count in the meaningful body |
| headingDepth | H1/H2/H3 hierarchy depth and balance |
| referenceTone | Encyclopedic, factual phrasing vs marketing copy |
| freshness | datePublished / dateModified recency signals |
Per-engine breakdown
Each engine weights the 9 GEO dimensions differently. Weights from public research (Princeton GEO 2024, Foglift category weighting, Base44 launch-thread framing).
```ts
chatgpt: {      // Bottom-Line-Up-Front + citations dominate
  blufPresence: 0.25,
  citationDensity: 0.20,
  comparisonStructure: 0.15,
  statisticsDensity: 0.15,
  schemaMarkup: 0.10,
  contentLength: 0.10,
  headingDepth: 0.05,
}

perplexity: {   // Citation-first, freshness matters
  citationDensity: 0.30,
  freshness: 0.20,
  blufPresence: 0.15,
  statisticsDensity: 0.15,
  referenceTone: 0.10,
  schemaMarkup: 0.10,
}

gemini: {       // Entity-graph leaning
  schemaMarkup: 0.30,
  headingDepth: 0.20,
  contentLength: 0.15,
  comparisonStructure: 0.15,
  blufPresence: 0.10,
  statisticsDensity: 0.10,
}

googleAi: {     // AI Overviews still bias to traditional SEO
  schemaMarkup: 0.25,
  contentLength: 0.20,
  citationDensity: 0.15,
  headingDepth: 0.15,
  blufPresence: 0.10,
  statisticsDensity: 0.10,
  freshness: 0.05,
}
```

Each per-engine score is a weighted sum of the 9-dimension scores, normalized to 0–100.
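The weighted-sum step can be sketched as follows, assuming each dimension is scored 0–1 before weighting (the real scorer in seo-geo-scan.ts may normalize differently). The perplexity weight table is copied from above.

```typescript
type DimensionScores = Record<string, number>; // each dimension scored 0-1

// Weighted sum over whichever dimensions the engine's weight table names,
// normalized by the total weight so partial tables still land on 0-100.
function engineScore(
  dims: DimensionScores,
  weights: Record<string, number>,
): number {
  let sum = 0;
  let totalWeight = 0;
  for (const [dim, w] of Object.entries(weights)) {
    sum += (dims[dim] ?? 0) * w;
    totalWeight += w;
  }
  return Math.round((sum / totalWeight) * 100);
}

const perplexityWeights = {
  citationDensity: 0.30, freshness: 0.20, blufPresence: 0.15,
  statisticsDensity: 0.15, referenceTone: 0.10, schemaMarkup: 0.10,
};
```

A page scoring 0.5 on every dimension lands at 50 for every engine; the engines diverge only when dimension scores are uneven.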
Orchestrator flow
runSeoGeoScan(rawUrl) in apps/api/agent/lib/seo-geo-scan.ts:
1. `validateOutboundUrl(url)` — SSRF guard.
2. Fetch HTML once (15s timeout, follow redirects).
3. Probe `/robots.txt` + `/sitemap.xml` in parallel — context for the SEO scorer's crawlability dimension.
4. In parallel:
   - `auditAiCitability(url)` — GEO score (re-fetches internally; ~7 KB extra is negligible).
   - `computeSeoScore(html, url, ctx)` — SEO score using the already-fetched HTML.
   - `croAuditTool(url)` — optional CRO heuristic audit, wrapped in a 20s hard timeout (`SCAN_CRO_AUDIT_TIMEOUT_MS` env override). On any failure (timeout, fetch error, malformed payload), `croScore` is `null` and the SEO+GEO scan still completes — the CRO panel is silently hidden in the UI.
5. Compute per-engine scores from the 9 GEO dimensions.
6. `composeChecklist({ seo, geo, cro })` — bucket Urgent / Recommended / Done. CRO issues are merged in only when the optional audit succeeded.
7. `generateLlmsTxt(brandInfo)` — auto-extracted brand name, description, contact email.
8. Generate JSON-LD — Organization (homepage only) + WebPage (always) + Article (when `og:type=article`).
9. Return `ScanReport`.
Total runtime: 5–15s for typical pages. Designed to run inline inside an HTTP handler — fast enough that the UI can poll-or-await rather than need a job queue.
CRO recommendations
The /scan orchestrator runs croAuditTool alongside SEO and GEO. The CRO tool fetches the page and scores 8 conversion-rate-optimization dimensions (headline clarity, CTA visibility, social proof, urgency, trust signals, form friction, mobile readiness, above-the-fold content) plus an overall 0–100 score that surfaces as report.croScore.
CRO issues flow into the unified checklist with source: 'cro' and route to the cro-specialist agent for "Fix with AI". A CRO score ≥80 emits a Done item ("CRO score 88/100"). The full CRO payload is persisted to ScanResult.rawCroAudit for re-rendering past scans.
CRO is optional: if the audit times out, errors, or returns a malformed payload, the orchestrator logs a warn and returns a scan with croScore: null and rawCroAudit: null. The UI hides the CRO score gauge and CRO checklist items in that case — the SEO + GEO experience is unchanged. This keeps the /scan endpoint a tight 30s SLA even when the CRO upstream misbehaves.
Override the timeout for tests via SCAN_CRO_AUDIT_TIMEOUT_MS (milliseconds; defaults to 20000).
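The degrade-to-null pattern can be sketched like this. `orNullWithin` is a hypothetical helper name; the real orchestrator may structure the timeout differently.

```typescript
// Resolve to null instead of throwing when the work is slow or fails,
// so the scan report can ship without the optional section.
async function orNullWithin<T>(work: Promise<T>, ms: number): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), ms);
  });
  try {
    return await Promise.race([work.catch(() => null), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical call site; the 20s default mirrors SCAN_CRO_AUDIT_TIMEOUT_MS:
// const cro = await orNullWithin(croAuditTool(url), 20_000);
```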
Visual snapshot
Every successful scan captures a 1280×800 above-the-fold screenshot of the URL using Playwright + Chromium and surfaces it on the report card.
| Property | Value |
|---|---|
| Viewport | 1280×800 (above-the-fold only — full-page is wasteful and the hero is what matters) |
| Format | JPEG quality 80 (typical 50–200 KB) |
| Storage | Local filesystem under SCAN_SCREENSHOT_DIR (default /tmp/scan-screenshots) |
| Served from | GET /api/scan-screenshots/:filename |
| Filename | UUID v4 — acts as a capability token (unguessable, no auth required) |
| Lifetime | Single Fly VM — ephemeral. Re-scan to regenerate. |
| Persisted on ScanResult | screenshot_url column (TEXT, nullable) |
Why local FS and not S3: there is no S3/R2 SDK in this codebase, no bucket configured, and screenshots are trivially regenerated by re-scanning. Screenshots are typically read minutes-to-hours after creation, well within a single VM lifetime. If you need permanent archival, subscribe to webhooks and pull the URL while it's hot.
Failure mode: screenshot capture is fired in parallel with SEO/GEO/CRO and wrapped in .catch(() => null). A Playwright crash, missing Chromium binary, full disk, or a 20s timeout all degrade silently — screenshotUrl is null and the rest of the report ships unchanged. The UI hides the snapshot card when screenshotUrl is null.
Override the capture timeout via SCAN_SCREENSHOT_TIMEOUT_MS (milliseconds; defaults to 20000).
The prioritized checklist
composeChecklist({ seo, geo, cro? }) merges issues from up to three sources into a single list with three buckets:
```ts
{
  urgent: ChecklistItem[],
  recommended: ChecklistItem[],
  done: ChecklistItem[],
  all: ChecklistItem[],   // sorted urgent → recommended → done
  counts: { urgent, recommended, done, total }
}
```

Each ChecklistItem has:
```ts
{
  id: string,
  severity: 'urgent' | 'recommended' | 'done',
  area: string,
  source: 'seo' | 'geo' | 'cro',
  title: string,
  detail: string,
  fix: string,
  fixAgentId?: 'seo-writer' | 'site-qa' | 'cro-specialist',
  fixType: 'auto' | 'manual' | 'ai-prompt',
}
```

Surfacing Done items is what makes the dashboard feel like progress — Base44 explicitly shows a Done count, and we mirror it. Any SEO dimension scoring ≥0.95 contributes a Done item ("Your page passes the meta-tags check").
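A minimal sketch of the bucketing. The severity-to-bucket mapping shown is an assumption (critical → urgent, warning/info → recommended, pass → done); the real composer also merges GEO and CRO sources.

```typescript
type Bucket = 'urgent' | 'recommended' | 'done';

// HYPOTHETICAL mapping from scorer severities to checklist buckets.
function bucketFor(severity: 'critical' | 'warning' | 'info' | 'pass'): Bucket {
  if (severity === 'critical') return 'urgent';
  if (severity === 'pass') return 'done';
  return 'recommended';
}

// Partition items and preserve the documented urgent → recommended → done order.
function composeBuckets(items: { id: string; bucket: Bucket }[]) {
  const urgent = items.filter((i) => i.bucket === 'urgent');
  const recommended = items.filter((i) => i.bucket === 'recommended');
  const done = items.filter((i) => i.bucket === 'done');
  return {
    urgent,
    recommended,
    done,
    all: [...urgent, ...recommended, ...done],
    counts: {
      urgent: urgent.length,
      recommended: recommended.length,
      done: done.length,
      total: items.length,
    },
  };
}
```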
Fix tiers
fixType controls how the UI presents the fix button:
| Tier | UI behavior | Example |
|---|---|---|
| auto | One-click fix; runs without an LLM round-trip. | "Add <meta charset>" — copy-pasteable code snippet. |
| manual | Inline instructions only; nothing to delegate. | "Compress your hero image to under 200KB." |
| ai-prompt | "Fix with AI" stages a chat thread on the routed agent. | "Rewrite title to be 30–60 chars" → spawns a chat with seo-writer. |
Agent routing
fixAgentId decides where an ai-prompt fix goes:
| Agent | Handles |
|---|---|
| seo-writer | Title, meta description, body content, GEO citability rewrites |
| site-qa | Robots.txt, sitemap, canonical, structured data, technical |
| cro-specialist | CRO issues (when CRO audit is included) |
POST /api/scan/:customerId/:scanId/fix { checklistItemId } looks up the item in the persisted checklist, creates a ScanFix row pointing at the agent, and stages a chat thread. The customer reviews + confirms before any change ships — same prompt-staged pattern Base44 uses, which preserves user control and routes the audit funnel into billable AI messages.
Auto-generated artifacts
Two copy-pasteable outputs are generated on every scan and stored on the ScanResult row.
llms.txt
Generated via generateLlmsTxt({ websiteUrl, companyName, description, contactEmail }). Brand info is auto-extracted from the HTML: company name comes from og:site_name → og:title → <title> (with " | Brand" and " - Brand" suffixes stripped); description from <meta name="description"> → og:description → first 200 chars of body text; contact email from the first email-shaped string on the page.
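The fallback chain for the company name could look like this sketch. Regex-based extraction is shown for brevity; the real extractor presumably uses a proper HTML parser, and the exact suffix-stripping rules are an assumption.

```typescript
// og:site_name → og:title → <title>, with trailing " | Brand" / " - Brand"
// style suffixes stripped (we keep the part before the separator).
function extractCompanyName(html: string): string | null {
  const meta = (prop: string): string | null => {
    const m = html.match(
      new RegExp(`<meta[^>]+property=["']${prop}["'][^>]+content=["']([^"']+)["']`, 'i'),
    );
    return m ? m[1] : null;
  };
  const title = html.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1] ?? null;
  const raw = meta('og:site_name') ?? meta('og:title') ?? title;
  if (!raw) return null;
  // Split on " | " or " - " and keep the leading segment.
  return raw.split(/\s[|-]\s/)[0].trim() || null;
}
```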
JSON-LD schemas
generateAutoJsonLd(html, url, brand) returns an array of { type, jsonLd }:
- Organization — emitted only when scanning the homepage (pathname === '/').
- WebPage — emitted always.
- Article — emitted when og:type === 'article', with auto-extracted datePublished and author.
All schemas use real data extracted from the page — og:title, og:image, og:description — so the customer can paste them straight into the page's <head>.
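A minimal sketch of the always-emitted WebPage entry in that `{ type, jsonLd }` array. The property selection beyond url is an assumption; generateAutoJsonLd's real output may include more fields.

```typescript
interface BrandInfo {
  companyName: string | null;
  description: string | null;
}

// WebPage JSON-LD, emitted for every scanned URL. Optional fields are
// omitted entirely rather than emitted as null.
function webPageJsonLd(url: string, title: string | null, brand: BrandInfo) {
  return {
    type: 'WebPage',
    jsonLd: {
      '@context': 'https://schema.org',
      '@type': 'WebPage',
      url,
      ...(title ? { name: title } : {}),
      ...(brand.description ? { description: brand.description } : {}),
    },
  };
}
```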
Schema
```prisma
model ScanResult {
  id                 String    @id @default(uuid()) @db.Uuid
  customerId         String    @db.Uuid
  url                String
  status             String    @default("pending") // pending | running | done | failed
  seoScore           Int?
  geoScore           Int?
  croScore           Int?      // optional, for landing pages
  citabilityGrade    String?   // A-F
  perEngineScores    Json?     // { chatgpt, perplexity, gemini, googleAi }
  dimensions         Json?
  checklist          Json?
  llmsTxt            String?   @db.Text
  jsonLdSchemas      Json?
  rawCitabilityAudit Json?
  rawCroAudit        Json?
  screenshotUrl      String?
  errorMessage       String?
  startedAt          DateTime?
  completedAt        DateTime?
}

model ScanFix {
  id              String  @id @default(uuid()) @db.Uuid
  scanId          String  @db.Uuid
  customerId      String  @db.Uuid
  checklistItemId String  // matches ScanResult.checklist[].id
  agentId         String  // 'seo-writer' | 'site-qa' | 'cro-specialist'
  fixType         String  // 'auto' | 'manual' | 'ai-prompt'
  status          String  @default("queued")
  prompt          String? @db.Text
  result          Json?
}
```

API endpoints
All four endpoints are tenant-scoped under /api/scan/:customerId/...:
| Method | Path | Purpose |
|---|---|---|
| POST | /api/scan/:customerId | Start a new scan. Creates the row in running state, runs the orchestrator inline, returns the full ScanReport. |
| GET | /api/scan/:customerId | List recent scans (50 most recent, summary fields only). |
| GET | /api/scan/:customerId/:scanId | Fetch a specific scan with full payload. |
| POST | /api/scan/:customerId/:scanId/fix | Spawn a "Fix with AI" delegation. Body: { checklistItemId }. Creates a ScanFix row and stages a chat thread on the routed agent. |
The scan row is created up-front in running status so the UI can poll on a 1.5s interval until status === 'done' or 'failed'. On unhandled error, the row is marked 'failed' with the error message preserved.
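The client-side poll-until-terminal loop can be sketched like this. `fetchStatus` is injected so the sketch stays transport-agnostic; the 1.5s interval matches the doc, and the attempt cap is an assumption.

```typescript
type ScanStatus = 'pending' | 'running' | 'done' | 'failed';

// Poll every `intervalMs` until the scan reaches a terminal state,
// or give up after `maxAttempts` checks.
async function pollUntilDone(
  fetchStatus: () => Promise<ScanStatus>,
  intervalMs = 1500,
  maxAttempts = 20, // ~30s at 1.5s per attempt
): Promise<ScanStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === 'done' || status === 'failed') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return 'failed'; // treat a stuck scan as failed client-side
}
```

In the real UI, `fetchStatus` would wrap a GET to /api/scan/:customerId/:scanId and read the status field.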
UI at /scan
- URL input + "Run Scan" button.
- 5–15s wait, then results render:
  - Two gauges side by side — SEO score and GEO score, with the citability grade.
  - Per-engine GEO breakdown — four mini-cards: ChatGPT, Perplexity, Gemini, Google AI Overviews.
  - Filter pills — All / Urgent / Recommended / Done.
  - Checklist — each item has a per-row action button: auto = one-click, manual = inline copy, ai-prompt = "Fix with AI" → routed agent.
  - Copy-pasteable artifacts — llms.txt block + JSON-LD blocks, each with a copy-to-clipboard button.
PDF export
Every completed scan can be downloaded as a branded PDF report:
GET /api/scan/:customerId/:scanId/pdf

The endpoint renders the persisted ScanResult row through Playwright/Chromium (already a dependency — the same browser used for screenshots). Returns Content-Type: application/pdf with Content-Disposition: attachment; filename="heycmo-scan-<host>-<date>.pdf".
What's in the PDF:
- Branded header (heycmo logo + scan timestamp)
- Scanned URL
- Three circular score gauges — SEO + GEO + CRO
- Per-engine GEO mini-cards — ChatGPT, Perplexity, Gemini, Google AI
- Above-the-fold screenshot thumbnail (when present)
- Top 10 checklist items with severity pills (Urgent / Recommended / Done)
- CRO recommendations (when the CRO audit ran)
- Footer with timestamp + heycmo branding link
Design choices:
- No storage. PDFs are rendered on-demand from the canonical ScanResult row each time. Saves storage and lets template tweaks propagate to old scans automatically.
- Pure HTML generator. renderScanReportHtml() is a pure function with full unit-test coverage. Only renderScanReportPdf() touches Playwright.
- A4 portrait, 14mm × 12mm margins. Tuned for both screen and print.
- Graceful degradation. Missing CRO/screenshot/per-engine fields are simply omitted — the rest of the report renders normally.
UI: the /scan page surfaces a Download PDF report button below the score cards. Same-origin GET means the existing httpOnly session cookie authenticates the request — no extra token plumbing.
Source files:
- apps/api/agent/lib/scan-pdf.ts — HTML template + Playwright wrapper
- apps/api/infra/server.ts — endpoint (search for /scan/:customerId/:scanId/pdf)
Source files
- apps/api/agent/lib/seo-geo-scan.ts — the orchestrator (and per-engine score model)
- apps/api/agent/lib/seo-score.ts — SEO scorer (pure)
- apps/api/agent/lib/checklist-composer.ts — issue → bucketed checklist
- apps/api/agent/tools/ai-visibility.ts — auditAiCitability (GEO scorer)
- apps/api/agent/tools/llms-txt-generator.ts — llms.txt generator
- apps/api/infra/server.ts — endpoints (search for /api/scan/)
- apps/web/src/pages/SeoGeoScanner.tsx — the /scan UI