Naly Engineering Notes: Source-First RAG Article Drafting for Persistent, Auditable Publishing

TL;DR

Retrieval-augmented generation (RAG) turns Naly’s article pipeline into a source-grounded publishing system instead of model-memory composition. Every draft request first gathers web and arXiv evidence, normalizes and persists source URLs, and then asks the model to produce an answer-first draft and final HTML article. This shifts risk from "can the model hallucinate?" to "is the retrieval layer complete and traceable," giving editors stable artifacts, replayable jobs, and defensible public claims.

Abstract

RAG in Naly should be designed around source persistence and deterministic contracts. As of June 27, 2026, practical reliability comes less from a bigger model and more from whether retrieval artifacts are queryable, versioned, and validated before publication. This note proposes a dual-plane design, with an evidence plane for retrieval and storage and a generation plane for drafting, and then explains how this architecture directly improves editorial trust and incident handling.

Where it sits in Naly

Naly runs this as a production content subsystem inside a Next.js 16.0.7 App Router stack (next + react), where article publication is part of runtime code paths rather than a separate offline write-up step. The article-job path is where all constraints must be enforced: a job is not "written" until source records exist, summary structure validates, and HTML is materialized.

The stack alignment is intentional:

next@16.0.7 + React Server Components host job-triggered rendering in server space, matching server-side output contracts for articles.
drizzle-orm@0.44.7 + @neondatabase/serverless@1.0.2 define typed, persistent job and source tables so every claim can be traced.
ai@6.0.0-beta.105 provides generation with schema-aware output controls.
marked@17.0.1 converts generated Markdown summaries into rendered HTML for publication.
@vercel/blob@2.0.0 stores generated assets as durable URLs for reuse.
Anthropic tooling can be added as an alternate model provider inside the same contract envelope, but not as an escape hatch from structured constraints.

This replaces a “generate then proofread” model with a grounded write loop: retrieval, validation, generation, rendering, and publish must all pass before the article is visible.

Technical mechanism

A robust Naly design has five bounded stages:

Evidence plan and query orchestration

Define the job spec with topic, date window, and evidence policy.
Run both web search and arXiv search for primary sources.
Deduplicate by canonical URL and normalize protocol, host, and query string.
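
The deduplication step above can be sketched as follows. This is a minimal illustration, not Naly's actual policy: the helper names (canonicalizeUrl, dedupeByCanonicalUrl), the list of tracking parameters, and the choices to force https and strip a leading "www." are all assumptions made for the example.

```typescript
// Hypothetical list of query parameters that never change the fetched document.
const TRACKING_PARAMS = new Set(["utm_source", "utm_medium", "utm_campaign", "ref"]);

// Reduce a raw URL to a stable key (assumed rules, adjust to the real evidence policy).
function canonicalizeUrl(raw: string): string {
  const url = new URL(raw);
  url.protocol = "https:"; // treat http and https as the same resource
  url.hostname = url.hostname.replace(/^www\./, ""); // the parser already lowercases the host
  url.hash = ""; // fragments are client-side only
  // Drop tracking parameters and sort the rest so parameter order does not matter.
  const kept = Array.from(url.searchParams.entries())
    .filter(([key]) => !TRACKING_PARAMS.has(key))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  url.search = new URLSearchParams(kept).toString();
  // Trim a trailing slash, except on the root path.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

// Keep the first occurrence of each canonical URL, preserving retrieval order.
function dedupeByCanonicalUrl(urls: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of urls) {
    const key = canonicalizeUrl(raw);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(key);
    }
  }
  return result;
}
```

Sorting the query string is what makes the key order-independent; whether dropping "www." is safe depends on the sources involved, so that rule in particular should be treated as optional.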

Source persistence layer

Store per-URL metadata (url, canonicalized URL, fetch status, fetch timestamp, title, excerpt, source type).
Store model-facing snippets separately from raw payloads so re-runs are deterministic even if upstream pages shift.
Add per-source checksums to detect drift.
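
A per-source checksum for drift detection might look like the sketch below. It assumes a Node.js runtime (for node:crypto), SHA-256 as the hash, and whitespace normalization before hashing; the SourceSnapshot shape and the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a persisted model-facing snippet.
interface SourceSnapshot {
  canonicalUrl: string;
  snippet: string;
  checksum: string; // hex-encoded SHA-256 of the normalized snippet
}

// Hash the snippet after collapsing whitespace, so trivial reflow is not reported as drift.
function checksumOf(text: string): string {
  const normalized = text.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

function makeSnapshot(canonicalUrl: string, snippet: string): SourceSnapshot {
  return { canonicalUrl, snippet, checksum: checksumOf(snippet) };
}

// True when freshly fetched text no longer matches the stored checksum.
function hasDrifted(snapshot: SourceSnapshot, freshText: string): boolean {
  return checksumOf(freshText) !== snapshot.checksum;
}
```

Because the stored snippet is what the model saw, a re-run can keep using it even when hasDrifted reports a change; the drift flag only tells editors that the upstream page has moved on.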

Context shaping and constraints

Build a retrieval context ordered by relevance and recency.
Require explicit source IDs in the prompt contract.
Force answer-first output shape (intro claim, evidence bullets, risk caveats, uncertainty), plus ordered source references.
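
One way to build the ordered, ID-tagged context described above is sketched here. The RetrievedSource fields, the [ID] prefix format, and the default cap of eight sources are assumptions for illustration; relevance is assumed to be a score where higher is better.

```typescript
// Hypothetical shape of a persisted, model-facing source record.
interface RetrievedSource {
  id: string; // stable source ID the model must cite, e.g. "S1"
  title: string;
  snippet: string;
  relevance: number; // higher is more relevant (assumed scale)
  fetchedAt: string; // ISO 8601 timestamp
}

// Order by relevance, break ties by recency, and tag every block with its source ID.
function buildContext(sources: RetrievedSource[], maxSources = 8): string {
  const ordered = [...sources].sort(
    (a, b) =>
      b.relevance - a.relevance ||
      Date.parse(b.fetchedAt) - Date.parse(a.fetchedAt),
  );
  return ordered
    .slice(0, maxSources)
    .map((s) => `[${s.id}] ${s.title}\n${s.snippet}`)
    .join("\n\n");
}
```

Copying the array before sorting keeps the persisted retrieval order untouched, which matters if the same source list is replayed for a retry.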

Structured generation with strict schema

Use structured output so malformed or schema-violating responses fail fast and are retried with tighter context.
Keep generation in server context and reject drafts that claim unsupported facts without mapped source IDs.
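
The rejection rule above can be illustrated with a small hand-rolled validator. In practice a schema library paired with the generation SDK's structured output would likely do the shape checking; this sketch avoids any library so the logic is visible. The Draft and DraftClaim shapes and the validateDraft name are hypothetical.

```typescript
// Hypothetical draft shape: an answer-first claim plus evidence bullets with source IDs.
interface DraftClaim {
  text: string;
  sourceIds: string[];
}
interface Draft {
  introClaim: string;
  evidence: DraftClaim[];
}
interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Fail fast on malformed drafts and on claims that cite no known source.
function validateDraft(raw: unknown, knownSourceIds: ReadonlySet<string>): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["draft is not an object"] };
  }
  const errors: string[] = [];
  const candidate = raw as Partial<Draft>;
  if (typeof candidate.introClaim !== "string" || candidate.introClaim.trim() === "") {
    errors.push("introClaim is missing");
  }
  if (!Array.isArray(candidate.evidence) || candidate.evidence.length === 0) {
    errors.push("evidence must be a non-empty array");
  } else {
    candidate.evidence.forEach((claim, i) => {
      if (typeof claim?.text !== "string") errors.push(`evidence[${i}] has no text`);
      const ids: string[] = Array.isArray(claim?.sourceIds) ? claim.sourceIds : [];
      if (ids.length === 0) errors.push(`evidence[${i}] cites no source`);
      for (const id of ids) {
        if (!knownSourceIds.has(id)) errors.push(`evidence[${i}] cites unknown source ${id}`);
      }
    });
  }
  return { ok: errors.length === 0, errors };
}
```

The returned error list can double as the "rejected claims" audit field and as input to a retry with tighter context.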

Render, publish, and verify

Convert validated markdown to HTML and persist both markdown + HTML.
Cache final output only after successful validation.
Emit analytics and audit fields: source count, rejected claims, retry count, and generation latency.

The most important design move is the separation of concerns: retrieval quality and generation quality are different failure domains with different metrics. Next.js Server Components fit this split because rendering can stay deterministic while retrieval and generation happen in controlled async tasks.

What the literature says

Recent RAG literature supports this architecture pattern. A 2024 survey of RAG architecture describes how retrieval-augmented systems reduce fact drift by conditioning generation on external evidence, but notes trade-offs in pipeline complexity and modular coordination [Gupta et al., 2024]. A 2025 follow-up survey adds that robustness now depends on adaptive retrieval, decoding control, and end-to-end evaluation, rather than on generation quality alone [Sharma, 2025].

For production quality control, the 2025 evaluation-focused survey explicitly splits assessment into internal retriever/generator metrics and external system metrics; that decomposition is especially useful for article pipelines because “bad article” can mean wrong source choice even when language quality is high [Gan et al., 2025]. Groundedness-specific work has also moved toward detection layers that classify claim support using retrieved context and NLI-style checks, reinforcing the practical value of post-generation validation [Gerner et al., 2025].

In short, the papers converge on one thesis: RAG is not just a way to inject context; it is an engineering contract between retrieval and generation. Naly should therefore optimize the contract, not just the prompt.

Design trade-offs

Freshness vs determinism: stricter TTLs reduce staleness but increase re-fetch cost. Persisting snapshots lets you keep deterministic rendering while still revalidating freshness windows.
Recall vs precision in retrieval: wider retrieval can increase coverage but injects noise; a second-stage relevance filter protects claim quality.
Schema strictness vs prose fluency: strict output schemas improve machine reliability but can reduce stylistic freedom. The answer-first schema pattern preserves readability while keeping guardrails.
Static rendering speed vs auditability: pre-rendered HTML improves delivery performance and reduces repeated compute, but only if the source artifacts used are immutable references.
Complexity vs operations cost: every added validation step (source checks, schema checks, artifact persistence) adds latency. Next.js production guidance on caching, route boundaries, and build-time verification is important to keep this operable.

Failure modes

Source drift: URLs return 404 or change silently after job creation. Mitigation: canonical key + snapshot hash + fallback source chain.
Retrieval overreach: high recall with low precision causes plausible but unsupported synthesis. Mitigation: require evidence-first constraints and block claims without source matches.
Model formatting failure: schema mismatch or truncated JSON from generation. Mitigation: strict schema validation and retry with reduced context.
Double-publish races: concurrent workers can publish partial artifacts. Mitigation: job idempotency keys and row-level state transitions (pending -> drafting -> validated -> published).
Rendering regressions: malformed markdown or unsafe HTML transforms. Mitigation: deterministic marked conversion path and HTML output tests tied to sample manifests.
Cache illusions: stale dynamic data in server output can desync published text and source index. Mitigation: align route rendering strategy with explicit runtime freshness policy and avoid implicit caches where evidence freshness is required.
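
The double-publish mitigation listed above (idempotency keys plus ordered state transitions) can be sketched with an in-memory stand-in for the job table. The JobStore class is hypothetical; in a real database the same guard would typically be a conditional update that only succeeds when the row is still in the expected state.

```typescript
type JobState = "pending" | "drafting" | "validated" | "published";

// The only legal successor of each state; published is terminal.
const NEXT_STATE: Record<JobState, JobState | null> = {
  pending: "drafting",
  drafting: "validated",
  validated: "published",
  published: null,
};

// In-memory stand-in for a job table keyed by idempotency key (illustration only).
class JobStore {
  private states = new Map<string, JobState>();

  // Returns false when a job with this idempotency key already exists.
  create(idempotencyKey: string): boolean {
    if (this.states.has(idempotencyKey)) return false;
    this.states.set(idempotencyKey, "pending");
    return true;
  }

  // Compare-and-set: advance only if the job is still in the expected state
  // and the requested state is its single legal successor.
  advance(idempotencyKey: string, expected: JobState, next: JobState): boolean {
    const current = this.states.get(idempotencyKey);
    if (current !== expected || NEXT_STATE[expected] !== next) return false;
    this.states.set(idempotencyKey, next);
    return true;
  }

  stateOf(idempotencyKey: string): JobState | undefined {
    return this.states.get(idempotencyKey);
  }
}
```

A worker that loses the race sees advance return false and stops, so a partially built artifact never reaches the published state.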

Implementation notes

For a practical rollout, treat this as a publication contract:

Define source tables in Drizzle with explicit constraints: URL uniqueness by canonical host/path, fetch status enums, and checksum columns.
Use a Neon-compatible driver path that is consistent with serverless execution behavior; the Drizzle docs describe both runtime-specific and neon-* driver options.
In generation, enforce structured output contracts and fail on invalid objects before rendering.
Use Next.js production guidance for server boundaries, error pages, caching, and SEO metadata for article routes so publishing remains observable and fast.
Persist generated blobs (e.g., cover images, attachments, exports) through Vercel Blob with explicit access policy and deterministic naming to avoid collisions.
Emit operational checks before publish: minimum source count, minimum source diversity, evidence freshness, and minimum completion rate for mapped claims.
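
The pre-publish checks in the last item could be expressed as a single gate function. This is a sketch under assumptions: the PublishCandidate and PublishPolicy shapes are hypothetical, source diversity is approximated by distinct hostnames, and the thresholds in the usage note are placeholders rather than recommended values.

```typescript
// Hypothetical inputs to the publish gate.
interface PublishCandidate {
  sources: { canonicalUrl: string; fetchedAt: string }[]; // fetchedAt is ISO 8601
  totalClaims: number;
  mappedClaims: number; // claims that resolved to at least one source ID
}
interface PublishPolicy {
  minSources: number;
  minDistinctHosts: number;
  maxSourceAgeDays: number;
  minMappedClaimRate: number; // between 0 and 1
}

// Returns a list of failed checks; an empty list means the job may publish.
function checkPublishable(candidate: PublishCandidate, policy: PublishPolicy, now: Date): string[] {
  const failures: string[] = [];
  if (candidate.sources.length < policy.minSources) {
    failures.push("too few sources");
  }
  const hosts = new Set(candidate.sources.map((s) => new URL(s.canonicalUrl).hostname));
  if (hosts.size < policy.minDistinctHosts) {
    failures.push("insufficient source diversity");
  }
  const maxAgeMs = policy.maxSourceAgeDays * 24 * 60 * 60 * 1000;
  const stale = candidate.sources.filter(
    (s) => now.getTime() - Date.parse(s.fetchedAt) > maxAgeMs,
  );
  if (stale.length > 0) {
    failures.push(`${stale.length} stale source(s)`);
  }
  const rate = candidate.totalClaims === 0 ? 0 : candidate.mappedClaims / candidate.totalClaims;
  if (rate < policy.minMappedClaimRate) {
    failures.push("mapped-claim rate below threshold");
  }
  return failures;
}
```

For example, a policy such as { minSources: 2, minDistinctHosts: 2, maxSourceAgeDays: 30, minMappedClaimRate: 0.9 } would block a draft whose sources all come from one host. Passing the current time in as a parameter keeps the gate deterministic when a job is replayed.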

This is the key shift: the article is no longer judged by model cleverness; it is judged by whether evidence and generation stay synchronized under retries and redeploys.