
Retrieval-augmented generation for source-backed article writing

Naly Engineering Notes: Retrieval-Augmented Article Writing With Persisted Sources

Retrieval-augmented generation turns Naly article writing into an auditable research pipeline rather than memory-only prose generation. The important design choice is not only retrieval, but source persistence, claim discipline, and safe rendering.

May 14, 2026 · 9 sources

Abstract

Retrieval-augmented generation gives Naly's article pipeline a research memory that is fresher and more auditable than model weights alone. For each engineering note or market-intelligence article job, the system can search the web and arXiv, keep the source URLs with the generated artifact, ask the model to answer first, and render the result as HTML. The point is not automation for its own sake; it is publishing claims readers can trace.

The thesis is simple: RAG for article writing should be treated as a production evidence system, not as a chatbot pattern. A chatbot can be forgiven for a weak answer; a published article becomes a durable trust surface. Naly's implementation therefore needs three invariants: retrieval before drafting, source records that survive after publication, and a renderer that preserves readable Markdown while avoiding unsafe HTML.

Where it sits in Naly

Naly article jobs sit between research acquisition and public publishing. The job starts with a selected topic, generates search intents, fetches web and arXiv material, normalizes the results into source records, and then asks a model to write an answer-first article from that bounded evidence set. The output is not just prose. It is a bundle: Markdown content, rendered HTML, source URLs, source titles, source kinds, and enough metadata to explain why each source was used.
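
As a rough sketch, that bundle can be modeled as one typed artifact. The field names below are illustrative assumptions, not Naly's actual schema.

```ts
// Illustrative shape of a persisted article artifact; field names are
// assumptions, not Naly's actual schema.
type SourceKind = "web" | "arxiv" | "official-doc";

interface SourceRecord {
  url: string;         // canonical URL kept after publication
  title: string;
  kind: SourceKind;
  retrievedAt: string; // ISO timestamp of the fetch
  note?: string;       // why this source was used
}

interface ArticleArtifact {
  topic: string;
  markdown: string;    // trusted Markdown source
  html: string;        // sanitized rendered output
  sources: SourceRecord[];
}
```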

This matters for Naly's trust loop. Naly's broader editorial posture is to publish what others hide: decision memos, calibration limits, failures, and the evidence behind claims. Source-backed generation makes that posture operational. Readers should not have to guess whether a statement came from a model's training data, an official document, a paper, or an operator assertion.

The RAG layer belongs before drafting, not after it. Post-hoc citation attachment is weaker because the model has already formed claims. In a stronger design, retrieval constrains the generation context, and generation produces claims that can be checked against the retrieved set. The visible article can stay concise, but the stored artifact should retain the research trail.

Technical mechanism

For article writing, Naly's RAG flow is a batch pipeline; a compressed sketch of the job follows the list:

  1. Topic selection creates a bounded research question, such as how retrieval-augmented generation grounds source-backed article writing.
  2. Query planning expands that question into web queries, official-document queries, and arXiv queries.
  3. Retrieval collects official documentation, primary papers, and high-signal explanatory sources.
  4. Normalization extracts title, canonical URL, source kind, publication or update context when available, and relevant snippets or abstracts.
  5. Source persistence stores the URL ledger before generation so the article can be audited later.
  6. Prompt assembly gives the model the topic, Naly-specific implementation facts, writing constraints, and retrieved evidence.
  7. Generation produces Markdown with an answer-first abstract, explicit failure modes, and a references section.
  8. Validation checks that every reference in the rendered article maps to a stored source object.
  9. Rendering converts Markdown to HTML for the site while applying sanitization and production checks.
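
A compressed sketch of that job, reusing the SourceRecord and ArticleArtifact types from the earlier sketch. Every helper here is a hypothetical stage boundary, not an existing Naly module; keeping the stages separate is what makes step 8 a hard gate.

```ts
// Hypothetical stage boundaries for the numbered steps above.
declare function planQueries(topic: string): Promise<string[]>;
declare function retrieve(queries: string[]): Promise<unknown[]>;
declare function normalizeSources(raw: unknown[]): SourceRecord[];
declare function persistSources(topic: string, sources: SourceRecord[]): Promise<void>;
declare function draftArticle(topic: string, sources: SourceRecord[]): Promise<string>;
declare function validateReferences(markdown: string, sources: SourceRecord[]): void;
declare function renderHtml(markdown: string): string;

async function runArticleJob(topic: string): Promise<ArticleArtifact> {
  const queries = await planQueries(topic);                   // step 2: query planning
  const sources = normalizeSources(await retrieve(queries));  // steps 3-4: fetch and normalize
  await persistSources(topic, sources);                       // step 5: URL ledger before generation
  const markdown = await draftArticle(topic, sources);        // steps 6-7: prompt assembly and drafting
  validateReferences(markdown, sources);                      // step 8: fail instead of publishing
  const html = renderHtml(markdown);                          // step 9: Markdown to sanitized HTML
  return { topic, markdown, html, sources };
}
```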

This is close to the retrieval-and-augmentation pattern described in Vercel's RAG guide: retrieve the relevant context first, then combine it with the user or job question before generation. The difference is that Naly is not optimizing for conversational support. It is optimizing for durable publication, where a source URL is part of the article's data model.

The AI SDK is a natural orchestration layer for this kind of job because its text-generation interface supports non-interactive automation, tool calls, multi-step results, and source metadata when providers return URL sources. Even when a provider does not return native source objects, Naly can attach its own retrieved-source list to the article artifact and treat model-native sources as supplemental rather than authoritative.
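
A minimal sketch of that orchestration with the AI SDK's generateText. The model id, the prompt wiring, and the shape of result.sources are assumptions and may differ across SDK versions and providers.

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

// Sketch only: the model id and prompt wiring are illustrative, and option or
// result field names may differ across AI SDK versions.
async function generateDraft(topic: string, evidence: string): Promise<string> {
  const result = await generateText({
    model: openai("gpt-4o"),
    system:
      "Write an answer-first Markdown article. Cite only the provided sources and never invent URLs.",
    prompt: `Topic: ${topic}\n\nRetrieved evidence:\n${evidence}`,
  });

  // Provider-returned URL sources, when present, stay supplemental to the
  // job's own persisted source list.
  const providerSources = result.sources ?? [];
  console.log(`provider returned ${providerSources.length} native sources`);
  return result.text;
}
```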

What the literature says

The original RAG formulation by Lewis et al. framed the core problem well: parametric language models store facts in weights, but updating that knowledge and providing provenance remain difficult. Their retrieval-augmented model combined a sequence model with a dense vector index and found more specific, diverse, and factual generation than a parametric-only baseline on knowledge-intensive tasks.

The later RAG survey by Gao et al. generalizes that idea into a taxonomy: naive RAG, advanced RAG, and modular RAG. Naly's article pipeline should be modular. Retrieval, ranking, source persistence, prompt construction, generation, reference validation, and rendering are separate units with separate failure modes. Treating them as separate units makes the system easier to debug when an article cites a weak source or misses a better one.

Self-RAG adds an important caution. Asai et al. argue that retrieving a fixed number of passages whether or not retrieval is needed can degrade output quality. For Naly, that means top-k retrieval should not be a ritual. A small article about a stable framework feature may need official docs and one paper; a literature-heavy article may need multiple arXiv sources and a survey. Retrieval depth should follow claim risk.
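
One way to make that concrete is a retrieval budget keyed to claim risk rather than a fixed top-k. The categories and numbers below are illustrative, not a tuned policy.

```ts
// Illustrative policy, not a Naly module: pick retrieval depth from the
// article's claim risk instead of retrieving a fixed number of passages.
type ClaimRisk = "stable-framework-feature" | "research-claims" | "literature-survey";

function retrievalBudget(risk: ClaimRisk): { web: number; arxiv: number } {
  switch (risk) {
    case "stable-framework-feature":
      return { web: 2, arxiv: 1 }; // official docs plus one paper
    case "research-claims":
      return { web: 3, arxiv: 3 }; // require multiple primary sources
    case "literature-survey":
      return { web: 4, arxiv: 6 }; // literature-heavy article
  }
}
```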

RAGChecker gives the evaluation lesson. Ru et al. argue that RAG systems need fine-grained diagnostics across both retrieval and generation, especially for long-form responses. For Naly, the unit of evaluation should not be only article quality. It should include retrieval recall, source relevance, claim support, reference coverage, and whether unsupported claims slipped into the final Markdown.

Design trade-offs

High recall versus low noise is the central trade-off. More retrieval improves the chance of finding the right source, but it also increases the chance that weak snippets enter the prompt and steer the model. Naly should prefer a staged approach: broad collection, strict filtering, then compact prompt context.

Source persistence improves auditability, but it also creates storage and maintenance work. URLs drift, papers get revised, and documentation pages move. The durable record should include canonical URL, fetched timestamp, title, source type, and ideally a content digest or excerpt boundary. That lets Naly distinguish a model error from a changed source.
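
A small sketch of the drift-detection piece: hash the fetched content at generation time so a later audit can compare against the live page. The record shape and URL are illustrative.

```ts
import { createHash } from "node:crypto";

// Sketch: record a digest of the fetched content so a later audit can tell
// a model error from a source that changed after publication.
function sourceDigest(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

const persistedSource = {
  url: "https://example.com/docs/rag", // canonical URL (illustrative)
  title: "RAG guide",
  kind: "web" as const,
  retrievedAt: new Date().toISOString(),
  digest: sourceDigest("...fetched page text..."),
};
```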

Answer-first writing improves reader value, but it can compress uncertainty too aggressively. The article should lead with the conclusion while preserving a later section for failure modes and caveats. The answer-first summary is the entry point; it is not a license to flatten the evidence.

Rendered HTML improves distribution and reading experience, but it creates a security boundary. Marked is fast and useful for Markdown parsing, but its documentation explicitly warns that it does not sanitize output HTML. A Naly article renderer should sanitize generated HTML and keep the trusted Markdown source available for replay.
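
A minimal sketch of that boundary, assuming sanitize-html as the sanitizer; any allowlist-based HTML sanitizer would fit the same slot.

```ts
import { marked } from "marked";
import sanitizeHtml from "sanitize-html";

// Marked parses Markdown but does not sanitize the resulting HTML, so the
// renderer sanitizes afterwards. sanitize-html is one option here, not a
// statement of what Naly actually uses.
export function renderArticleHtml(markdown: string): string {
  const dirty = marked.parse(markdown, { async: false });
  return sanitizeHtml(dirty, {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat(["img", "h1", "h2"]),
    allowedAttributes: {
      ...sanitizeHtml.defaults.allowedAttributes,
      a: ["href", "title", "rel"],
    },
    // Drop scriptable URL schemes outright.
    allowedSchemes: ["http", "https", "mailto"],
  });
}
```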

Failure modes

Retrieval miss: the search step finds plausible but incomplete sources. This usually happens when the query planner is too narrow or uses product terms that differ from the literature. Mitigation: use multiple query styles, include official docs, and require at least two primary or arXiv sources when the article makes research claims.

Citation laundering: a source appears in the references, but it does not actually support the sentence near it. This is worse than having no citation because it creates false confidence. Mitigation: validate claims against source snippets and reject articles where references are merely topical.

Stale source drift: an official documentation page changes after publication. Mitigation: persist source metadata at generation time and record the date label. For sources that drive major claims, store a snapshot or digest where licensing allows.

Over-retrieval: too many chunks make the model summarize the context rather than answer the article thesis. Mitigation: use source ranking, deduplicate near-identical documents, and cap prompt evidence by claim relevance rather than raw count.
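
A sketch of that guard: deduplicate by canonical URL, rank by relevance, and cap what enters the prompt. The relevance score is assumed to be computed upstream.

```ts
// Sketch of an over-retrieval guard; the scoring input is an assumption.
interface RankedSource {
  url: string;
  score: number;  // relevance to the article thesis, however it is computed
  excerpt: string;
}

function selectPromptEvidence(sources: RankedSource[], cap = 6): RankedSource[] {
  // Deduplicate by canonical URL, keeping the highest-scoring copy.
  const byUrl = new Map<string, RankedSource>();
  for (const s of sources) {
    const existing = byUrl.get(s.url);
    if (!existing || s.score > existing.score) byUrl.set(s.url, s);
  }
  // Cap prompt evidence by relevance rather than raw retrieval count.
  return [...byUrl.values()].sort((a, b) => b.score - a.score).slice(0, cap);
}
```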

Context poisoning: spam pages, generated SEO pages, or low-quality summaries outrank primary material. Mitigation: rank official documentation, arXiv, standards, and source repositories above secondary commentary unless the article is explicitly about industry reception.

Renderer risk: generated Markdown can include raw HTML, unsafe links, or malformed tables. Mitigation: sanitize rendered HTML, normalize links, reject scriptable content, and run production checks consistent with the Next.js guidance on performance, security, metadata, and accessibility.

Implementation notes

Given Naly's current runtime facts, the clean architecture is a TypeScript job that uses ai@6.0.0-beta.105 for model orchestration, web and arXiv retrieval tools for evidence collection, Drizzle ORM with Neon for article and source records, marked@17.0.1 for Markdown-to-HTML rendering, and Next.js 16 for the publishing surface.

The database should treat sources as first-class rows, not as a blob of Markdown text. A practical schema has an article table, an article-source join table, and source fields for URL, title, source kind, retrieved timestamp, canonical identifier such as arXiv ID when available, and extraction status. The article record can then point to Markdown, rendered HTML, summary, key points, and publication metadata.
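
An illustrative Drizzle schema for that shape; table and column names are assumptions, not Naly's actual migration.

```ts
import { pgTable, serial, text, timestamp, integer, primaryKey } from "drizzle-orm/pg-core";

// Illustrative schema: articles, sources, and a join table so audit queries
// stay cheap in both directions.
export const articles = pgTable("articles", {
  id: serial("id").primaryKey(),
  topic: text("topic").notNull(),
  markdown: text("markdown").notNull(),
  html: text("html").notNull(),
  publishedAt: timestamp("published_at"),
});

export const sources = pgTable("sources", {
  id: serial("id").primaryKey(),
  url: text("url").notNull(),
  title: text("title"),
  kind: text("kind").notNull(),              // "web" | "arxiv" | "official-doc"
  arxivId: text("arxiv_id"),                 // canonical identifier when available
  retrievedAt: timestamp("retrieved_at").notNull(),
  extractionStatus: text("extraction_status").notNull(),
});

export const articleSources = pgTable(
  "article_sources",
  {
    articleId: integer("article_id").notNull().references(() => articles.id),
    sourceId: integer("source_id").notNull().references(() => sources.id),
  },
  (t) => ({ pk: primaryKey({ columns: [t.articleId, t.sourceId] }) })
);
```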

Vercel Blob is useful for larger artifacts or immutable render outputs, while Postgres remains better as the queryable ledger for sources and article metadata. That separation keeps audit queries cheap: list every article that used a source, every source used by an article, and every article whose source extraction failed.
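
A sketch of the first of those audit queries against the schema sketched above, assuming a Neon connection string in DATABASE_URL and the schema module path shown in the import.

```ts
import { neon } from "@neondatabase/serverless";
import { drizzle } from "drizzle-orm/neon-http";
import { eq } from "drizzle-orm";
import { articles, articleSources } from "./schema"; // the schema sketch above (assumed path)

const db = drizzle(neon(process.env.DATABASE_URL!));

// Every article that used a given source; the join table keeps this cheap.
async function articlesUsingSource(sourceId: number) {
  return db
    .select({ articleId: articles.id, topic: articles.topic })
    .from(articles)
    .innerJoin(articleSources, eq(articleSources.articleId, articles.id))
    .where(eq(articleSources.sourceId, sourceId));
}
```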

The generator prompt should require source discipline in the shape of the output: no unsupported claims, no invented URLs, and a references section whose links must match the persisted source list. The model can write fluid prose, but the job should own source truth. If the model emits a reference that was not retrieved, the validator should fail the article rather than quietly publish it.
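
A sketch of that validator: extract Markdown link URLs and fail the job when a reference was never retrieved. The regex is a simplification of a real Markdown-aware pass.

```ts
// Sketch of the reference gate: reject the article instead of quietly
// publishing when a cited URL is not in the persisted source list.
function validateReferences(markdown: string, persistedUrls: Set<string>): void {
  const linkPattern = /\[[^\]]*\]\((https?:\/\/[^)\s]+)\)/g;
  const unknown: string[] = [];
  for (const match of markdown.matchAll(linkPattern)) {
    if (!persistedUrls.has(match[1])) unknown.push(match[1]);
  }
  if (unknown.length > 0) {
    throw new Error(`Article cites sources that were never retrieved: ${unknown.join(", ")}`);
  }
}
```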

Finally, production behavior matters. Next.js already provides server components, code splitting, prerendering, and caching defaults, but generated content pipelines still need explicit error handling, security checks, metadata, and Core Web Vitals awareness. RAG helps Naly write with evidence. Production engineering makes sure that evidence reaches readers quickly, safely, and repeatedly.
