The most dangerous thing about modern AI writing tools is not that they produce bad content. It is that they produce adequate content — quickly, cheaply, and at a scale that makes volume feel like strategy. Every publisher building in 2025 or 2026 faces the same temptation: the tools are there, the marginal cost is near zero, so why not publish more?

We faced that temptation at NovVista. We still face it. The answer we landed on is not a clean moral line against AI — that would be both dishonest and impractical. The answer is a workflow built specifically to extract what AI does well while holding a hard boundary around what it cannot do. Getting that boundary right took longer than I expected, and cost us some rankings along the way before we corrected course.

This is a full account of how our AI editorial workflow actually works, what it cost us to figure out, and what content quality guardrails we now treat as non-negotiable. If you run an independent publication and are trying to solve the same problem, most of what follows should be directly applicable.

The Problem We Were Trying to Solve (And the One We Created)

NovVista launched with a simple thesis: cover the parts of the technology business that larger outlets either miss or flatten into press release summaries. That means original reporting, sourced analysis, and editorial positions that are actually positions — not hedged aggregation dressed up as journalism.

That kind of content is time-consuming to produce. A well-sourced technical analysis might take two to three days of research, writing, and revision before it meets our standard. When AI writing tools became capable enough to produce a structurally coherent 1,500-word article in four minutes, the math looked attractive. More output, faster cadence, same team.

What we discovered, about eight months into experimenting with AI-assisted production, is that Google’s quality systems had gotten significantly better at identifying content that was structurally sound but informationally thin. Pages that read well but added nothing new — no original perspective, no sourced data beyond what every competing article already contained, no evidence that a human being had actually engaged with the topic — were being demoted in favor of content with demonstrable depth signals.

We had not become a content farm in the way the term is usually used. We had not published hundreds of keyword-stuffed articles with no editorial value. We had done something more subtle and, in retrospect, more corrosive: we had let AI draft first and edited second, which meant AI structure and AI priorities were shaping what we published even when a human had read every sentence. The voice was ours on the surface. The judgment was not.

Fixing that required rebuilding the workflow from the source-discovery stage forward, not just adding a heavier editorial pass at the end.

How the Workflow Actually Works Now

The current NovVista workflow has six distinct stages. AI is involved in three of them. Humans make every decision that matters.

Stage 1: Source Discovery and Story Selection

No AI. Full stop. Every story that appears on NovVista starts as a human editorial judgment about what is worth covering. That judgment is informed by RSS feeds, primary sources, industry contacts, reader questions, and — increasingly — signals from our own analytics about what our audience is actually engaging with versus merely clicking.

I mention this explicitly because story selection is where a lot of AI-assisted publications quietly hand over control. If you are using an AI tool to surface story ideas based on keyword volume or trending topics, you have already made the most consequential editorial decision — what to cover — on the basis of traffic signals rather than journalistic judgment. You have, at that point, oriented your publication toward what ranks rather than what matters. Those two things overlap sometimes. They diverge constantly.

Story selection at NovVista is a weekly conversation. We look at what is actually happening in the technology business, what our reporting has surfaced that others are not covering, and what questions our readers are actively asking us. An AI tool might help us research a candidate story once we have identified it. It does not identify stories for us.

Stage 2: Research and Source Compilation

This is the first stage where AI tools enter the workflow, and the role is bounded: research synthesis and source organization, not primary research.

When we have a story to pursue, we use LLMs primarily to accelerate two tasks. First, summarizing large volumes of primary-source material — earnings call transcripts, academic papers, long technical documentation, regulatory filings — so that a human researcher can identify what is relevant before reading in full. Second, helping map the landscape of a topic quickly: what are the main competing perspectives, what have credible sources said previously, what is the publicly established baseline of facts we need to build from.

Neither of those tasks replaces primary research. The AI-generated summary of an earnings transcript still requires the researcher to read the relevant sections in the original. The landscape mapping still requires verification. What changes is the time cost of getting oriented in an unfamiliar topic. A reporter covering a new vertical used to spend three to four hours just building context. That orientation phase now takes under an hour, which means more time for the work that actually produces original reporting: making calls, reading primary sources, identifying the angle that is not already covered.
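
To make the summarization step concrete, here is a minimal sketch of what chunked transcript summarization can look like, using the OpenAI Python SDK as a stand-in for whatever model you have access to. The model name, chunk size, and prompt wording are illustrative assumptions, not our production configuration.

```python
# Minimal sketch: summarize a long transcript in chunks so a human
# researcher can triage which sections to read in the original.
# Model name, chunk size, and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split on paragraph boundaries so each chunk stays roughly coherent."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current)
    return chunks

def summarize_transcript(path: str) -> list[str]:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    summaries = []
    for i, chunk in enumerate(chunk_text(text)):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; use whatever model you have
            messages=[
                {"role": "system",
                 "content": "Summarize this transcript excerpt in five bullet "
                            "points. Flag any specific figures or forward-looking "
                            "statements so a researcher knows to verify them "
                            "against the original."},
                {"role": "user", "content": chunk},
            ],
        )
        summaries.append(f"--- chunk {i + 1} ---\n"
                         f"{response.choices[0].message.content}")
    return summaries

if __name__ == "__main__":
    for section in summarize_transcript("earnings_call_transcript.txt"):
        print(section, "\n")
```

The per-chunk output matters because it maps back to the original document: the summary exists to tell the researcher where to read, not to replace the reading.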

Stage 3: AI-Assisted Drafting

This is the stage that requires the most careful management, and the one where most AI-assisted publications go wrong.

We use AI to produce a structural draft — a scaffold, not a finished piece. The prompt is specific: given the research notes, sourced quotes, and the editorial angle we have defined, produce a first-pass structure that organizes the material logically. The output is treated as a starting point, not a document to be edited. The working assumption is that the AI draft will be significantly rewritten.
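
To give a sense of the shape of that prompt, here is an illustrative template. The wording and variable names are a sketch of the pattern, not our production prompt.

```python
# Illustrative scaffold prompt, assembled from human-supplied inputs.
# The wording and the example values below are placeholders, not our
# production prompt.
SCAFFOLD_PROMPT = """You are helping structure a reported article, not writing it.

Editorial angle (fixed, do not change it):
{angle}

Research notes:
{notes}

Sourced quotes (verbatim, do not paraphrase or invent quotes):
{quotes}

Produce a section-by-section outline that organizes this material in support
of the stated angle. For each section, note which research items and quotes
belong there and what claim the section needs to establish. Do not write
prose paragraphs.
"""

prompt = SCAFFOLD_PROMPT.format(
    angle="(the editorial angle, defined by a human)",
    notes="(the researcher's compiled notes)",
    quotes="(verbatim sourced quotes)",
)
```

The prompt deliberately asks for structure and claims rather than prose, which keeps the output in scaffold territory; the writer then rejects, rearranges, or fills that outline.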

That distinction — scaffold versus draft — changes the relationship between the human writer and the AI output in an important way. When you edit an AI draft, you are correcting a document. When you use an AI scaffold, you are filling a structure with your own content. The second process produces original writing. The first produces refined AI writing, which is a different thing.

The scaffold approach also has a practical quality effect: it forces the human writer to engage with the structure actively rather than passively accepting the AI’s organizational choices. AI language models have strong priors toward certain structural patterns — broad introduction, three main sections, summary conclusion — that are optimized for general coherence but not for the specific argumentative logic of a particular story. Our writers regularly reject or significantly rearrange the scaffold before filling it in.

Stage 4: Human Editorial Review

Every piece that goes through the AI-assisted process is reviewed by an editor who did not write it, using a structured quality scoring rubric. The review is not a light pass. It typically takes 45 minutes to an hour for a 2,000-word article.

The review process is described in more detail in the section on guardrails below. The core point is that this stage is where editorial judgment operates without AI involvement. Does this piece say something that could not be assembled from existing sources? Is the argument actually made, or implied? Are the claims sourced to the level our standards require? Does this sound like NovVista, or does it sound like a well-organized language model?

That last question — does it sound like us? — is harder to operationalize than it sounds, and we have spent considerable time on it. Voice is not just style. It is the accumulated weight of editorial decisions made consistently over time: what gets hedged and what gets stated plainly, which framings we use, what we are willing to say in print about specific companies or products. AI cannot replicate that because it does not have access to our decision-making history. It can approximate it. The editor’s job is to catch the approximations.

Stage 5: Quality Scoring

Before any piece is approved for publication, it is scored against a five-dimension rubric. Each dimension is scored on a five-point scale, giving a maximum score of 25. We do not publish anything below 18.

The five dimensions are:

  • Originality: Does this piece contain reporting, analysis, or a perspective that is not replicable by reading the sources we link to? A score of 5 requires something that could only come from this publication — a sourced interview, a documented case study, a data synthesis that did not previously exist. A score of 1 is repackaged content with no original contribution.
  • Factual precision: Are all factual claims sourced to a primary or credible secondary source, and are the sources cited? This dimension penalizes vague attribution (“experts say,” “studies show”) and hedged claims that gesture at evidence without providing it.
  • Voice consistency: Does the piece read as NovVista? This is the most subjective dimension, but it is also the most important check against AI-mediated homogenization. Editors score this based on their accumulated sense of our editorial character, reinforced by a document of specific voice markers we maintain and update quarterly.
  • Structural integrity: Does the argument proceed logically? Are the sections weighted appropriately to the claims being made? Are there structural gaps — claims introduced and not supported, threads opened and abandoned?
  • Reader value: Would a reader in our target audience finish this piece knowing something they did not know before, or with a clearer framework for thinking about the topic? This is the quality dimension most directly tied to engagement signals, and it is the one on which AI-assisted drafts that have not been sufficiently rewritten most often score low.

The 18-point threshold was set after running the rubric retroactively against our six months of pre-rubric content and identifying the score range that correlated with our strongest-performing pieces. It is not arbitrary, but it is also not magic. We revisit the threshold quarterly.
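
For publishers who want to formalize a rubric like this, the mechanics fit in a few lines. The sketch below mirrors our five dimensions and the 18-point floor, but the data structure itself is illustrative rather than a description of our internal tooling.

```python
# Illustrative sketch of the five-dimension rubric as a data structure.
# The dimension names and the 18-point floor come from the rubric above;
# everything else is just one way to encode it, not our internal tooling.
from dataclasses import dataclass, fields

PUBLISH_THRESHOLD = 18  # out of a maximum of 25

@dataclass
class RubricScore:
    originality: int           # 1-5
    factual_precision: int     # 1-5
    voice_consistency: int     # 1-5
    structural_integrity: int  # 1-5
    reader_value: int          # 1-5

    def total(self) -> int:
        return sum(getattr(self, f.name) for f in fields(self))

    def publishable(self) -> bool:
        return self.total() >= PUBLISH_THRESHOLD

# Example: strong structure, weak originality, middling reader value,
# a common profile for an under-rewritten AI-assisted draft.
draft = RubricScore(originality=2, factual_precision=4, voice_consistency=4,
                    structural_integrity=5, reader_value=3)
print(draft.total(), draft.publishable())  # 18 True: it clears the floor
```

A piece with that profile clears the numerical floor, which is exactly why the rubric is a check on the editorial conversation rather than a replacement for it.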

Stage 6: Publication and Content Audit

After publication, every piece enters a 90-day observation window. We track engagement depth (scroll depth, time on page, return visits) alongside search performance. Pieces that underperform on engagement metrics despite adequate search visibility are flagged for content audit.
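
Mechanically, the flagging rule is simple. The sketch below illustrates it; the metric names, thresholds, and example data are placeholders, not our actual analytics configuration.

```python
# Illustrative post-publication flagging rule: pieces inside the 90-day
# observation window with adequate search visibility but weak engagement
# get queued for a content audit. All thresholds and data are placeholders.
from dataclasses import dataclass
from datetime import date, timedelta

OBSERVATION_WINDOW = timedelta(days=90)

@dataclass
class PieceMetrics:
    slug: str
    published: date
    avg_scroll_depth: float      # 0.0 to 1.0
    avg_time_on_page_sec: float
    search_impressions: int

def needs_audit(piece: PieceMetrics, today: date) -> bool:
    in_window = (today - piece.published) <= OBSERVATION_WINDOW
    has_visibility = piece.search_impressions >= 1_000           # placeholder
    low_engagement = (piece.avg_scroll_depth < 0.5
                      or piece.avg_time_on_page_sec < 90)        # placeholders
    return in_window and has_visibility and low_engagement

pieces = [
    PieceMetrics("piece-a", date(2025, 11, 1), 0.72, 210, 4_800),
    PieceMetrics("piece-b", date(2025, 12, 10), 0.38, 55, 6_200),
]
flagged = [p.slug for p in pieces if needs_audit(p, date(2026, 1, 15))]
print(flagged)  # ['piece-b']: visible in search, but readers are not engaging
```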

The audit process is deliberately uncomfortable. It asks: if this piece is ranking but not engaging, what does that tell us about the gap between what we optimized for and what our readers actually needed? The answer is almost always that we optimized too much for coverage breadth at the expense of depth, or that the AI-assisted draft retained structural patterns that were coherent but not compelling.

We have updated or substantially rewritten 22 pieces in the past year as a result of this audit process. In roughly half of those cases, the original piece scored above 18 on our rubric but failed on engagement. That discrepancy has been instructive: a piece can satisfy our editorial quality criteria and still not deliver genuine reader value if the structural choices are correct but the content inside the structure is thin. The rubric has been updated twice in response.

What AI Does Well in Our Process

Having spent considerable space on what AI cannot do, it is worth being specific about where it genuinely earns its place in the workflow.

Research synthesis at speed. When covering a technical topic that spans multiple disciplines or requires understanding the context of a regulatory or financial document, AI tools dramatically reduce the time cost of getting to a usable baseline. This is the clearest and most consistent value we extract.

Structural pressure-testing. Asking an LLM to outline what a comprehensive treatment of a topic would include is a useful check on whether our own planned structure is missing important angles. We do not use the output as an outline. We use it as a comparison — what is in here that we did not plan to cover, and are we making a deliberate editorial choice to omit it?

Translation and localization support. For pieces that will be adapted for other language markets, AI translation tools have eliminated the need to outsource a first-pass translation. Human review of AI translation output is still required, but the process is now fast enough to be practical within a standard editorial cycle.

Headline and metadata generation. AI tools are consistently useful for generating variant headlines, meta descriptions, and subheadings to test against. The quality bar for this output is lower — we are selecting from options rather than publishing output — and an LLM can produce in seconds a volume of variants that would take a human copywriter an hour to draft.

What AI Cannot Do in Our Process

The clearest failure modes we have encountered are not technical limitations of the models. They are category errors about what kind of task is being assigned.

AI cannot decide what is worth covering. That is an editorial judgment that requires understanding of our publication’s mission, our audience’s actual needs, and a sense of what has not yet been said. An LLM optimizes for coherence and relevance given a prompt. It does not have the perspective to evaluate whether the prompt itself is the right question.

AI cannot verify facts. It can produce citations. It can produce confident-sounding claims. It cannot distinguish between what a source actually says and what a plausible continuation of that source would say. Every factual claim in a NovVista piece is verified against the original source by a human. This is not optional, and it is not a minor point — it is the foundation of the trust our readers place in us.

AI cannot maintain editorial voice at the level of judgment, only at the level of style. A sufficiently prompted LLM can approximate our tone, our sentence structure, and our paragraph rhythm. It cannot replicate the accumulated decisions about what NovVista does and does not say, which companies we hold to which standards, which claims require more evidence than others. Those decisions live in people, not in any model we have access to.

AI cannot decide when to hedge and when to state plainly. This sounds minor. In practice, it is one of the most consistent editorial failures we see in AI-assisted drafts: claims that should be stated as fact are softened, conclusions that our reporting supports are presented as possibilities, and the cumulative effect is a piece that is technically accurate but intellectually cautious to the point of uselessness. That caution is trained into the models. Removing it from a draft requires editorial judgment about what our reporting actually established.

What the Transparency Itself Accomplishes

We debated internally for some time about whether to publish our AI workflow openly. The concern was that transparency about AI use would invite skepticism about our editorial independence. The conclusion we reached — and have since confirmed through reader feedback — is that the opposite is true.

Readers are sophisticated about AI use in media. Most of them assume that publications are using AI tools and are not disclosing it. Explicit transparency about how and where AI is used, combined with an honest account of the guardrails in place, signals the opposite of what you might expect: it signals that the editorial decisions are being made by people who take the question seriously enough to think about it in public.

The publications that are losing reader trust are not the ones that disclose AI use. They are the ones that have visibly abandoned editorial judgment at the story-selection and voice levels — where every piece is structurally competent, topically current, and indistinguishable from a hundred similar pieces. The tell is not AI-assisted sentences. The tell is the absence of a perspective that only this publication, with its specific history and values, could have taken.

Publishing this workflow is both a transparency statement and a commitment device. It makes our own standards public and creates accountability to them. That accountability is not costless. But the cost of not having it — gradual drift toward content-farm economics under the pressure of publishing velocity — is higher.

What We Would Recommend to Other Publishers

The following is based on what we got wrong before we got it right, and on conversations with other independent publishers who have been working through the same problems.

  • Define the boundary before you need it. The pressure to use AI for more of the workflow builds gradually. By the time the problem is visible in your metrics, the drift has been happening for months. Decide in advance which decisions are human-only — story selection, fact verification, editorial voice — and build the process around protecting those decisions.
  • Score your content before it publishes, not just after. A quality rubric applied retroactively to underperforming content is useful. A rubric applied before publication is much more useful. The act of scoring a piece before publication forces the editorial conversation that AI-assisted production tends to skip.
  • Treat the scaffold as a scaffold. If your writers are editing AI drafts rather than writing into AI structures, you have handed editorial authority to the model. The distinction between the two behaviors looks small from the outside, but its effect on output quality is large.
  • Audit your content regularly and with specific questions. Which pieces rank but do not engage? Which pieces get shared but do not retain readers? What does each gap tell you about where editorial judgment was deferred to process? The answers to those questions are the most reliable inputs to improving your workflow.
  • Publish the workflow. The transparency cost is lower than it feels. The accountability benefit is higher than it looks. And the readers who care enough about this question to read a disclosure are almost certainly the readers whose long-term engagement you most want to earn.

The model for using AI well in editorial is not so different from the model for using any powerful tool well: understand what it is actually good at, build discipline around the things it cannot do, and build accountability structures that protect against the drift that happens when the tool is used outside its appropriate scope. That is harder than publishing a policy and easier than rebuilding credibility after you have lost it.

We are still iterating. The workflow described here is the current version, not the final one. If you are working on the same problems, we would like to hear what you are finding — reach out through the contact page or the newsletter.

By Michael Sun

Founder and Editor-in-Chief of NovVista. Software engineer with hands-on experience in cloud infrastructure, full-stack development, and DevOps. Writes about AI tools, developer workflows, server architecture, and the practical side of technology. Based in China.
