Local-First Software: CRDTs, Sync Engines, and Why the Cloud Isn't Always the Answer

We have spent the last fifteen years moving everything to the cloud. Documents, photos, code, notes, spreadsheets, design files, task lists—all of it lives on someone else’s server, accessed through a browser, dependent on a network connection and the continued goodwill of a SaaS provider.

For many applications, this architecture is perfectly sensible. But a growing number of developers, researchers, and companies are asking a heretical question: what if the data lived on the user’s device first, and the cloud was just a convenient sync layer?

This is the local-first movement, and it is driven by real technical advances in conflict resolution algorithms, sync engines, and offline-capable architectures that make it practical to build software where the user’s device is the source of truth.

The Local-First Principles

The term “local-first” was crystallized in a 2019 research paper by Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan at Ink & Switch. They outlined seven principles for local-first software:

No spinners. The app works instantly because operations happen on local data. Network latency is never in the critical path of user interactions.
Your work is not trapped on one device. Data syncs across all your devices seamlessly.
The network is optional. The app is fully functional offline. Network connectivity enhances the experience (syncing, collaboration) but is not required for core functionality.
Seamless collaboration. Multiple users can work on the same data simultaneously, with conflicts resolved automatically.
The Long Now. Your data remains accessible for decades, not dependent on a company maintaining a server.
Security and privacy by default. Data can be encrypted end-to-end since the server does not need to read it—it only needs to relay encrypted sync messages.
User retains ownership and control. Data is not held hostage by a vendor’s terms of service or business viability.

These principles resonate because they describe the experience we want from software but rarely get. Most cloud applications violate several of these principles: they show spinners, they require internet access, and they trap your data in proprietary formats on servers you do not control.

CRDTs: The Core Technology

The fundamental technical challenge of local-first software is conflict resolution. When two users edit the same document on different devices while offline, what happens when they reconnect? Traditional approaches either lock documents (pessimistic concurrency), require manual conflict resolution (git-style merge conflicts), or use last-write-wins semantics (silently discarding one of the concurrent edits).

Conflict-free Replicated Data Types (CRDTs) solve this problem mathematically. A CRDT is a data structure designed so that any two replicas that have received the same set of updates will converge to the same state, regardless of the order in which updates were received. No coordination, no locking, no manual conflict resolution.

How CRDTs Work (Simply)

The intuition behind CRDTs is that some operations naturally commute. If Alice adds “milk” to a shopping list and Bob adds “eggs,” the result should contain both items regardless of which operation is applied first. A set-based CRDT (called a G-Set, or grow-only set) captures this: additions are always safe to merge.
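
The shopping-list example can be sketched in a few lines. This is a minimal toy G-Set, not taken from any particular library; the class name GSet and the replica names are illustrative assumptions.

```javascript
// Toy grow-only set (G-Set): elements can be added but never removed.
class GSet {
  constructor(items = []) {
    this.items = new Set(items);
  }

  add(item) {
    this.items.add(item);
  }

  // Merging is set union, so the order in which replicas merge cannot matter.
  merge(other) {
    for (const item of other.items) {
      this.items.add(item);
    }
  }

  // Sorted copy of the contents, for stable display and comparison.
  values() {
    return [...this.items].sort();
  }
}

const alice = new GSet();
const bob = new GSet();
alice.add('milk');
bob.add('eggs');

// Each replica merges the other's state; either order gives the same result.
alice.merge(bob);
bob.merge(alice);

console.log(alice.values()); // [ 'eggs', 'milk' ]
console.log(bob.values());   // [ 'eggs', 'milk' ]
```

Because the only operation is "add", there is nothing to conflict on. Supporting removal requires extra bookkeeping, which is where the more elaborate CRDT designs come in.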

For more complex data types, CRDTs use clever combinations of unique identifiers, logical timestamps, and causal ordering to ensure convergence. A text CRDT, for example, assigns each character a unique position identifier that determines its order relative to other characters. Two users typing at different positions can both insert characters, and the CRDT guarantees a consistent merged result.
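
To make the idea of position identifiers concrete, here is a deliberately simplified sketch. It assumes fractional positions between 0 and 1 with the replica name as a tie-breaker; the ToyText class and its methods are hypothetical, and production text CRDTs use richer identifiers and also handle deletion.

```javascript
// Toy sequence CRDT: every character carries a unique id and a sortable position.
// Assumption: positions are fractions in (0, 1); floating-point precision limits
// how many inserts this toy can handle between two neighbors.
class ToyText {
  constructor(site) {
    this.site = site;          // unique replica name
    this.seq = 0;              // per-replica counter, makes ids unique
    this.elements = new Map(); // id -> { id, pos, site, seq, char }
  }

  // Total order: position first, then site name and per-site counter as tie-breakers.
  sorted() {
    return [...this.elements.values()].sort((a, b) => {
      if (a.pos !== b.pos) return a.pos - b.pos;
      if (a.site !== b.site) return a.site < b.site ? -1 : 1;
      return a.seq - b.seq;
    });
  }

  // Insert locally and return the operation to send to other replicas.
  insertAt(index, char) {
    const chars = this.sorted();
    const left = index > 0 ? chars[index - 1].pos : 0;
    const right = index < chars.length ? chars[index].pos : 1;
    this.seq += 1;
    const op = {
      id: `${this.site}:${this.seq}`,
      pos: (left + right) / 2,
      site: this.site,
      seq: this.seq,
      char,
    };
    this.apply(op);
    return op;
  }

  // Applying an operation is idempotent: the same id always maps to the same element.
  apply(op) {
    this.elements.set(op.id, op);
  }

  toString() {
    return this.sorted().map((e) => e.char).join('');
  }
}

const alice = new ToyText('alice');
const bob = new ToyText('bob');

// Alice types "Hi" and the operations reach Bob.
[alice.insertAt(0, 'H'), alice.insertAt(1, 'i')].forEach((op) => bob.apply(op));

// Concurrent edits at different positions, exchanged afterwards.
const fromAlice = alice.insertAt(2, '!');
const fromBob = bob.insertAt(0, 'O');
alice.apply(fromBob);
bob.apply(fromAlice);

console.log(alice.toString()); // OHi!
console.log(bob.toString());   // OHi!
```

Both replicas end up with the same text because the final order depends only on the set of elements and their identifiers, not on the order in which operations arrived.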

// Conceptual example: a simple counter CRDT
// Each node maintains its own counter
// The global value is the sum of all node counters

class GCounter {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.counts = {};  // nodeId -> count
  }

  increment() {
    this.counts[this.nodeId] = (this.counts[this.nodeId] || 0) + 1;
  }

  value() {
    return Object.values(this.counts).reduce((sum, n) => sum + n, 0);
  }

  merge(other) {
    // Take the max of each node's counter
    for (const [nodeId, count] of Object.entries(other.counts)) {
      this.counts[nodeId] = Math.max(this.counts[nodeId] || 0, count);
    }
  }
}

The key insight is the merge function: it takes the maximum of each node’s counter. This operation is commutative (order does not matter), associative (grouping does not matter), and idempotent (applying the same merge twice has no additional effect). These three properties guarantee convergence.
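
These three properties can be checked directly. The sketch below restates the merge as a pure function over plain objects (the helper names mergeCounts and sameCounts are illustrative, as are the sample values) and tests each property on one set of inputs. It is a spot check, not a proof.

```javascript
// Pure merge over plain { nodeId: count } objects: element-wise maximum.
function mergeCounts(a, b) {
  const merged = { ...a };
  for (const [nodeId, count] of Object.entries(b)) {
    merged[nodeId] = Math.max(merged[nodeId] || 0, count);
  }
  return merged;
}

// Compare two count maps regardless of key order; missing keys count as 0.
function sameCounts(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((k) => (a[k] || 0) === (b[k] || 0));
}

const a = { alice: 3, bob: 1 };
const b = { bob: 4, carol: 2 };
const c = { alice: 1, carol: 5 };

const commutative = sameCounts(mergeCounts(a, b), mergeCounts(b, a));
const associative = sameCounts(
  mergeCounts(mergeCounts(a, b), c),
  mergeCounts(a, mergeCounts(b, c))
);
const idempotent = sameCounts(mergeCounts(a, a), a);

console.log({ commutative, associative, idempotent }); // all three are true
```

If the merge added the per-node counts instead of taking the maximum, it would no longer be idempotent: receiving the same state twice would double-count it.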

The Evolution of Text CRDTs

The hardest and most commercially important CRDT problem is collaborative text editing. Early text CRDTs (like Logoot and LSEQ) worked but had performance issues with large documents. The field has advanced dramatically:

Yjs uses a highly optimized CRDT implementation that achieves performance competitive with centralized approaches like Operational Transformation (OT). Its internal encoding is compact enough for real-time collaborative editing of large documents.
Automerge provides a general-purpose CRDT library that supports text, JSON-like documents, lists, and maps. Automerge 2 significantly improved performance by moving to a Rust core with language bindings.
Diamond Types, a research project by Joseph Gentle, has demonstrated that CRDT performance can match or exceed OT implementations, challenging the long-held assumption that CRDTs are inherently slower for text editing.

The Sync Engine Landscape

CRDTs handle conflict resolution, but a complete local-first application also needs sync—the machinery for getting data between devices and users. Several sync engines have emerged, each with different trade-offs:

Automerge

Automerge is both a CRDT library and a sync protocol. It provides document-level CRDTs with a JSON-like data model, making it natural for applications that work with structured data. The sync protocol is efficient, transferring only the operations that the other side has not seen.
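
The idea of transferring only unseen operations can be illustrated with version vectors. The sketch below is a generic illustration and does not reproduce Automerge's actual protocol or API; the Replica class, the sync function, and the (actor, seq) operation ids are all assumptions made for the example.

```javascript
// Hypothetical replica that keeps an append-only log of operations.
// Each operation is identified by (actor, seq), with seq counting up from 1 per actor.
class Replica {
  constructor(actor) {
    this.actor = actor;
    this.log = [];   // every operation this replica has seen
    this.clock = {}; // version vector: actor -> highest seq seen
  }

  // Record a local change.
  change(payload) {
    const seq = (this.clock[this.actor] || 0) + 1;
    this.receive({ actor: this.actor, seq, payload });
  }

  // Accept an operation unless it was seen already.
  // Assumption: each actor's operations arrive in seq order.
  receive(op) {
    if (op.seq <= (this.clock[op.actor] || 0)) return;
    this.log.push(op);
    this.clock[op.actor] = op.seq;
  }

  // Select only the operations the peer's version vector does not cover.
  missingFor(peerClock) {
    return this.log.filter((op) => op.seq > (peerClock[op.actor] || 0));
  }
}

// One-directional sync; returns how many operations were transferred.
function sync(from, to) {
  const delta = from.missingFor(to.clock);
  delta.forEach((op) => to.receive(op));
  return delta.length;
}

const laptop = new Replica('laptop');
const phone = new Replica('phone');
laptop.change('add title');
laptop.change('add paragraph');
phone.change('fix typo');

console.log(sync(laptop, phone)); // 2
console.log(sync(phone, laptop)); // 1
console.log(sync(laptop, phone)); // 0 (nothing new to send)
```

After the exchange both replicas hold the same three operations, and a repeated sync transfers nothing.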

Automerge is well-suited for applications where the data model maps naturally to JSON documents: note-taking apps, project management tools, and design tools. Its Rust implementation with JavaScript, Python, and Swift bindings makes it cross-platform.

Yjs

Yjs has become a de facto standard for real-time collaborative editing in web applications. It underpins AFFiNE, BlockSuite, and numerous other collaborative editors, and is reportedly used in part by Notion. Yjs provides shared types (text, array, map, XML) and a provider-based architecture where different network transports can be plugged in.

import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'
import { IndexeddbPersistence } from 'y-indexeddb'

const doc = new Y.Doc()

// Local persistence - works offline
const indexeddbProvider = new IndexeddbPersistence('my-document', doc)

// Network sync - when available
const wsProvider = new WebsocketProvider(
  'wss://sync.example.com', 'my-document', doc
)

// Shared text type with CRDT conflict resolution
const ytext = doc.getText('content')

// Wait for locally stored content to load, then seed the document only if
// it is still empty; an unconditional insert would add the text on every load
indexeddbProvider.whenSynced.then(() => {
  if (ytext.length === 0) {
    ytext.insert(0, 'Hello, local-first world!')
  }
})

Yjs’s strength is its ecosystem: integrations exist for ProseMirror, TipTap, CodeMirror, Monaco, Quill, and other popular editors.

ElectricSQL

ElectricSQL takes a different approach by synchronizing PostgreSQL data to the client, originally into local SQLite databases. Instead of requiring developers to adopt new data structures (CRDTs), it lets them work with familiar SQL and handles the sync transparently. This is particularly appealing for applications with existing PostgreSQL backends that want to add offline capabilities.

The trade-off is that ElectricSQL’s conflict resolution is more constrained than general-purpose CRDTs. It works within the semantics of relational data, which is powerful for many applications but less flexible for freeform collaborative editing.

PowerSync

PowerSync is similar in philosophy to ElectricSQL—syncing server-side databases to local SQLite—but supports multiple backend databases (PostgreSQL, MongoDB, MySQL) and provides SDKs for React Native, Flutter, and web applications. It is particularly popular in the mobile development community, where offline capability is often a hard requirement.

Real Applications Using Local-First Architecture

Linear

Linear, the project management tool that has become the default choice for many engineering teams, is built on local-first principles. The application loads instantly because the data is cached locally. Operations like creating issues, updating status, and reorganizing projects happen immediately on the local store and sync in the background. The result is an interface that feels dramatically faster than traditional cloud-based project management tools.

Figma

While Figma is primarily cloud-based, its real-time collaboration engine uses CRDT-inspired techniques for conflict resolution. Multiple designers can edit the same file simultaneously without locks, and changes merge automatically. Figma’s approach demonstrates that CRDT concepts can be applied even in architectures that are not purely local-first.

Obsidian

Obsidian is a knowledge management tool that stores notes as plain Markdown files on the user’s local filesystem. It works entirely offline, and its optional Obsidian Sync service provides end-to-end encrypted sync across devices. This architecture gives users complete ownership of their data—the notes are just files, readable by any text editor, independent of Obsidian’s continued existence.

AFFiNE

AFFiNE is an open-source alternative to Notion built on a local-first architecture using Yjs and BlockSuite. Documents are stored locally and synced via CRDT-based protocols, providing offline functionality and real-time collaboration without sacrificing data ownership.

The Architecture of a Local-First Application

Building a local-first application requires rethinking several architectural assumptions:

Local Storage as the Source of Truth

The application reads from and writes to a local database (IndexedDB in browsers, SQLite on mobile and desktop). All user interactions are local operations that complete immediately. The network is used only for sync.
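
A minimal sketch of this pattern follows. An in-memory Map stands in for IndexedDB or SQLite, and the LocalStore class, its outbox, and the send callback are hypothetical names chosen for illustration.

```javascript
// Hypothetical local-first store. A Map stands in for IndexedDB or SQLite.
class LocalStore {
  constructor() {
    this.records = new Map(); // local source of truth
    this.outbox = [];         // changes not yet handed to the sync layer
  }

  // Writes complete against local data; no network call is involved.
  put(id, value) {
    this.records.set(id, value);
    this.outbox.push({ type: 'put', id, value });
  }

  get(id) {
    return this.records.get(id);
  }

  // Called by the sync layer whenever a connection is available.
  // `send` is an assumed async transport function supplied by the caller.
  async flush(send) {
    while (this.outbox.length > 0) {
      await send(this.outbox[0]); // if this throws, the change stays queued
      this.outbox.shift();
    }
  }
}

async function demo() {
  const store = new LocalStore();
  store.put('note-1', { title: 'Groceries' }); // works with no network at all
  console.log(store.get('note-1').title);       // Groceries

  try {
    await store.flush(async () => { throw new Error('offline'); });
  } catch {
    // Still offline: the change remains in the outbox.
  }
  console.log(store.outbox.length); // 1

  await store.flush(async () => {}); // connection is back
  console.log(store.outbox.length); // 0
}

demo();
```

The user-facing read and write paths never wait on the network; only flush does, and a failed flush leaves the queue intact for the next attempt.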

Sync Layer

A background sync process sends local changes to other devices and applies remote changes to the local store. The sync protocol must handle intermittent connectivity, partial syncs, and out-of-order delivery.
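
One common way to cope with out-of-order and duplicate delivery is to number each sender's operations and buffer anything that arrives early. The sketch below assumes per-actor sequence numbers starting at 1; the OrderedInbox class is a hypothetical illustration, not part of any sync engine named here.

```javascript
// Hypothetical inbox that applies each actor's operations in sequence order,
// buffering early arrivals and dropping duplicates.
class OrderedInbox {
  constructor(applyOp) {
    this.applyOp = applyOp; // callback that applies one operation to the local store
    this.nextSeq = {};      // actor -> next expected sequence number
    this.pending = {};      // actor -> Map(seq -> op) of early arrivals
  }

  receive(op) {
    const expected = this.nextSeq[op.actor] || 1;
    if (op.seq < expected) return; // duplicate: already applied

    if (!this.pending[op.actor]) this.pending[op.actor] = new Map();
    const buffer = this.pending[op.actor];
    buffer.set(op.seq, op);

    // Apply everything that is now contiguous with what was already applied.
    let next = expected;
    while (buffer.has(next)) {
      this.applyOp(buffer.get(next));
      buffer.delete(next);
      next += 1;
    }
    this.nextSeq[op.actor] = next;
  }
}

const applied = [];
const inbox = new OrderedInbox((op) => applied.push(op.payload));

// Operations from one device arrive out of order, with a duplicate.
inbox.receive({ actor: 'phone', seq: 3, payload: 'c' });
inbox.receive({ actor: 'phone', seq: 1, payload: 'a' });
inbox.receive({ actor: 'phone', seq: 1, payload: 'a' });
inbox.receive({ actor: 'phone', seq: 2, payload: 'b' });

console.log(applied); // [ 'a', 'b', 'c' ]
```

With state-based CRDTs whose merge is commutative and idempotent, this buffering is unnecessary; it matters for operation-based designs that rely on per-sender or causal ordering.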

Conflict Resolution

CRDTs or CRDT-inspired algorithms ensure that concurrent edits merge deterministically. The application must be designed so that the CRDT semantics produce sensible results for the user’s use case.
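
A last-writer-wins register shows why deterministic merging and sensible results are separate questions. The sketch assumes logical (Lamport-style) timestamps and uses the node id as a tie-breaker; the function and field names are illustrative.

```javascript
// Last-writer-wins register. Ties on timestamp are broken by node id,
// so every replica picks the same winner regardless of merge order.
function mergeRegister(a, b) {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.nodeId > b.nodeId ? a : b;
}

// Two concurrent edits to the same field, carrying the same logical timestamp.
const fromLaptop = { value: 'Ship Friday', timestamp: 7, nodeId: 'laptop' };
const fromPhone = { value: 'Ship Monday', timestamp: 7, nodeId: 'phone' };

console.log(mergeRegister(fromLaptop, fromPhone).value); // Ship Monday
console.log(mergeRegister(fromPhone, fromLaptop).value); // Ship Monday
```

Both replicas agree on the result, but the laptop's edit is gone. That is usually acceptable for a status field or a due date and usually unacceptable for the body of a document, which is why the choice of CRDT has to follow the use case.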

Authentication and Authorization

Since the server is a sync relay rather than the application backend, authentication and authorization models change. The server needs to know which users can sync which documents, but it does not need to validate business logic—that happens locally.
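
A relay of this kind can be sketched as follows. The SyncRelay class, the access-control map, and the callback-based delivery are assumptions made for illustration; a real deployment would sit behind authenticated connections.

```javascript
// Hypothetical sync relay: it checks document-level access and forwards
// opaque payloads without inspecting them.
class SyncRelay {
  constructor(acl) {
    this.acl = acl;               // Map(docId -> Set of userIds allowed to sync)
    this.subscribers = new Map(); // Map(docId -> Map(userId -> deliver callback))
  }

  canSync(userId, docId) {
    const allowed = this.acl.get(docId);
    return allowed !== undefined && allowed.has(userId);
  }

  subscribe(userId, docId, deliver) {
    if (!this.canSync(userId, docId)) return false;
    if (!this.subscribers.has(docId)) this.subscribers.set(docId, new Map());
    this.subscribers.get(docId).set(userId, deliver);
    return true;
  }

  // Forward a payload to every other subscriber of the document.
  publish(userId, docId, payload) {
    if (!this.canSync(userId, docId)) return false;
    const peers = this.subscribers.get(docId) || new Map();
    for (const [peerId, deliver] of peers) {
      if (peerId !== userId) deliver(payload);
    }
    return true;
  }
}

const relay = new SyncRelay(new Map([['doc-1', new Set(['alice', 'bob'])]]));
const bobInbox = [];
relay.subscribe('bob', 'doc-1', (payload) => bobInbox.push(payload));

console.log(relay.subscribe('mallory', 'doc-1', () => {}));        // false
console.log(relay.publish('alice', 'doc-1', '<encrypted bytes>')); // true
console.log(bobInbox);                                             // [ '<encrypted bytes>' ]
console.log(relay.publish('mallory', 'doc-1', 'spam'));            // false
```

The relay decides who may exchange updates for a document but never interprets the updates themselves, which is what allows the payloads to be end-to-end encrypted.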

Trade-Offs and Challenges

Local-first is not a free lunch. Honest practitioners acknowledge several challenges:

Storage limits. Local devices have finite storage. Applications that handle large datasets (media libraries, data analytics) may not fit comfortably on every device.
Initial sync. When a user sets up a new device, all their data needs to sync. For large datasets, this can be slow and bandwidth-intensive.
Server-side computation. Some operations (search across all documents, analytics, machine learning) require access to the full dataset, which may only be available server-side.
CRDT complexity. While using CRDTs has become easier with libraries like Yjs and Automerge, understanding their semantics and debugging merge behavior is harder than working with a centralized database.
Access revocation. Once data is on a user’s device, you cannot unilaterally revoke access. This is a feature (user ownership) and a challenge (compliance requirements such as the GDPR right to erasure).
Schema evolution. Migrating data schemas across distributed local databases is harder than running a migration on a central server.
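
For the schema evolution point, one possible approach is to stamp every record with a version and upgrade it lazily when it is read. The sketch below is one way to do that under stated assumptions; the record shape, the migrations table, and the upgrade function are hypothetical.

```javascript
// Hypothetical migration chain: migrations[n] upgrades a record from version n to n + 1.
const CURRENT_VERSION = 3;
const migrations = {
  // v1 -> v2: introduce a tags array.
  1: (note) => ({ ...note, version: 2, tags: [] }),
  // v2 -> v3: rename `body` to `content`.
  2: (note) => {
    const { body, ...rest } = note;
    return { ...rest, version: 3, content: body };
  },
};

// Upgrade a record step by step until it reaches the current version.
// Records that are already current (or newer) are returned unchanged.
function upgrade(record) {
  let current = record;
  while (current.version < CURRENT_VERSION) {
    const migrate = migrations[current.version];
    if (!migrate) throw new Error(`No migration from version ${current.version}`);
    current = migrate(current);
  }
  return current;
}

// A record written long ago by an old client and synced in today.
const old = { version: 1, title: 'Trip', body: 'Pack bags' };
const upgraded = upgrade(old);

console.log(upgraded.version); // 3
console.log(upgraded.content); // Pack bags
console.log(upgraded.tags);    // []
```

Unlike a central migration, this has to tolerate devices that stay offline for months and clients running different versions at once. A record written by a newer client passes through this function untouched, so an older client still needs its own rule for fields it does not recognize.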

When to Go Local-First vs. Cloud-First

Local-first makes the most sense when:

User experience depends on low latency (creative tools, note-taking, project management).
Offline access is important (mobile apps, field work, travel).
Data privacy is a primary concern (personal knowledge management, health data, financial records).
Data longevity matters (personal archives, research notes, legal documents).
The dataset per user is manageable (documents, not petabyte data lakes).

Cloud-first remains the better choice when:

The application requires server-side computation (ML inference, heavy analytics).
Data is inherently shared and centralized (social media feeds, marketplace listings).
Regulatory requirements demand centralized audit trails and access control.
The dataset is too large for local storage.
Real-time coordination with strict consistency is required (financial transactions, inventory management).

The Future of Local-First

Several trends are accelerating local-first adoption. Device storage and processing power continue to increase, making local computation more practical. WebAssembly enables running complex data structures (including Rust-based CRDT implementations) efficiently in browsers. Edge computing blurs the line between local and cloud, enabling sync architectures that leverage nearby edge nodes for lower latency.

The tooling is also maturing rapidly. Until a few years ago, building a local-first application required deep expertise in distributed systems. Today, libraries like Yjs, Automerge, ElectricSQL, and PowerSync have raised the abstraction level enough that application developers can focus on their product rather than the sync machinery.

Conclusion

Local-first software is not about rejecting the cloud. It is about putting the user’s device back at the center of the experience and using the cloud as infrastructure rather than as the mandatory intermediary for every interaction.

The technical foundations—CRDTs, sync engines, offline storage—have matured enough to make this practical for a wide range of applications. The user experience benefits are tangible: instant interactions, offline capability, and data ownership. And the growing ecosystem of tools and libraries means you no longer need a PhD in distributed systems to build local-first software.

Not every application should be local-first. But if your users would benefit from instant responsiveness, offline access, and true data ownership, the technology is ready. The cloud is not always the answer, and we finally have the tools to prove it.

By Michael Sun