
Platform engineering has become one of the most discussed disciplines in software organizations, and for good reason. After a decade of “you build it, you run it” DevOps culture, many companies discovered an uncomfortable truth: giving every team full operational responsibility did not make them faster. It left them spending a large share of their time wrestling with infrastructure instead of building products.

The promise of platform engineering is to fix this by building an Internal Developer Platform (IDP)—a curated, self-service layer that abstracts away infrastructure complexity while preserving the autonomy that DevOps promised. But the gap between building a platform and building one that teams actually use is vast. This article is a practical playbook for navigating that gap.

Platform Engineering vs. DevOps: Clearing Up the Confusion

Platform engineering is not a replacement for DevOps. It is an evolution that addresses DevOps’s scaling problem.

DevOps brought down the wall between development and operations, but it did so by distributing operational knowledge across every team. At a 50-person startup, this works beautifully. At a 500-person organization with 40 product teams, it creates a different problem: every team is independently solving the same infrastructure challenges, making different choices, and accumulating different operational debt.

Platform engineering centralizes the common infrastructure concerns into a dedicated team that builds and maintains a shared platform. Product teams consume the platform through self-service interfaces instead of filing tickets or reading runbooks. The key distinction:

  • DevOps is a culture and set of practices that emphasizes collaboration between dev and ops.
  • Platform engineering is a discipline that builds products (internal platforms) to make DevOps practices scalable.

The platform team does not replace ops. It replaces the undifferentiated operational heavy lifting that every team was doing independently.

The Components of an Internal Developer Platform

An effective IDP is not a single tool. It is a composition of capabilities that together provide a coherent developer experience. The core components include:

Service Catalog

A service catalog is the front door to your platform. It provides a unified view of every service, its ownership, its dependencies, its deployment status, its documentation, and its operational health. Without a service catalog, developers cannot find what already exists, leading to duplication and inconsistency.

A good service catalog answers questions like: Who owns this service? What does it depend on? When was it last deployed? Is it healthy right now? Where is the documentation? What APIs does it expose?
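
To make those questions concrete, here is a minimal sketch of a catalog as a data structure. The field names and the `ServiceCatalog` class are hypothetical, chosen only to mirror the questions above. A real catalog would populate them from source control, CI/CD, and monitoring rather than by hand.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """One service as the catalog sees it (hypothetical schema)."""
    name: str
    owner: str                                            # Who owns this service?
    depends_on: List[str] = field(default_factory=list)   # What does it depend on?
    last_deployed: Optional[datetime] = None              # When was it last deployed?
    healthy: Optional[bool] = None                        # Is it healthy right now?
    docs_url: Optional[str] = None                        # Where is the documentation?
    apis: List[str] = field(default_factory=list)         # What APIs does it expose?


class ServiceCatalog:
    """In-memory catalog; a real one would sit on a database or an API."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def get(self, name: str) -> CatalogEntry:
        return self._entries[name]

    def dependents_of(self, name: str) -> List[str]:
        """Reverse lookup: which services list `name` as a dependency?"""
        return sorted(e.name for e in self._entries.values() if name in e.depends_on)

    def missing_metadata(self) -> Dict[str, List[str]]:
        """Entries that cannot yet answer every catalog question."""
        gaps: Dict[str, List[str]] = {}
        for entry in self._entries.values():
            missing = [
                f for f in ("last_deployed", "healthy", "docs_url")
                if getattr(entry, f) is None
            ]
            if missing:
                gaps[entry.name] = missing
        return gaps


if __name__ == "__main__":
    catalog = ServiceCatalog()
    catalog.register(CatalogEntry(name="payments-db", owner="team-data"))
    catalog.register(
        CatalogEntry(name="checkout-api", owner="team-checkout", depends_on=["payments-db"])
    )
    print(catalog.dependents_of("payments-db"))
    print(catalog.missing_metadata())
```

The `missing_metadata` check is the part worth copying: a catalog with unanswered questions stops being trusted, so gaps should be visible.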

Golden Paths

Golden paths are opinionated, pre-configured templates for common tasks. Need to create a new microservice? The golden path provides a template with CI/CD configuration, observability instrumentation, security scanning, and deployment manifests already set up. Need to provision a database? The golden path provides a self-service workflow that handles provisioning, backup configuration, and access control.

The critical word is “opinionated.” Golden paths encode organizational best practices. They are not enforced—teams can deviate when they have good reasons—but they make the right thing the easy thing.

# Example: A golden path template manifest (simplified)
apiVersion: platform.company.com/v1
kind: ServiceTemplate
metadata:
  name: rest-api-service
spec:
  language: go
  framework: chi
  infrastructure:
    database: postgresql
    cache: redis
    queue: rabbitmq
  observability:
    tracing: opentelemetry
    metrics: prometheus
    logging: structured-json
  ci:
    pipeline: github-actions
    stages: [lint, test, security-scan, build, deploy]
  deployment:
    strategy: rolling
    environments: [dev, staging, production]
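
“Opinionated but not enforced” can be sketched as defaults plus recorded overrides. The snippet below assumes the manifest above has been loaded into a plain dictionary. The `apply_overrides` helper and the `kafka` override are hypothetical, not part of any real templating tool.

```python
import copy
from typing import Any, Dict, List, Tuple

# Defaults mirroring part of the simplified manifest above.
GOLDEN_PATH: Dict[str, Any] = {
    "language": "go",
    "framework": "chi",
    "infrastructure": {"database": "postgresql", "cache": "redis", "queue": "rabbitmq"},
    "observability": {
        "tracing": "opentelemetry",
        "metrics": "prometheus",
        "logging": "structured-json",
    },
    "deployment": {"strategy": "rolling", "environments": ["dev", "staging", "production"]},
}


def apply_overrides(
    defaults: Dict[str, Any], overrides: Dict[str, Any], path: str = ""
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge team overrides onto the defaults.

    Returns the merged spec and a list of dotted keys where the team
    deviated from the golden path. The defaults are never mutated.
    """
    merged = copy.deepcopy(defaults)
    deviations: List[str] = []
    for key, value in overrides.items():
        dotted = f"{path}.{key}" if path else key
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse into nested sections such as "infrastructure".
            merged[key], nested = apply_overrides(merged[key], value, dotted)
            deviations.extend(nested)
        elif merged.get(key) != value:
            merged[key] = value
            deviations.append(dotted)
    return merged, deviations


if __name__ == "__main__":
    spec, deviations = apply_overrides(GOLDEN_PATH, {"infrastructure": {"queue": "kafka"}})
    print(spec["infrastructure"])
    print("deviations from golden path:", deviations)
```

Recording deviations instead of rejecting them keeps the escape hatch open while still telling the platform team where the defaults are not serving people.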

Self-Service Infrastructure

Self-service infrastructure lets developers provision and manage resources without filing tickets or waiting for a platform team member. This typically means Terraform or Pulumi modules exposed through a UI or CLI, with guardrails that enforce security policies, cost limits, and naming conventions.

The goal is not to give developers raw access to cloud consoles. It is to provide curated, policy-compliant resource provisioning that is faster than filing a ticket but safer than unrestricted access.
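
A guardrail is easiest to understand as a check that runs before any provisioning happens. The sketch below assumes a hypothetical `ResourceRequest` shape and made-up policy values. The hand-off to Terraform or Pulumi is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical policy values; a real platform would load these from config.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,39}$")
MAX_MONTHLY_COST_USD = 500.0
ALLOWED_REGIONS = {"eu-west-1", "us-east-1"}


@dataclass
class ResourceRequest:
    name: str
    kind: str                          # e.g. "postgresql"
    region: str
    estimated_monthly_cost_usd: float
    publicly_accessible: bool = False


def check_guardrails(req: ResourceRequest) -> List[str]:
    """Return every policy violation; an empty list means the request may proceed."""
    violations: List[str] = []
    if not NAME_PATTERN.match(req.name):
        violations.append(
            "name must be 3-40 characters: lowercase letters, digits, hyphens, "
            "starting with a letter"
        )
    if req.region not in ALLOWED_REGIONS:
        violations.append(f"region {req.region!r} is not on the allowed list")
    if req.estimated_monthly_cost_usd > MAX_MONTHLY_COST_USD:
        violations.append(
            f"estimated cost ${req.estimated_monthly_cost_usd:.2f}/month exceeds "
            f"the ${MAX_MONTHLY_COST_USD:.2f} self-service limit"
        )
    if req.publicly_accessible:
        violations.append("public access is not available through self-service")
    return violations


if __name__ == "__main__":
    request = ResourceRequest("Orders_DB", "postgresql", "ap-south-1", 900.0)
    for problem in check_guardrails(request):
        print("rejected:", problem)
```

Returning all violations at once, rather than failing on the first, saves the developer several round trips through the form.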

Developer Portal

The developer portal is the user interface that ties everything together. It provides a single pane of glass for service management, documentation, API exploration, environment management, and platform self-service workflows.

The Tool Landscape: Backstage, Port, and Cortex

Several platforms have emerged to accelerate IDP construction. Each makes different trade-offs:

Backstage (Spotify)

Backstage is the open-source juggernaut in this space. Originally built by Spotify to manage their sprawling microservice architecture, it provides a plugin-based developer portal framework with a service catalog, documentation system (TechDocs), and software templates.

Strengths: Massive plugin ecosystem, strong community, fully customizable, no vendor lock-in. Weaknesses: Significant setup and maintenance burden, a need for dedicated engineering effort, and a React-based UI that can be complex to extend. Backstage is best suited for organizations with the engineering capacity to invest in customization and ongoing maintenance.

Port

Port takes a no-code/low-code approach to building developer portals. It provides a flexible data model where you define “blueprints” (entities like services, environments, deployments) and “actions” (self-service workflows) through a visual interface.

Strengths: Fast time-to-value, no frontend development required, flexible data modeling, built-in RBAC. Weaknesses: SaaS-only (data residency concerns for some organizations), less customizable than Backstage for highly specific UX needs. Port is well-suited for organizations that want a functional IDP quickly without dedicating a team to portal development.

Cortex

Cortex focuses heavily on engineering maturity and operational excellence. Beyond the standard service catalog and scorecards, it provides “initiatives”—organization-wide campaigns to drive adoption of standards like migration to a new Kubernetes version or adoption of a security scanning tool.

Strengths: Strong scorecarding and maturity tracking, initiative management, good Kubernetes integration. Weaknesses: Less flexible than Port or Backstage for custom workflows, primarily focused on operational health rather than full IDP functionality. Cortex is ideal for organizations whose primary pain point is inconsistent operational maturity across teams.
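
Scorecards and initiatives are easier to reason about with a toy model. The sketch below is not Cortex’s API. It is a generic illustration in which the rule names, the service fields, and the `MIN_K8S_MINOR` threshold are all assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Service = Dict[str, Any]
Rule = Tuple[str, Callable[[Service], bool]]

MIN_K8S_MINOR = 29  # hypothetical "supported version" threshold

# Hypothetical maturity rules, each a name plus a pass/fail check.
RULES: List[Rule] = [
    ("has an owner", lambda s: bool(s.get("owner"))),
    ("has documentation", lambda s: bool(s.get("docs_url"))),
    ("security scanning enabled", lambda s: s.get("security_scan") is True),
    ("on supported Kubernetes version", lambda s: s.get("k8s_minor", 0) >= MIN_K8S_MINOR),
]


def score(service: Service, rules: List[Rule] = RULES) -> Dict[str, Any]:
    """Scorecard for one service: fraction of rules passed, plus what failed."""
    failed = [name for name, check in rules if not check(service)]
    passed = len(rules) - len(failed)
    return {"service": service["name"], "score": passed / len(rules), "failed": failed}


def initiative_progress(
    services: List[Service], rule_name: str, rules: List[Rule] = RULES
) -> float:
    """Fraction of services passing one rule: the number a campaign would track."""
    check = dict(rules)[rule_name]
    return sum(1 for s in services if check(s)) / len(services)


if __name__ == "__main__":
    fleet = [
        {"name": "checkout-api", "owner": "team-checkout", "security_scan": True, "k8s_minor": 28},
        {"name": "search-api", "owner": "team-search", "docs_url": "https://docs.example.internal/search"},
    ]
    for svc in fleet:
        print(score(svc))
    print(initiative_progress(fleet, "security scanning enabled"))
```

A scorecard looks at one service across many rules, while an initiative looks at one rule across many services. Both views come from the same data.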

Measuring Developer Experience

You cannot improve what you do not measure, and developer experience metrics are notoriously difficult to get right. The platform engineering community has converged on several approaches:

DORA Metrics

The four DORA metrics are deployment frequency, lead time for changes, change failure rate, and mean time to recovery (DORA’s own reports call the last one time to restore service). They remain the most widely used benchmark for delivery performance, and a well-functioning IDP should demonstrably improve them over time.
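
As a rough illustration, all four numbers can be derived from a list of deployment records. The `Deployment` record below is a hypothetical shape. Real data would come from your CI/CD system and incident tracker, and teams differ on exactly where “lead time” starts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Any, Dict, List, Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: List[Deployment], window_days: int) -> Dict[str, Any]:
    """Compute the four delivery metrics over a window of `window_days` days."""
    if not deploys:
        raise ValueError("need at least one deployment in the window")
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        # None when nothing failed (or nothing has been restored yet).
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    history = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(
            t0, t0 + timedelta(hours=4), failed=True, restored_at=t0 + timedelta(hours=5)
        ),
        Deployment(t0, t0 + timedelta(hours=6)),
    ]
    for name, value in dora_metrics(history, window_days=1).items():
        print(f"{name}: {value}")
```

The sketch uses the median for lead time because one slow change can distort a mean. That is a design choice here, not something the metric definition requires.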

SPACE Framework

The SPACE framework, developed by researchers at Microsoft, GitHub, and the University of Victoria, provides a more nuanced view by measuring five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow. Platform teams should track at least one metric from each dimension.
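
The “one metric per dimension” rule is simple enough to check mechanically. The sketch below uses the five dimension names from the framework. The example metrics, and the dimension each one is assigned to, are assumptions for illustration.

```python
from typing import Dict, List

SPACE_DIMENSIONS = (
    "satisfaction_and_well_being",
    "performance",
    "activity",
    "communication_and_collaboration",
    "efficiency_and_flow",
)


def uncovered_dimensions(tracked: Dict[str, str]) -> List[str]:
    """Given a mapping of metric name -> SPACE dimension, list dimensions with no metric."""
    unknown = set(tracked.values()) - set(SPACE_DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    covered = set(tracked.values())
    # Preserve the framework's S-P-A-C-E order in the result.
    return [d for d in SPACE_DIMENSIONS if d not in covered]


if __name__ == "__main__":
    tracked_metrics = {
        "quarterly developer survey score": "satisfaction_and_well_being",
        "change failure rate": "performance",
        "deployment frequency": "activity",
    }
    print("no metric yet for:", uncovered_dimensions(tracked_metrics))
```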

Developer Surveys

Quantitative metrics tell you what is happening. Surveys tell you why. Regular developer experience surveys (quarterly at minimum) help platform teams identify friction points that metrics alone miss. Key questions include: How easy is it to deploy a new service? How long does it take to get a development environment? What is the most frustrating part of your daily workflow?

Time to First Deployment

One of the most revealing metrics is how long it takes a new hire to deploy their first change to production. This single number captures the entire developer onboarding experience, from documentation quality to environment setup to CI/CD pipeline usability. Leading platform organizations aim for hours, not days or weeks.
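
This metric needs only two timestamps per person. The sketch below assumes you can export start dates from HR tooling and first production deploys from CI/CD. Both input dictionaries and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, Optional


def time_to_first_deploy(
    start_dates: Dict[str, datetime], first_prod_deploys: Dict[str, datetime]
) -> Dict[str, timedelta]:
    """Per-person elapsed time from start date to first production deploy.

    People who have not deployed yet are left out of the result.
    """
    return {
        person: first_prod_deploys[person] - started
        for person, started in start_dates.items()
        if person in first_prod_deploys
    }


def median_time_to_first_deploy(
    start_dates: Dict[str, datetime], first_prod_deploys: Dict[str, datetime]
) -> Optional[timedelta]:
    """Median across new hires, or None if nobody has deployed yet."""
    durations = list(time_to_first_deploy(start_dates, first_prod_deploys).values())
    return median(durations) if durations else None


if __name__ == "__main__":
    starts = {
        "ana": datetime(2024, 3, 4, 9),
        "ben": datetime(2024, 3, 4, 9),
        "cy": datetime(2024, 3, 11, 9),
    }
    deploys = {"ana": datetime(2024, 3, 4, 15), "ben": datetime(2024, 3, 6, 9)}
    print(time_to_first_deploy(starts, deploys))
    print(median_time_to_first_deploy(starts, deploys))
```

This measures calendar time, so weekends count. Also track how many new hires have not deployed yet, since the median hides them.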

Avoiding “Build It and They Won’t Come”

The graveyard of internal platforms is full of technically excellent systems that nobody uses. Adoption is the hardest problem in platform engineering, and it requires treating your platform as a product with internal customers.

Start with User Research

Before building anything, interview your developers. Shadow them for a day. Identify the actual pain points, not the ones you assume they have. The most common mistake is building a platform that solves the problems the platform team finds interesting rather than the problems developers actually face.

Ship an MVP, Not a Vision

Your first release should solve exactly one painful problem exceptionally well. If developers spend four hours setting up CI/CD for new services, build a golden path that reduces that to 15 minutes. Demonstrate value before expanding scope.

Make Adoption Effortless

If using the platform requires reading a 50-page guide, you have already lost. The best platforms are adopted because they are obviously easier than the alternative. Self-service workflows should have sensible defaults. Templates should work out of the box. Documentation should be discoverable and concise.

Build Champions, Not Mandates

Forcing teams to adopt the platform breeds resentment. Instead, identify early adopters, give them white-glove support, and amplify their success stories. When other teams see their peers shipping faster with the platform, adoption follows naturally.

Maintain a Feedback Loop

Treat bug reports and feature requests from platform users with the same urgency you would treat customer-facing issues. Slow response to internal feedback is the fastest way to kill platform adoption.

Team Topology for Platform Engineering

The organizational structure of the platform team matters as much as the technology. The Team Topologies framework by Matthew Skelton and Manuel Pais provides useful language for this:

Platform Team as an Enabling Team

In the early stages, the platform team should operate as an enabling team—embedding with product teams, understanding their workflows, and building capabilities that remove friction. This phase is about learning and relationship building.

Platform Team as a Platform Team

As the platform matures, the team transitions to a true platform team that provides self-service capabilities with minimal direct interaction. The interface between the platform team and product teams becomes the platform itself, not meetings and Slack messages.

Staffing the Team

Effective platform teams are cross-functional. You need infrastructure engineers who understand Kubernetes and cloud platforms, but you also need software engineers who can build good user interfaces, technical writers who can create clear documentation, and ideally a product manager who treats the platform as a product.

A common anti-pattern is staffing the platform team exclusively with infrastructure specialists. This produces a technically sophisticated platform with a terrible user experience—which is the same as no platform at all.

Common Pitfalls

  • Building too much abstraction. Abstractions that hide too much complexity become black boxes that developers do not trust. When something goes wrong, they need to understand what is happening underneath.
  • Ignoring escape hatches. Every platform needs a way for teams to step outside the golden path when their use case genuinely requires it. Platforms without escape hatches breed shadow IT.
  • Treating the platform as infrastructure. The platform is a product. It needs a roadmap, user research, UX design, and regular releases. Treating it as “just infra” guarantees low adoption.
  • Underestimating maintenance. Building version one is exciting. Maintaining it, keeping it compatible with upstream changes, and supporting users is where the real work begins.

Conclusion

Platform engineering is not a silver bullet, and an internal developer platform is not something you install. It is something you build, iterate on, and continuously improve based on the needs of your engineering organization.

The organizations succeeding at this discipline share common traits: they treat their platform as a product, they start small and expand based on demonstrated value, they invest as heavily in user experience as in technical capability, and they measure their success not by the sophistication of their infrastructure but by the productivity and satisfaction of their developers.

If your engineering organization has outgrown the “every team manages everything” model and you are ready to build a platform, start with one painful problem, solve it well, and grow from there. The playbook is straightforward. The execution is where the craft lies.

By Michael Sun

Founder and Editor-in-Chief of NovVista. Software engineer with hands-on experience in cloud infrastructure, full-stack development, and DevOps. Writes about AI tools, developer workflows, server architecture, and the practical side of technology. Based in China.
