CASE STUDY

Safaricom migration — 700+ servers, 6,000 TPS to 12,000 TPS

We migrated an enterprise-grade stack across 700+ servers, redesigned pipelines for resilience, and tuned systems to sustain 6,000 transactions per second — then enabled predictable scaling to 12,000 TPS during peak events.

Outcome 6,000 → 12,000 TPS 700+ servers

Book a consultation Start live chat / call

Need urgent help? Call 0734125936 — we respond 24/7.

We’ll only use your contact details to schedule this conversation.

Operational view — multi-region cluster

Photograph: server rack close-up

S

Project scope

700+ servers • multi-region migration

Delivered

Inventory, automation, staged failover, and capacity validation.

The challenge

High-volume digital commerce, unpredictable peak events, and a fragmented estate of 700+ servers left Safaricom with brittle deployments, unclear ownership of failure modes, and costs that spiked under load. The business required a migration that preserved availability while enabling capacity to double, with zero customer-impact during cutover windows.

Constraints

700+ heterogeneous servers (Linux & Windows)
Service-levels requiring ≥99.99% uptime during migration
Throughput target: sustain 6,000 TPS and scale to 12,000 TPS
Minimal maintenance windows

Our approach — design for observability and controlled velocity

We treated the migration as product work: small, testable increments; automated rollback; instrumentation-first; and governance gates to move at velocity without losing control.

Discover

Inventory, SLA review, risk mapping

Design

Architecture, automation, cutover plans

Deploy

Staged rollouts, canaries, automated rollback

Operate

Observability, runbooks, and SRE handoffs

Technical highlights

IaC

Terraform modules standardized for reproducible builds and drift detection

Pipelines

Blue/green deployments, Canary routing, and automated rollback on latency/err thresholds

Observability

High-cardinality tracing, custom dashboards, and automated incident playbooks

Capacity

Autoscaling policies tuned with real traffic, staged capacity tests to validate 12K TPS

flow diagram simplified

Results that mattered

Peak throughput

12,000 TPS

Scalable capacity validated in staged load tests

Servers migrated

700+

Heterogeneous Linux & Windows estate consolidated

Availability impact

≥99.99%

Zero customer-impact during cutovers

"Pro Servers Ltd migrated our estate without customer-impact and gave us confidence to scale. Their engineering discipline helped us meet peak demand reliably." — Safaricom engineering lead

Project timeline: 6 months • Team: cross-functional on-site + remote

Engineering notes (click to expand)

# Terraform modules standardized for cluster provisioning
module "cluster" {
  source = "git::https://example.com/terraform-modules//cluster"
  region = var.region
  count  = var.cluster_count
}

# GitLab CI pipeline: blue/green and canary stages
stages:
  - build
  - canary
  - promote
  - teardown

# HPA and custom metrics for TPS-based scaling
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "6000"

This migration was a systems engineering exercise with business stakes.

We align infrastructure to measurable KPIs — uptime, throughput, developer velocity, and cost — and we operate as partners in delivery. To discuss how we can make your infrastructure a predictable growth engine, book a consultation or call 0734125936.

Book a consultation Call 0734125936