Platform

Everything you need to build with real data — without the production risk.

SOFI is an on-prem test data platform for virtualization, masking, refresh, and compliance evidence. Provision masked database workspaces without sending production data to an external SaaS.

Pillars

The complete stack for non-production data.

Six layers that work together to deliver realistic, secure, fresh test data to your team.

Virtualization

Thin clones from snapshots. Each virtual database (VDB) takes less than 5% of the source storage.
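To make the thin-clone idea concrete, here is a minimal copy-on-write sketch. It is a conceptual model only, not SOFI's storage engine: the `Snapshot` and `ThinClone` classes and the block layout are assumptions made for illustration. A clone reads shared blocks from the snapshot and stores only the blocks it overwrites.

```python
class Snapshot:
    """Immutable, read-only image of the source, held as fixed-size blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def read(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class ThinClone:
    """Writable view over a snapshot that stores only the blocks it changes."""

    def __init__(self, snapshot):
        self._snapshot = snapshot
        self._delta = {}  # block index -> block written by this clone

    def read(self, index):
        # Prefer the clone's own copy; otherwise fall through to the shared snapshot.
        if index in self._delta:
            return self._delta[index]
        return self._snapshot.read(index)

    def write(self, index, data):
        if not 0 <= index < len(self._snapshot):
            raise IndexError("block index out of range")
        # The snapshot is never modified; the change lives in the clone's delta.
        self._delta[index] = data

    def storage_ratio(self):
        """Fraction of the snapshot's block count held privately by this clone."""
        return len(self._delta) / len(self._snapshot)


# Hypothetical data: a 100-block snapshot shared by two clones.
snapshot = Snapshot(f"block-{i}".encode() for i in range(100))
clone_a = ThinClone(snapshot)
clone_b = ThinClone(snapshot)
clone_a.write(3, b"changed-by-a")
```

In this model `clone_a` holds one private block out of 100, while `clone_b` still sees the original contents of block 3 and holds nothing of its own.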

Masking

50+ native rules. Automatic PII detection for SSNs, email addresses, credit card numbers, and postal addresses.
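As a rough illustration of pattern-based PII detection, the sketch below flags SSNs, email addresses, and credit card numbers in a string. The patterns, the `detect_pii` function, and the Luhn checksum filter are assumptions chosen for this example and do not represent SOFI's built-in rules. Postal addresses are left out because they are hard to capture with a simple pattern.

```python
import re

# Illustrative patterns only; production detection needs broader formats and context.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),  # 13 to 19 digits
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def detect_pii(value: str) -> set[str]:
    """Return the labels of the PII categories found in `value`."""
    found = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(value):
            # Digit runs that fail the checksum are unlikely to be card numbers.
            if label == "credit_card" and not luhn_valid(match.group()):
                continue
            found.add(label)
    return found
```

The checksum step is there to reduce false positives: a 16-digit order number that fails the Luhn check is not reported as a card number.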

CDC

Logical replication via WAL, binlog, and LogMiner. Keeps staging in sync with prod.

38+ connectors

PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, ClickHouse, Cassandra, and more.

Multi-tenancy

ORM-level isolation via tenant_id. Mandatory soft-delete. Full audit trail.
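The sketch below shows what tenant scoping and soft-delete can look like in practice. To stay self-contained it uses the standard-library `sqlite3` module with a small repository class in place of an ORM. The `workspaces` table, the `TenantRepository` class, and the `deleted_at` column are hypothetical names, not SOFI's schema, and the audit trail is omitted.

```python
import sqlite3
from datetime import datetime, timezone


class TenantRepository:
    """Binds every query to one tenant_id and hides soft-deleted rows."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant_id = tenant_id

    def add(self, name: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO workspaces (tenant_id, name) VALUES (?, ?)",
            (self._tenant_id, name),
        )
        return cur.lastrowid

    def list_names(self) -> list[str]:
        rows = self._conn.execute(
            "SELECT name FROM workspaces "
            "WHERE tenant_id = ? AND deleted_at IS NULL ORDER BY id",
            (self._tenant_id,),
        )
        return [name for (name,) in rows]

    def soft_delete(self, workspace_id: int) -> bool:
        # Rows are never physically removed; they are only stamped with deleted_at.
        cur = self._conn.execute(
            "UPDATE workspaces SET deleted_at = ? "
            "WHERE id = ? AND tenant_id = ? AND deleted_at IS NULL",
            (datetime.now(timezone.utc).isoformat(), workspace_id, self._tenant_id),
        )
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workspaces ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, "
    "name TEXT NOT NULL, deleted_at TEXT)"
)
acme = TenantRepository(conn, "acme")      # hypothetical tenants
globex = TenantRepository(conn, "globex")
acme_id = acme.add("orders-staging")
globex.add("billing-qa")
```

Because the tenant filter lives inside the repository, a caller holding the `globex` repository cannot read or delete the `acme` workspace even if it knows the row id.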

API & CLI

FastAPI REST. Hook into your CI/CD and provision VDBs inside pull requests.
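A CI job could request a VDB for a pull request with a plain HTTP call. The sketch below only builds the request and does not send it. The `/api/v1/vdbs` path, the payload fields, the bearer-token header, and the host name are all assumptions for illustration; consult the actual API reference for the real endpoint and schema.

```python
import json
import urllib.request


def build_provision_request(base_url: str, token: str, source: str, pr_number: int):
    """Build (but do not send) a request asking for a masked VDB for one pull request."""
    payload = {
        "source": source,            # hypothetical field: source database to clone
        "name": f"pr-{pr_number}",   # hypothetical field: one VDB per pull request
        "masked": True,              # hypothetical field: apply masking rules
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/vdbs",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; a real pipeline would read these from CI variables.
request = build_provision_request(
    "https://sofi.internal.example", "ci-token", "orders-prod", 128
)
```

In a pipeline step, the request would then be passed to `urllib.request.urlopen(request)`, with the token read from a CI secret.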

< 60s average VDB provisioning time
95% storage reduction vs. traditional copy
100M+ rows masked without OOM
38 databases supported natively

Architecture

Built to scale with your team.

SOFI runs close to your databases: API workers, masking jobs, snapshot management, and provisioning workers stay inside your private environment. The dashboard and automation surface sit on top of the same audited control path.

┌─ apps/api ────── FastAPI · async SQLAlchemy
│  ├─ Celery × 6 queues
│  └─ RBAC + tenant-scoped access
│
├─ apps/web ────── Next.js 14 · App Router
│  ├─ React Query · Zustand
│  └─ shadcn/ui · Tailwind
│
└─ engine ──────── private data plane
   ├─ Snapshot Manager · CoW
   ├─ DataMasker (50+ rules)
   ├─ CDC (WAL · binlog · LogMiner)
   └─ Cluster Provisioner

Ready to virtualize test data without shipping it outside?