Pattern: Scientific / Engineering Simulation

Quick facts

Category: Games & Graphics
Maturity: Adopt
Typical team size: 2-6 engineers (often with domain scientists)
Typical timeline to MVP: 8-20 weeks
Last reviewed: 2026-05-03 by Architecture Team

1. Context

Use this pattern when:

Simulating physical, chemical, biological, or engineering systems where numerical accuracy is a first-class requirement alongside (or above) real-time performance
Building training simulators, digital twins, computational fluid dynamics (CFD) tools, finite element analysis (FEA), or structural engineering software
Results must be reproducible, versioned, and validated against physical measurements or analytical solutions

Do NOT use this pattern when:

The simulation is purely visual and approximate accuracy is acceptable — use a game engine's physics engine instead
The simulation is a simple analytical model that runs in milliseconds — a Python script or spreadsheet is sufficient
Real-time interactive performance is more important than numerical precision — game engine physics is appropriate for interactive training applications where "good enough" accuracy suffices

2. Problem it solves

Engineering and scientific decisions depend on simulation results. A structural engineer needs to know whether a bridge design fails under load before building it. A pharma company needs to simulate molecular interactions before running wet lab experiments. These simulations involve large numerical systems (millions of degrees of freedom), stiff differential equations, or complex geometry that requires specialised numerical methods — not a game engine physics approximation.

3. Solution overview

System context (C4 Level 1)

flowchart LR
    Engineer((Engineer / Scientist)) --> PreProc[Pre-processor\ngeometry + mesh + BCs]
    PreProc --> Solver[Solver\nPDE / ODE / Monte Carlo]
    Solver --> PostProc[Post-processor\nresults + visualisation]
    Solver --> HPC[HPC Cluster\nor Cloud GPU]
    PostProc --> Report[Report / Export\nVTK, HDF5, CSV]
    ExpData[(Experimental Data)] -->|validation| Solver

Container view (C4 Level 2)

flowchart TB
    subgraph Pre-processing
        CADImport[CAD / Geometry Import\n.STEP, .STL, .IGES]
        Mesher[Mesher\nGmsh / snappyHexMesh]
        BCSetup[Boundary Condition Setup\nproblem parameters]
    end
    subgraph Solver
        TimeIntegrator[Time Integrator\nRunge-Kutta, Newmark-β]
        LinearSolver[Linear Solver\nPETSc / SciPy sparse]
        ParallelMPI[MPI Parallelism\nOpenMPI / mpi4py]
        GPUKernels[GPU Kernels\nCUDA / CuPy — optional]
    end
    subgraph Post-processing
        VTKWriter[VTK Writer\nPyVista]
        Plotter[Plotter\nMatplotlib / ParaView]
        ResultDB[(Result Store\nHDF5 / NetCDF on S3)]
    end
    subgraph Experiment Management
        MLflow[MLflow / DVC\nversioned runs + params]
    end

    CADImport --> Mesher --> BCSetup --> TimeIntegrator
    TimeIntegrator --> LinearSolver
    LinearSolver --> ParallelMPI
    LinearSolver --> GPUKernels
    TimeIntegrator --> VTKWriter --> ResultDB
    VTKWriter --> Plotter
    MLflow -.-> TimeIntegrator

4. Technology stack

Layer	Primary choice	Alternatives	Notes
Compute language	Python (NumPy + SciPy)	C++ (performance-critical kernels), Fortran (legacy HPC), Julia	Python for orchestration and post-processing; C/C++ extension modules for inner loops; Julia for teams wanting MATLAB-like syntax with C-level performance
Numerical library	NumPy + SciPy	JAX (auto-diff + GPU), PyTorch (ML-adjacent)	SciPy provides sparse solvers, ODE integrators, and signal processing; JAX for simulations requiring automatic differentiation
Parallel computing	MPI via `mpi4py`	Dask (task graph), Ray	MPI for tightly coupled parallel solvers (CFD, FEA); Dask for embarrassingly parallel parameter sweeps
GPU acceleration	CuPy (drop-in NumPy for CUDA)	JAX (XLA), Numba (CUDA kernels)	CuPy for porting NumPy code to GPU with minimal changes; Numba for writing custom CUDA kernels in Python
Meshing	Gmsh	OpenFOAM's snappyHexMesh, CGAL	Gmsh provides a Python API for programmatic mesh generation; well-suited for complex 3D geometries
Visualisation	PyVista + ParaView	Matplotlib (2D), VTK (low-level)	PyVista wraps VTK with a simpler Python API; ParaView for interactive exploration of large results
Results storage	HDF5 (via h5py) on AWS S3	NetCDF4, Zarr	HDF5 for structured multidimensional result arrays; Zarr for cloud-native chunked access without full-file download
Experiment tracking	DVC (data + model versioning)	MLflow, Sacred	DVC versions large data files (meshes, results) alongside Git-tracked code; essential for reproducible simulation runs
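To make the experiment-tracking row concrete, the sketch below records a per-run manifest: a hash of the mesh file, the solver parameters, and a code version string. It is a minimal illustration using only the Python standard library, not a replacement for DVC or MLflow. The function names (`file_sha256`, `build_manifest`, `write_manifest`) and the manifest field names are hypothetical.

```python
import hashlib
import json
import platform
import sys
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large meshes need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(mesh_path: Path, params: dict, code_version: str) -> dict:
    """Collect what is needed to re-run a job (field names are illustrative)."""
    return {
        "mesh_file": mesh_path.name,
        "mesh_sha256": file_sha256(mesh_path),
        "params": params,              # e.g. seed, tolerances, time step
        "code_version": code_version,  # e.g. a Git commit hash supplied by the caller
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


def write_manifest(manifest: dict, out_path: Path) -> None:
    # sort_keys gives a stable file, so diffs between runs are meaningful
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A manifest like this can be stored next to the result file or logged to the tracking system, so that a result can later be traced back to the exact mesh and parameters that produced it.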

5. Non-functional characteristics

Concern	Profile
Scalability	Tightly coupled solvers scale via MPI across nodes, with diminishing returns above ~1,000 cores for most problems: the serial fraction caps speedup (Amdahl's Law) and communication overhead grows with core count. Embarrassingly parallel parameter sweeps scale close to linearly. Cloud HPC (AWS HPC instances, Google Cloud HPC offerings) provides on-demand burst capacity without owning hardware.
Availability target	Simulation jobs run to completion; they are not long-running services. Availability = "job completes and results are retrievable." Use checkpointing to allow job restart from an intermediate state after a node failure.
Latency target	Wall-clock time to solution is the metric. Define acceptable solve time per problem size in the requirements; profile solver performance against this target.
Security posture	Simulation inputs often represent proprietary designs (CAD, IP). Encrypt at rest (S3 SSE-KMS). Restrict cluster access to authenticated researchers. Validate all mesh inputs before they enter the solver — malformed meshes can cause unbounded memory consumption.
Data residency	Large result files (TB-scale HPC output) must reside in a defined region for export control (ITAR, EAR) compliance if the simulation relates to defence or dual-use technology.
Compliance fit	Export control (ITAR/EAR) may restrict cloud provider choice and data sharing for defence-related simulations. FDA 21 CFR Part 11 (electronic records and signatures) may apply to simulation software used in medical device submissions. Academic and funded research may require open data archiving (Zenodo, institutional repository).
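The checkpointing mentioned under the availability target can be sketched as follows. This is a toy example under stated assumptions: the state is a small list of floats stored as JSON, the problem is the decay equation u' = -u integrated with explicit Euler, and the `stop_after` argument exists only to simulate a node failure. A production solver would more likely checkpoint arrays to HDF5, but the atomic write-then-rename idea carries over. All function names here are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, step: int, t: float, state: list) -> None:
    """Write atomically: a crash mid-write must not corrupt the last good file."""
    payload = json.dumps({"step": step, "t": t, "state": state})
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_name, path)  # atomic when source and target share a filesystem


def load_checkpoint(path: Path):
    """Return (step, t, state), or None when no checkpoint exists yet."""
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data["step"], data["t"], data["state"]


def run(path: Path, n_steps: int, dt: float, checkpoint_every: int,
        stop_after: Optional[int] = None) -> list:
    """Integrate u' = -u with explicit Euler, resuming from a checkpoint if present."""
    restored = load_checkpoint(path)
    step, t, state = restored if restored else (0, 0.0, [1.0])
    while step < n_steps:
        state = [u + dt * (-u) for u in state]
        step += 1
        t += dt
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, t, state)
        if stop_after is not None and step == stop_after:
            raise RuntimeError("simulated node failure")
    return state
```

Calling `run` again with the same checkpoint path after a failure resumes from the last saved step rather than from step 0. The checkpoint interval trades I/O cost against the amount of work lost on failure.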

6. Cost ballpark

Indicative monthly USD cost. HPC compute time is the dominant cost.

Scale	Simulation size	Monthly cost	Cost drivers
Small	Single-node, < 1M DOF	$100 - $500	EC2 c5.4xlarge or m5.8xlarge on-demand
Medium	Multi-node MPI, 1M-100M DOF	$1,000 - $10,000	HPC instances (hpc6a), S3 storage for results, EFA networking
Large	GPU cluster, >100M DOF	$10,000 - $100,000	p4d/p5 GPU instances, Lustre scratch filesystem (FSx), result archive storage

7. LLM-assisted development fit

Aspect	Rating	Notes
NumPy / SciPy numerical boilerplate (ODE setup, sparse matrix assembly)	★★★★	Good; verify numerical method choice and stability conditions with a domain expert.
MPI parallelism scaffolding (`mpi4py` scatter/gather)	★★★	Generates structurally correct patterns; load balancing and communication overlap require expert tuning.
HDF5 / VTK file I/O	★★★★★	Excellent — file format APIs are well-represented.
Numerical algorithm selection (solver, preconditioner, time integrator)	★★	Knows the names; selecting the right algorithm for a specific PDE and mesh requires numerical analysis expertise.
Architecture decisions	★	Don't outsource. Use ADRs.

Recommended workflow: Validate the solver against an analytical solution or published benchmark before adding parallelism or GPU acceleration. Reproduce a known result first; optimise second.
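One way to carry out the validation step in the recommended workflow is a convergence-order check: solve a problem with a known analytical solution at two step sizes and confirm that the error shrinks at the rate the method promises. The sketch below assumes a classical fourth-order Runge-Kutta integrator and the scalar test problem u' = -2u with exact solution exp(-2t). Both the test problem and the function names are illustrative choices, not part of the pattern.

```python
import math


def rk4_step(f, t, u, dt):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(t, u)
    k2 = f(t + dt / 2, u + dt / 2 * k1)
    k3 = f(t + dt / 2, u + dt / 2 * k2)
    k4 = f(t + dt, u + dt * k3)
    return u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(f, u0, t_end, n_steps):
    """Integrate from t = 0 to t_end with a fixed step size."""
    dt = t_end / n_steps
    t, u = 0.0, u0
    for _ in range(n_steps):
        u = rk4_step(f, t, u, dt)
        t += dt
    return u


def observed_order(f, exact, u0, t_end, n_coarse):
    """Estimate the convergence order by halving the step size once."""
    err_coarse = abs(integrate(f, u0, t_end, n_coarse) - exact)
    err_fine = abs(integrate(f, u0, t_end, 2 * n_coarse) - exact)
    return math.log2(err_coarse / err_fine)


if __name__ == "__main__":
    order = observed_order(lambda t, u: -2.0 * u, math.exp(-2.0), 1.0, 1.0, 20)
    print(f"observed order: {order:.2f}")
```

For classical RK4 the observed order should come out close to 4. A markedly lower value usually points to an implementation bug or a step size outside the asymptotic range, and is worth resolving before any parallel or GPU work begins.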

8. Reference implementations

Public reference: numpy/numpy — NumPy; numpy/core/ and the documentation tutorials show the array computing foundation underpinning all Python scientific simulation
Public reference: visgl/deck.gl — deck.gl; large-scale geospatial and scientific data visualisation on the GPU using WebGL
Internal case study: Add your anonymised internal example here

No ADRs recorded yet. Candidate: Python vs Julia vs C++ for performance-critical simulation kernels.

10. Known risks & gotchas

Solver divergence produces plausible-looking wrong answers: a stiff ODE integrated with too large a time step can produce results that look physically reasonable but are numerically wrong, and nothing in the output flags the error. Mitigation: build a validation test suite with analytical solutions for simple cases before running real problems; monitor residual norms at every time step.
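Memory exhaustion from mesh refinement: doubling mesh resolution in 3D increases element count eightfold, and the solver runs out of RAM partway through. Mitigation: estimate memory requirements before running (DOF count × average nonzeros per matrix row × bytes per stored entry, counting both the value and its index). For example, 10 million DOF at about 30 nonzeros per row, with 8-byte values and 4-byte column indices, needs roughly 3.6 GB for the matrix alone, before any factorisation fill-in or solver workspace. Run a quick coarse-mesh test to verify the setup before the full fine-mesh solve.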
Reproducibility lost without versioned inputs: a result cannot be reproduced six months later because the mesh, input parameters, or code version was not tracked. Mitigation: version code in Git and input data with DVC; record the full solver configuration (seed, tolerances, mesh hash) in the experiment tracking system on every run.
Export control violation for cloud HPC: a defence-related simulation workload runs in a cloud region outside the permitted jurisdiction, or its data is accessible to unauthorised personnel. Mitigation: verify cloud region and data residency before submitting jobs; consult legal counsel for any simulation touching ITAR- or EAR-controlled technology.
Parallelism scaling cliff: an MPI job scales well from 1 to 32 cores and then levels off; at 128 cores it runs slower than at 32 because of communication overhead. Mitigation: profile the communication-to-compute ratio; perform a strong-scaling study before purchasing large reserved compute capacity.
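The strong-scaling study suggested for the scaling cliff can be reduced to a small calculation over measured wall-clock times. The sketch below computes speedup and parallel efficiency for a fixed problem size and picks the largest core count that stays above an efficiency threshold. The timings in the usage example are invented to mirror the 1-to-32-to-128 core scenario described above, the 70% threshold is an arbitrary assumption, and the function names are hypothetical. `amdahl_speedup` is included for comparison.

```python
def scaling_table(timings: dict) -> list:
    """Map {core count: wall-clock seconds} to (cores, speedup, efficiency) rows.

    The smallest core count is used as the baseline.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


def largest_efficient_core_count(timings: dict, min_efficiency: float = 0.7) -> int:
    """Largest measured core count whose parallel efficiency meets the threshold."""
    efficient = [c for c, _, e in scaling_table(timings) if e >= min_efficiency]
    return max(efficient)


def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's Law, ignoring communication cost."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Invented timings for one fixed problem size (seconds)
    measured = {1: 1000.0, 8: 140.0, 32: 42.0, 128: 55.0}
    for cores, speedup, efficiency in scaling_table(measured):
        print(f"{cores:4d} cores  speedup {speedup:6.2f}  efficiency {efficiency:5.2f}")
    print("largest efficient core count:", largest_efficient_core_count(measured))
```

Amdahl's Law alone predicts a plateau, never a slowdown. When measured times increase with core count, as in the invented 128-core figure, the cause is communication or synchronisation overhead rather than the serial fraction.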