Pattern: Scientific / Engineering Simulation

Quick facts

  • Category: Games & Graphics
  • Maturity: Adopt
  • Typical team size: 2-6 engineers (often with domain scientists)
  • Typical timeline to MVP: 8-20 weeks
  • Last reviewed: 2026-05-03 by Architecture Team

1. Context

Use this pattern when:

  • Simulating physical, chemical, biological, or engineering systems where numerical accuracy is a first-class requirement alongside (or above) real-time performance
  • Building training simulators, digital twins, computational fluid dynamics (CFD) tools, finite element analysis (FEA), or structural engineering software
  • Results must be reproducible, versioned, and validated against physical measurements or analytical solutions

Do NOT use this pattern when:

  • The simulation is purely visual and approximate accuracy is acceptable — use a game engine's physics engine instead
  • The simulation is a simple analytical model that runs in milliseconds — a Python script or spreadsheet is sufficient
  • Real-time interactive performance is more important than numerical precision — game engine physics is appropriate for interactive training applications where "good enough" accuracy suffices

2. Problem it solves

Engineering and scientific decisions depend on simulation results. A structural engineer needs to know whether a bridge design fails under load before building it. A pharma company needs to simulate molecular interactions before running wet lab experiments. These simulations involve large numerical systems (millions of degrees of freedom), stiff differential equations, or complex geometry that requires specialised numerical methods — not a game engine physics approximation.
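
To illustrate why stiff equations call for specialised methods, here is a minimal sketch on the linear test problem y' = -λy. The decay rate `lam`, the step size `dt`, and both function names are assumptions chosen for the illustration, not part of this pattern. With λ·dt = 5 the explicit (forward) Euler update is unstable, while the implicit (backward) Euler update stays bounded at the same step size.

```python
import math


def explicit_euler(lam, y0, dt, steps):
    """Forward Euler for y' = -lam * y; stable only when lam * dt < 2."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y


def implicit_euler(lam, y0, dt, steps):
    """Backward Euler for y' = -lam * y; stable for any positive dt."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)
    return y


# Hypothetical parameters: a fast-decaying mode and a step that is too large
# for the explicit scheme (lam * dt = 5).
lam, y0, t_end, dt = 1000.0, 1.0, 0.1, 0.005
steps = round(t_end / dt)

exact = y0 * math.exp(-lam * t_end)          # essentially zero
y_explicit = explicit_euler(lam, y0, dt, steps)
y_implicit = implicit_euler(lam, y0, dt, steps)

print(f"exact     = {exact:.3e}")
print(f"explicit  = {y_explicit:.3e}")       # grows instead of decaying
print(f"implicit  = {y_implicit:.3e}")       # decays towards zero
```

The implicit scheme costs a solve per step (trivial here, a linear or nonlinear system in a real solver), which is one reason the linear solver is a separate container in the views below.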

3. Solution overview

System context (C4 Level 1)

flowchart LR
    Engineer((Engineer / Scientist)) --> PreProc[Pre-processor\ngeometry + mesh + BCs]
    PreProc --> Solver[Solver\nPDE / ODE / Monte Carlo]
    Solver --> PostProc[Post-processor\nresults + visualisation]
    Solver --> HPC[HPC Cluster\nor Cloud GPU]
    PostProc --> Report[Report / Export\nVTK, HDF5, CSV]
    ExpData[(Experimental Data)] -->|validation| Solver

Container view (C4 Level 2)

flowchart TB
    subgraph Pre-processing
        CADImport[CAD / Geometry Import\n.STEP, .STL, .IGES]
        Mesher[Mesher\nGmsh / snappyHexMesh]
        BCSetup[Boundary Condition Setup\nproblem parameters]
    end
    subgraph Solver
        TimeIntegrator[Time Integrator\nRunge-Kutta, Newmark-β]
        LinearSolver[Linear Solver\nPETSc / SciPy sparse]
        ParallelMPI[MPI Parallelism\nOpenMPI / mpi4py]
        GPUKernels[GPU Kernels\nCUDA / CuPy — optional]
    end
    subgraph Post-processing
        VTKWriter[VTK Writer\nPyVista]
        Plotter[Plotter\nMatplotlib / ParaView]
        ResultDB[(Result Store\nHDF5 / NetCDF on S3)]
    end
    subgraph Experiment Management
        MLflow[MLflow / DVC\nversioned runs + params]
    end

    CADImport --> Mesher --> BCSetup --> TimeIntegrator
    TimeIntegrator --> LinearSolver
    LinearSolver --> ParallelMPI
    LinearSolver --> GPUKernels
    TimeIntegrator --> VTKWriter --> ResultDB
    VTKWriter --> Plotter
    MLflow -.-> TimeIntegrator

4. Technology stack

| Layer | Primary choice | Alternatives | Notes |
| --- | --- | --- | --- |
| Compute language | Python (NumPy + SciPy) | C++ (performance-critical kernels), Fortran (legacy HPC), Julia | Python for orchestration and post-processing; C/C++ extension modules for inner loops; Julia for teams wanting MATLAB-like syntax with C-level performance |
| Numerical library | NumPy + SciPy | JAX (auto-diff + GPU), PyTorch (ML-adjacent) | SciPy provides sparse solvers, ODE integrators, and signal processing; JAX for simulations requiring automatic differentiation |
| Parallel computing | MPI via mpi4py | Dask (task graph), Ray | MPI for tightly coupled parallel solvers (CFD, FEA); Dask for embarrassingly parallel parameter sweeps |
| GPU acceleration | CuPy (drop-in NumPy for CUDA) | JAX (XLA), Numba (CUDA kernels) | CuPy for porting NumPy code to GPU with minimal changes; Numba for writing custom CUDA kernels in Python |
| Meshing | Gmsh | OpenFOAM's snappyHexMesh, CGAL | Gmsh provides a Python API for programmatic mesh generation; well-suited for complex 3D geometries |
| Visualisation | PyVista + ParaView | Matplotlib (2D), VTK (low-level) | PyVista wraps VTK with a simpler Python API; ParaView for interactive exploration of large results |
| Results storage | HDF5 (via h5py) on AWS S3 | NetCDF4, Zarr | HDF5 for structured multidimensional result arrays; Zarr for cloud-native chunked access without full-file download |
| Experiment tracking | DVC (data + model versioning) | MLflow, Sacred | DVC versions large data files (meshes, results) alongside Git-tracked code; essential for reproducible simulation runs |
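
As a small illustration of the NumPy + SciPy layer, the sketch below assembles a sparse finite-difference matrix and solves a 1D Poisson problem with `scipy.sparse.linalg.spsolve`, then compares the result with the known analytical solution u(x) = sin(πx). The problem, the grid size, and the function name `solve_poisson_1d` are assumptions picked for the example; a production solver would more likely use PETSc or an iterative method with a preconditioner.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def solve_poisson_1d(n):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0.

    Uses second-order central differences on n interior grid points.
    """
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)           # interior nodes only

    # Tridiagonal stiffness matrix (1 / h^2) * tridiag(-1, 2, -1) in CSR form.
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr") / h**2

    f = np.pi**2 * np.sin(np.pi * x)          # right-hand side
    u = spsolve(A, f)                         # sparse direct solve
    return x, u


x, u = solve_poisson_1d(200)
max_error = np.max(np.abs(u - np.sin(np.pi * x)))
print(f"max error vs analytical solution: {max_error:.2e}")
```

Only the non-zero diagonals are stored, which is what keeps memory proportional to the number of unknowns rather than its square.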

5. Non-functional characteristics

| Concern | Profile |
| --- | --- |
| Scalability | Tightly coupled solvers scale via MPI across nodes; diminishing returns above ~1,000 cores for most problems (Amdahl's Law). Embarrassingly parallel parameter sweeps scale close to linearly. Cloud HPC (for example AWS HPC instances or Google Cloud HPC offerings) provides on-demand burst capacity without owning hardware. |
| Availability target | Simulation jobs run to completion; they are not long-running services. Availability = "job completes and results are retrievable." Use checkpointing to allow job restart from an intermediate state after a node failure. |
| Latency target | Wall-clock time to solution is the metric. Define acceptable solve time per problem size in the requirements; profile solver performance against this target. |
| Security posture | Simulation inputs often represent proprietary designs (CAD, IP). Encrypt at rest (S3 SSE-KMS). Restrict cluster access to authenticated researchers. Validate all mesh inputs before they enter the solver, because malformed meshes can cause unbounded memory consumption. |
| Data residency | Large result files (TB-scale HPC output) must reside in a defined region for export control (ITAR, EAR) compliance if the simulation relates to defence or dual-use technology. |
| Compliance fit | Export control (ITAR/EAR) may restrict cloud provider choice and data sharing for defence-related simulations. FDA 21 CFR Part 11 may apply to simulation software used in medical device submissions. Academic and funded research may require open data archiving (Zenodo, institutional repository). |
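
The checkpointing requirement in the availability row can be sketched as follows. This is a toy time loop under stated assumptions: the state layout, the function names, and the use of `pickle` are illustrative only (a real solver would more likely write its fields to HDF5), and `stop_after` merely simulates a node failure. The checkpoint is written to a temporary file and renamed into place so that a crash during the write cannot leave a truncated checkpoint behind.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Toy solver loop: y decays by 1% per step; resumes from a checkpoint if present."""
    state = load_checkpoint(path) or {"step": 0, "y": 1.0}
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return state                      # simulated node failure
        state["y"] *= 0.99
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "state.pkl")
    run(ckpt, total_steps=100, checkpoint_every=10, stop_after=25)   # "crashes" at step 25
    resumed = run(ckpt, total_steps=100, checkpoint_every=10)        # restarts from step 20
    reference = run(os.path.join(workdir, "ref.pkl"), 100, 10)       # uninterrupted run

print(resumed["step"], resumed["y"] == reference["y"])
```

The restarted run repeats steps 21 to 25, so the checkpoint interval bounds how much work a failure can cost.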

6. Cost ballpark

Indicative monthly USD cost. HPC compute time is the dominant cost.

| Scale | Simulation size | Monthly cost (USD) | Cost drivers |
| --- | --- | --- | --- |
| Small | Single-node, < 1M DOF | $100 - $500 | EC2 c5.4xlarge or m5.8xlarge on-demand |
| Medium | Multi-node MPI, 1M-100M DOF | $1,000 - $10,000 | HPC instances (hpc6a), S3 storage for results, EFA networking |
| Large | GPU cluster, >100M DOF | $10,000 - $100,000 | p4d/p5 GPU instances, Lustre scratch filesystem (FSx), result archive storage |

7. LLM-assisted development fit

| Aspect | Rating | Notes |
| --- | --- | --- |
| NumPy / SciPy numerical boilerplate (ODE setup, sparse matrix assembly) | ★★★★ | Good; verify numerical method choice and stability conditions with a domain expert. |
| MPI parallelism scaffolding (mpi4py scatter/gather) | ★★★ | Generates structurally correct patterns; load balancing and communication overlap require expert tuning. |
| HDF5 / VTK file I/O | ★★★★★ | Excellent; file format APIs are well represented. |
| Numerical algorithm selection (solver, preconditioner, time integrator) | ★★ | Knows the names; selecting the right algorithm for a specific PDE and mesh requires numerical analysis expertise. |
| Architecture decisions | | Don't outsource. Use ADRs. |

Recommended workflow: Validate the solver against an analytical solution or published benchmark before adding parallelism or GPU acceleration. Reproduce a known result first; optimise second.
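
One way to carry out this validation step is a convergence test: solve a problem with a known analytical solution at two step sizes and check that the error shrinks at the rate the method promises. The sketch below does this for a classical fourth-order Runge-Kutta integrator on y' = -y; the test problem, the step counts, and the helper names `rk4_solve` and `observed_order` are assumptions made for the illustration.

```python
import math


def rk4_solve(f, y0, t_end, n_steps):
    """Integrate y' = f(t, y) from t = 0 to t_end with classical RK4."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


def observed_order(solver, f, y0, t_end, exact, n_coarse):
    """Estimate the convergence order from the errors at n and 2n steps."""
    err_coarse = abs(solver(f, y0, t_end, n_coarse) - exact)
    err_fine = abs(solver(f, y0, t_end, 2 * n_coarse) - exact)
    return math.log2(err_coarse / err_fine)


# Test problem with a closed-form answer: y' = -y, y(0) = 1, so y(1) = exp(-1).
order = observed_order(rk4_solve, lambda t, y: -y, 1.0, 1.0, math.exp(-1.0), 10)
print(f"observed order: {order:.2f}")   # should be close to the theoretical 4
```

An observed order well below the theoretical one usually points to an implementation bug, which is far cheaper to find on a scalar test problem than inside a parallel production run.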

8. Reference implementations

  • Public reference: numpy/numpy (NumPy). The numpy/core/ package and the documentation tutorials show the array computing foundation that underpins most Python scientific simulation
  • Public reference: visgl/deck.gl (deck.gl). Large-scale geospatial and scientific data visualisation on the GPU using WebGL
  • Internal case study: Add your anonymised internal example here
  • No ADRs recorded yet. Candidate: Python vs Julia vs C++ for performance-critical simulation kernels.

10. Known risks & gotchas

  • Solver instability or divergence can produce plausible-looking wrong answers: a stiff ODE integrated with too large a time step can return results that look physically reasonable but are numerically wrong, and no error is raised. Mitigation: implement a validation test suite with analytical solutions for simple cases before running on real problems; monitor residual norms per timestep.
  • Memory exhaustion from mesh refinement: doubling mesh resolution in 3D increases element count eightfold, and the solver runs out of RAM partway through. Mitigation: estimate memory requirements before running (roughly DOF count × average non-zeros per matrix row × bytes per stored entry, including index storage, plus solver workspace); run a quick coarse-mesh test to verify the setup before the full fine-mesh solve.
  • Reproducibility lost without versioning inputs: a result cannot be reproduced six months later because the mesh, input parameters, or code version were not tracked. Mitigation: version code in Git and input data with DVC; record the full solver configuration (seed, tolerances, mesh hash) in the experiment tracking system on every run.
  • Export control violation for cloud HPC — a defence-related simulation workload runs on a cloud provider whose data centre is in an embargoed country. Mitigation: verify cloud region data residency before submitting; consult legal counsel for any simulation touching ITAR or EAR-controlled technology.
  • Parallelism scaling cliff: an MPI job scales well from 1 to 32 cores and then levels off; at 128 cores it runs slower because communication overhead dominates. Mitigation: profile the communication-to-compute ratio; perform a strong-scaling study before purchasing large reserved compute capacity.
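
The levelling-off described in the last item can be estimated up front with Amdahl's Law, and a strong-scaling study can be summarised as parallel efficiency. The sketch below is illustrative only: the 5% serial fraction is a hypothetical value, not a measurement, and the function names are assumptions. Amdahl's Law predicts a plateau but not the slowdown itself, which comes from communication cost that this simple model leaves out.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when serial_fraction of the work cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def parallel_efficiency(t_single, t_parallel, cores):
    """Strong-scaling efficiency from measured wall-clock times (1.0 is ideal)."""
    return t_single / (cores * t_parallel)


# Hypothetical solver in which 5% of the runtime is serial.
serial_fraction = 0.05
for cores in (1, 32, 128, 1024):
    speedup = amdahl_speedup(serial_fraction, cores)
    print(f"{cores:5d} cores: speedup <= {speedup:6.2f}, "
          f"efficiency <= {speedup / cores:5.1%}")
```

In a real study, feed measured wall-clock times into `parallel_efficiency` for each core count and stop scaling out once efficiency falls below the threshold the team has agreed on.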