Pattern: Embedded Firmware¶
Quick facts
- Category: Systems & Infrastructure
- Maturity: Adopt
- Typical team size: 1-4 engineers
- Typical timeline to MVP: 8-20 weeks (hardware dependency)
- Last reviewed: 2026-05-03 by Architecture Team
1. Context¶
Use this pattern when:
- Writing software that runs directly on microcontrollers (MCUs) or microprocessors in IoT devices, embedded controllers, or hardware peripherals
- The execution environment has severe resource constraints: kilobytes of RAM, no MMU, no operating system (bare metal) or a small RTOS
- Deterministic timing and hardware I/O control (GPIO, SPI, I2C, UART, ADC) are first-class requirements
Do NOT use this pattern when:
- The target is a general-purpose Linux SBC (Raspberry Pi, BeagleBone) — use standard Linux application development patterns; this pattern is for MCU-class devices
- The firmware controls a safety-critical function (medical device, automotive ECU) — see the Safety-Critical patterns; those require certification that this pattern does not cover
- A high-level scripting runtime is acceptable — MicroPython or Lua on ESP32 may be sufficient for simple IoT sensors
2. Problem it solves¶
Embedded firmware runs on hardware with kilobytes of RAM, no virtual memory, and often no OS. Code must fit in flash, execute deterministically, respond to hardware events in microseconds, and run reliably for years without a reboot. This pattern captures the toolchain, RTOS choices, and architectural decisions that enable a small team to produce correct, maintainable firmware for a constrained device.
3. Solution overview¶
System context (C4 Level 1)¶
flowchart LR
Hardware((Hardware\nMCU + peripherals)] --> Firmware[Firmware\nC / Rust on RTOS]
Firmware -->|UART / SPI / I2C| Sensors[Sensors & Actuators]
Firmware -->|WiFi / BLE / LoRa| Cloud[Cloud Backend\nor gateway]
DevPC((Developer)) --> Debugger[Debug Probe\nJ-Link / CMSIS-DAP]
Debugger --> Firmware
Container view (C4 Level 2)¶
flowchart TB
subgraph Firmware - runs on MCU
HAL[Hardware Abstraction Layer\nvendor HAL / Embassy / Zephyr drivers]
RTOS[RTOS Kernel\nFreeRTOS / Zephyr / bare-metal]
AppTasks[Application Tasks\none task per concern]
CommStack[Communication Stack\nWiFi / BLE / MQTT]
OTA[OTA Update\nbootloader + firmware slots]
WDT[Watchdog Timer\nfeed from healthy tasks]
end
subgraph Build System
Toolchain[Cross-compiler\narm-none-eabi-gcc / rustup target]
CMake[CMake / Cargo\nbuild + link]
FlashTool[Flash Tool\nOpenOCD / probe-rs]
end
subgraph Testing
UnitTests[Unit Tests\nnative host compilation]
HIL[Hardware-in-the-Loop\ntest jig]
end
HAL --> RTOS
RTOS --> AppTasks
AppTasks --> CommStack
AppTasks --> WDT
OTA --> RTOS
Toolchain --> CMake --> FlashTool
AppTasks --> UnitTests
4. Technology stack¶
| Layer | Primary choice | Alternatives | Notes |
|---|---|---|---|
| Language | C (with MISRA guidelines) | Rust (Embassy / Zephyr), C++ (embedded) | C is the universal embedded language with the most tooling, HAL support, and examples; Rust for new projects where the team can invest in the learning curve — see ADR-0011 |
| RTOS | FreeRTOS | Zephyr RTOS, bare metal, ThreadX (Azure) | FreeRTOS for ARM Cortex-M devices: small footprint, well-documented, vast community; Zephyr for complex multi-radio IoT (BLE + WiFi + LoRa in one OS) |
| HAL | Vendor HAL (STM32 HAL, ESP-IDF) | CMSIS (ARM), Embassy (Rust async HAL) | Use the vendor-provided HAL for peripheral drivers; Embassy for Rust targets provides an async-native HAL |
| Build system | CMake | Zephyr's west / CMake, Cargo (Rust) | CMake is the standard for C embedded projects; west is Zephyr's meta-tool (wraps CMake) |
| Debug probe | CMSIS-DAP (open standard) | J-Link (Segger), ST-Link | CMSIS-DAP with OpenOCD for an open-source toolchain; J-Link for commercial projects needing the best trace/profiling support |
| OTA updates | MCUboot (open-source bootloader) | Vendor OTA (ESP-IDF OTA, STM32 OTA) | MCUboot provides A/B firmware slots with cryptographic signature verification; always support rollback |
| Communication | MQTT over WiFi / LTE | CoAP, custom binary protocol | MQTT for constrained devices publishing to cloud brokers (AWS IoT Core, HiveMQ); CoAP for very constrained devices |
| Testing | Unity test framework (C) + native host build | CTest, Ceedling | Compile and run unit tests on the host (x86) to avoid hardware dependency; supplement with hardware-in-the-loop (HIL) tests |
5. Non-functional characteristics¶
| Concern | Profile |
|---|---|
| Scalability | No scalability dimension — each device runs one firmware instance. Fleet scalability (managing millions of devices) is a backend/OTA concern, not a firmware concern. |
| Availability target | Firmware must not hang or crash; design for 99.999% uptime (< 5 minutes downtime per year). Watchdog timer: any task that stops feeding the watchdog triggers a reset. Brownout detection: handle power loss gracefully. |
| Latency target | Interrupt handlers: < 10 μs for safety-critical or real-time tasks. Application task response: < 1 ms for sensor polling. Hard real-time requirements require careful task priority assignment and interrupt latency analysis. |
| Security posture | Disable JTAG/SWD in production builds (readback protection). Sign firmware images (MCUboot with Ed25519). Encrypt sensitive data in flash (device key in OTP). Never ship debug/test code in production builds. Implement secure boot. |
| Data residency | Data on device (flash, EEPROM) is physically in the device. If the device is deployed in the EU and collects personal data (location, biometrics), GDPR applies — the device must be able to erase local PII on command. |
| Compliance fit | CE / FCC certification required for RF devices before sale; involves hardware testing, not just software. IEC 62304 (medical) and ISO 26262 (automotive) impose process requirements — see Safety-Critical patterns if applicable. PSA Certified (Platform Security Architecture) for IoT security baseline. |
6. Cost ballpark¶
Firmware development cost is primarily engineering time; hardware BOM and test equipment are separate.
| Scale | Devices deployed | Monthly cost (ops) | Cost drivers |
|---|---|---|---|
| Small | < 1,000 | $50 - $500 | OTA server hosting, cloud backend, IoT broker |
| Medium | 1k - 100k | $500 - $5,000 | IoT platform (AWS IoT Core), OTA CDN, device management |
| Large | 100k+ | $5,000 - $30,000 | IoT platform at scale, fleet management, firmware signing infrastructure |
7. LLM-assisted development fit¶
| Aspect | Rating | Notes |
|---|---|---|
| Peripheral driver boilerplate (SPI, I2C, UART init) | ★★★★ | Good — vendor HAL patterns are represented; verify against the specific device datasheet. |
| FreeRTOS task and queue scaffolding | ★★★★ | Good; priority inversion and deadlock scenarios need careful manual review. |
| MQTT client integration (e.g., wolfMQTT, Paho) | ★★★★ | Good for standard connect/subscribe/publish patterns. |
| Interrupt service routine (ISR) design | ★★★ | Knows the constraints (no blocking, minimal work in ISR); correctness of shared state between ISR and tasks requires expert review. |
| Architecture decisions | ★ | Don't outsource. Use ADRs. |
Recommended workflow: Start with a blinking LED to validate the toolchain and debug probe before adding any application logic. Write host-compilable unit tests for all business logic from day one — hardware-dependent tests are expensive to run.
8. Reference implementations¶
- Public reference: zephyrproject-rtos/zephyr — Zephyr RTOS;
samples/covers BLE, WiFi, sensor drivers, shell, and OTA for dozens of hardware targets (200 OK ✓) - Public reference: embassy-rs/embassy — Embassy: async Rust embedded framework;
examples/covers ARM Cortex-M, STM32, nRF, and RP2040 targets with async tasks and HAL drivers (200 OK ✓) - Internal case study: Add your anonymised internal example here
9. Related decisions (ADRs)¶
10. Known risks & gotchas¶
- Watchdog timer not fed by all tasks — one task blocks waiting for a resource; the watchdog is only fed by other tasks; the device resets in production. Mitigation: each task must feed its own watchdog token; use a watchdog aggregator that resets the hardware watchdog only after all tasks have checked in.
- Stack overflow in task silently corrupts memory — an RTOS task overflows its stack; memory corruption follows with unpredictable behaviour. Mitigation: use stack overflow detection (FreeRTOS
configCHECK_FOR_STACK_OVERFLOW); fill stacks with a sentinel pattern; measure actual stack usage during testing with the HWM (high water mark). - Flash wear causes corruption on high-frequency writes — writing to the same flash page (log, counter, config) thousands of times per day exceeds the flash endurance (10k–100k erase cycles). Mitigation: use a wear-levelling flash file system (LittleFS); never write to raw flash directly for frequently updated data.
- OTA update interrupted by power loss bricks the device — partial firmware write renders the device unbootable. Mitigation: always use A/B firmware slots (MCUboot); never overwrite the running slot; only switch to the new slot after the full image is written and verified by CRC/signature.
- JTAG left enabled in production firmware — an attacker with physical access reads the flash and extracts firmware or secrets. Mitigation: write a production build script that enables readback protection (RDP Level 2 on STM32); verify RDP status in the CI release pipeline before signing.