IDB-FWA-020
Firmware · OTA · MTM · security
Firmware architecture and manufacturing test
Reference for embedded firmware architecture, OTA update strategy, security baseline, and the manufacturing test mode (MTM) that production lines use to verify every unit.
Abstract
Firmware is the part of the product that updates after launch. Architectural decisions made early determine whether the team can patch a critical bug in week 1 of production, push an OTA security fix in year 3, or ship a manufacturing variant for a new market without re-spinning hardware.
Section 1 covers architecture patterns (HAL, RTOS, bare-metal). Section 2 covers Manufacturing Test Mode (MTM) — the firmware mode that the factory uses to verify every unit. Section 3 covers OTA update strategy. Section 4 covers security baseline. Section 5 covers version control and release process.
1. Firmware architecture
Pick an architecture that matches the product's complexity. Over-engineering wastes flash, RAM and development time; under-engineering creates maintenance debt.
1.1 Architecture patterns
| Pattern | Use | Complexity |
|---|---|---|
| Bare-metal (super-loop) | Simple sensors, no real-time constraints | Lowest |
| State machine | UI-driven products, defined modes | Low |
| Cooperative scheduler (event loop) | Connected products with periodic tasks | Medium |
| Preemptive RTOS (FreeRTOS, Zephyr, ThreadX) | Multi-tasking, real-time deadlines | Medium-high |
| Embedded Linux | High-resource SoCs (Raspberry Pi-class boards, Yocto-built distributions) | High |
| Hypervisor + dual-OS | Safety-critical or multi-domain (automotive, medical) | Highest |
1.2 Hardware Abstraction Layer (HAL)
A HAL separates application code from hardware specifics. It is critical for:
- Supporting multiple MCU variants: same code, different chip.
- Porting to new platforms: swap drivers, keep the application.
- Mock hardware for unit testing: run application code on a PC.
- Survivability across silicon obsolescence: hardware changes don't ripple through the application.
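A minimal sketch of the idea in C, assuming a GPIO-only HAL expressed as a struct of function pointers: the application calls through the struct, and a host build binds a mock that keeps pin levels in RAM. All names (`hal_gpio_t`, `mock_gpio`, `app_set_status_led`, `STATUS_LED_PIN`) are hypothetical, not taken from any vendor SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GPIO interface: application code only sees these calls. */
typedef struct {
    void (*write)(void *ctx, uint8_t pin, bool level);
    bool (*read)(void *ctx, uint8_t pin);
    void *ctx; /* driver-private state, e.g. a register base or a mock buffer */
} hal_gpio_t;

/* Application code: no register addresses, no vendor headers. */
#define STATUS_LED_PIN 5u

void app_set_status_led(const hal_gpio_t *gpio, bool on)
{
    gpio->write(gpio->ctx, STATUS_LED_PIN, on);
}

/* Mock driver for host-side unit tests: pin levels live in RAM. */
typedef struct {
    bool level[32];
} mock_gpio_state_t;

static void mock_write(void *ctx, uint8_t pin, bool level)
{
    mock_gpio_state_t *s = ctx;
    if (pin < 32u) {
        s->level[pin] = level;
    }
}

static bool mock_read(void *ctx, uint8_t pin)
{
    const mock_gpio_state_t *s = ctx;
    return pin < 32u ? s->level[pin] : false;
}

hal_gpio_t mock_gpio(mock_gpio_state_t *state)
{
    hal_gpio_t g = { mock_write, mock_read, state };
    return g;
}
```

On the target, a second constructor would fill the same struct with functions that drive the MCU's GPIO through the vendor driver; `app_set_status_led` does not change.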
1.3 Memory budget
| MCU class | Flash | RAM | Use |
|---|---|---|---|
| Low-end (Cortex-M0, M0+) | 32–128 KB | 4–16 KB | Simple sensors, BLE peripherals |
| Mid-range (Cortex-M3, M4) | 128–512 KB | 32–128 KB | Most consumer IoT, BLE central |
| High-end (Cortex-M4F, M7, M33) | 512 KB – 2 MB | 128–512 KB | Wi-Fi devices, displays, audio |
| Application processor (Cortex-A) | External | External | Linux-class products, smartphones |
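Reserve flash for OTA updates: the device must store the new image while the old one keeps running. The staging slot must be at least as large as the largest image it will receive, so a full A/B layout with uncompressed images gives each slot roughly half of the flash left after the bootloader. A reserve of ~25–35 % is only enough when the application itself is that small, when updates arrive as compressed or delta images, or when the update is staged in external flash.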
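To see where the OTA reserve goes, the sketch below splits flash into a bootloader, two equal sector-aligned application slots and an NVM area at the top. It is a hypothetical helper (`plan_ab_layout`), assuming the bootloader and NVM sizes are themselves multiples of the sector size.

```c
#include <stdint.h>

/* Hypothetical layout: [bootloader][slot A][slot B][NVM/config]. */
typedef struct {
    uint32_t slot_size;   /* bytes available per application slot */
    uint32_t slot_a_addr; /* offsets from the start of flash */
    uint32_t slot_b_addr;
    uint32_t nvm_addr;
} flash_layout_t;

/* Splits the space left after the bootloader and NVM into two equal,
 * sector-aligned slots. Returns 0 on success, -1 if nothing fits. */
int plan_ab_layout(uint32_t flash_size, uint32_t bootloader_size,
                   uint32_t nvm_size, uint32_t sector_size,
                   flash_layout_t *out)
{
    if (sector_size == 0u || bootloader_size + nvm_size >= flash_size) {
        return -1;
    }
    uint32_t usable = flash_size - bootloader_size - nvm_size;
    /* Round each slot down to a whole number of erase sectors. */
    uint32_t slot = (usable / 2u / sector_size) * sector_size;
    if (slot == 0u) {
        return -1;
    }
    out->slot_size = slot;
    out->slot_a_addr = bootloader_size;
    out->slot_b_addr = bootloader_size + slot;
    out->nvm_addr = flash_size - nvm_size;
    return 0;
}
```

With 512 KB of flash, a 32 KB bootloader, 16 KB of NVM and 4 KB sectors, each slot gets 232 KB, so the second slot alone takes about 45 % of flash, well above 25–35 %.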
1.4 Boot architecture
- Single-image boot: one application image; OTA replaces it in place. Risk: power loss during the update can brick the device.
- Dual-image boot (A/B): two slots; the device switches slots only after a successful update. Recommended, because it survives power loss.
- Bootloader-based: a small bootloader manages loading the application. Standard for production.
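For the A/B option, boot-time slot selection can be as small as the function below. It is a sketch, assuming the bootloader has already validated each slot (CRC and signature) and that version numbers only increase; `slot_info_t` and `select_boot_slot` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-slot metadata, e.g. parsed from an image header. */
typedef struct {
    bool valid;       /* CRC and signature already checked */
    uint32_t version; /* monotonically increasing build number */
} slot_info_t;

/* Returns 0 for slot A, 1 for slot B, or -1 if neither can boot
 * (the caller then enters recovery mode). Prefers the newer valid image. */
int select_boot_slot(const slot_info_t *a, const slot_info_t *b)
{
    if (a->valid && b->valid) {
        return (b->version > a->version) ? 1 : 0;
    }
    if (a->valid) {
        return 0;
    }
    if (b->valid) {
        return 1;
    }
    return -1;
}
```

A return of -1 is where the recovery mode listed in 1.5 would take over.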
1.5 Recommended bootloader features
- Application image validation: CRC and signature verification before boot.
- Failover: if the new image fails, boot from the previous slot.
- Recovery mode: a pin toggle or magic sequence enters the bootloader for emergency reflash.
- Verbose error reporting: boot reason, last-flash status and watchdog counter accessible via UART.
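The CRC half of image validation can be sketched with a bitwise CRC-32 (the IEEE 802.3 polynomial, the same CRC-32 used by Ethernet and zlib). A CRC only detects accidental corruption and says nothing about who built the image, so signature verification through the crypto library is still required; it is left out here. `image_crc_ok` and the idea of storing the expected CRC in the image header are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Slow but table-free, which suits a size-constrained bootloader. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical check against a CRC stored by the build tool. */
int image_crc_ok(const uint8_t *image, size_t len, uint32_t expected_crc)
{
    return crc32_ieee(image, len) == expected_crc;
}
```

The bitwise loop trades speed for size; a 256-entry lookup table (1 KB) is the usual faster alternative when the bootloader has room.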
2. Manufacturing Test Mode (MTM)
MTM is the firmware mode the production line uses to verify every unit before it ships. It is the single most important production-readiness feature in the firmware.
2.1 What MTM does
- Hardware self-test: exercises every accessible peripheral.
- Calibration: stores per-unit calibration values (sensor offsets, RF tuning) in non-volatile memory.
- Identity programming: writes the unique serial number, MAC address and regional configuration.
- Quality measurement: captures and logs pass/fail data per station.
- Lockout after production: MTM is disabled in shipping firmware; access requires JTAG or a special command.
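Identity programming must be write-once. The sketch below models that on a PC, with a RAM array standing in for OTP: it refuses a second write and reads the data back after programming. The layout of `identity_t`, the blank value 0xFF and all function names are assumptions; real OTP and eFuse controllers have their own program and read interfaces.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical identity block as laid out in OTP. */
typedef struct {
    char serial[16];
    uint8_t mac[6];
    uint8_t region;
} identity_t;

/* RAM stand-in for the OTP region so the logic can run on a PC.
 * Assumes blank OTP reads as 0xFF; some eFuse arrays read as 0 instead. */
static uint8_t otp_region[sizeof(identity_t)];

void otp_sim_erase(void)
{
    memset(otp_region, 0xFF, sizeof otp_region);
}

static bool otp_is_blank(void)
{
    for (size_t i = 0; i < sizeof otp_region; i++) {
        if (otp_region[i] != 0xFFu) {
            return false;
        }
    }
    return true;
}

/* Writes the identity exactly once, then verifies it by reading back.
 * Returns 0 on success, -1 if already programmed, -2 on read-back mismatch. */
int identity_write_once(const identity_t *id)
{
    if (!otp_is_blank()) {
        return -1; /* a repeated MTM pass must not change serial or MAC */
    }
    memcpy(otp_region, id, sizeof *id); /* OTP program operation on target */
    return memcmp(otp_region, id, sizeof *id) == 0 ? 0 : -2;
}
```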
2.2 MTM test sequence (typical)
| Stage | Test | Method | Pass criteria |
|---|---|---|---|
| 1 | Boot + ID | Read chip ID, verify silicon | Matches expected |
| 2 | Power rails | ADC measurements | 3.3 V ±5 %, 1.8 V ±5 % |
| 3 | LED test | Light each LED in sequence | Operator visual check |
| 4 | Button test | Operator presses each button | All registered within 30 s |
| 5 | I/O continuity | Drive outputs high/low, read inputs | Matches expected |
| 6 | Comms peripherals | Loopback or external test | UART, SPI and I2C data round-trips correctly |
| 7 | Wireless RF | Transmit beacon, measure RSSI | RSSI in expected range |
| 8 | Sensor read | Take sample under known conditions | Within tolerance |
| 9 | Battery check | Measure battery voltage (battery-powered units only) | Voltage in range |
| 10 | Calibration | Per-unit calibration | Sensor offset stored |
| 11 | Identity write | Serial, MAC, region | Written to one-time programmable (OTP) memory |
| 12 | Final pass | Confirm all previous stages passed | Production timestamp written |
| 13 | MTM lockout | Disable test mode | Fuse burned or lockout flag set |
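Stage 2's pass criteria reduce to converting an ADC reading to a rail voltage and checking it against a percentage window. This sketch assumes a 12-bit ADC with a 3.3 V reference and a resistor divider in front of the ADC pin; the constants and function names are illustrative, not from a specific MCU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ADC: 12-bit, 3.3 V reference. */
#define ADC_FULL_SCALE 4095u
#define ADC_VREF_MV    3300u

/* The rail reaches the ADC pin through a divider of ratio div_num/div_den,
 * so the pin voltage is scaled back up to recover the rail voltage. */
uint32_t adc_to_rail_mv(uint16_t raw, uint32_t div_num, uint32_t div_den)
{
    uint32_t pin_mv = (uint32_t)raw * ADC_VREF_MV / ADC_FULL_SCALE;
    return pin_mv * div_den / div_num;
}

/* Pass if measured_mv lies within +/- tol_pct percent of nominal_mv. */
bool rail_in_tolerance(uint32_t measured_mv, uint32_t nominal_mv,
                       uint32_t tol_pct)
{
    uint32_t margin = nominal_mv * tol_pct / 100u;
    return measured_mv >= nominal_mv - margin &&
           measured_mv <= nominal_mv + margin;
}
```

For example, the 3.3 V rail seen through a divider that halves it gives a raw reading of 2047, which converts to 3298 mV and passes the ±5 % window (3135–3465 mV).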
2.3 MTM access methods
- Special command via UART: the production line connects to the UART and sends a magic sequence.
- Boot pin combination: hold specific buttons during boot.
- JTAG-only: access via a debugger; common for security-conscious products.
- Magic byte in NVM: set at first programming; cleared after MTM lockout.
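A sketch of the UART entry path: a byte-wise matcher fed from the receive handler during a short boot window. The sequence `#MTM1` and the names `mtm_matcher_t` and `mtm_feed` are hypothetical; a real product would also check the lockout flag before honouring a match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic sequence. Its first byte never repeats, so on a
 * mismatch the match can simply restart (no KMP-style fallback needed). */
static const uint8_t MTM_MAGIC[] = { '#', 'M', 'T', 'M', '1' };

typedef struct {
    size_t matched; /* bytes of MTM_MAGIC matched so far */
} mtm_matcher_t;

/* Feed one received UART byte; returns true once the full sequence
 * has arrived, then resets for the next attempt. */
bool mtm_feed(mtm_matcher_t *m, uint8_t byte)
{
    if (byte == MTM_MAGIC[m->matched]) {
        m->matched++;
    } else {
        /* The mismatching byte may itself start a new sequence. */
        m->matched = (byte == MTM_MAGIC[0]) ? 1u : 0u;
    }
    if (m->matched == sizeof MTM_MAGIC) {
        m->matched = 0;
        return true;
    }
    return false;
}
```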
2.4 MTM access in shipping firmware
- Lockout after MTM completes: the production timestamp plus a lockout flag prevent re-entry.
- Recovery requires JTAG: service technicians use JTAG to unlock units for repair.
- Audit trail: each MTM session is logged to NVM (which station, when, results).
2.5 MTM design patterns
- Modular tests: each test is independent and can be re-run individually.
- Pass/fail reporting: each test outputs structured data (JSON, CSV) over UART for the production fixture to log.
- Per-station scripts: the production fixture script knows which tests to run at which station.
- Failure capture: on failure, record waveforms, ADC values and register state for forensics.
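Modular tests and structured reporting fit together as a table of test functions that a station runner walks through. A minimal sketch, assuming one JSON object per line over the UART (stdout here) and test names that need no JSON escaping; `mtm_test_t`, `mtm_run_station` and the two demo tests are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One independent MTM test: returns pass/fail and reports a measurement. */
typedef struct {
    const char *name;
    bool (*run)(double *measured);
} mtm_test_t;

/* Runs the tests configured for this station in order, printing one JSON
 * object per line for the fixture to log. Returns the number of failures. */
int mtm_run_station(const mtm_test_t *tests, size_t n, FILE *out)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        double value = 0.0;
        bool pass = tests[i].run(&value);
        if (!pass) {
            failures++;
        }
        fprintf(out, "{\"test\":\"%s\",\"pass\":%s,\"value\":%.3f}\n",
                tests[i].name, pass ? "true" : "false", value);
    }
    return failures;
}

/* Stand-in tests for host builds; on hardware these would read the ADC,
 * poll buttons and so on. */
static bool demo_rail_3v3(double *v) { *v = 3.298; return true; }
static bool demo_button(double *v) { *v = 0.0; return false; }
```

The fixture then reads lines such as `{"test":"rail_3v3","pass":true,"value":3.298}`; which table a station runs is the only per-station difference.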
3. OTA updates
Over-the-air (OTA) updates are mandatory for connected products. Decisions made early (flash layout, bootloader, key provisioning) determine the OTA success rate.
3.1 OTA update architecture
```
1. Server ──[notification]──────→ Device
2. Server ──[image download]────→ Device   (chunks, resumable)
3. Device    [signature verify]
4. Device    [CRC verify]
5. Device    [stage in slot B]
6. Device    [reboot to slot B]
7. Device    [mark slot B as primary]
8. Device ──[confirmation]──────→ Server
```
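Step 2's "chunks, resumable" can be sketched as a progress record that accepts only the next expected offset, so slot B never contains gaps and a reconnect resumes where it stopped. The names and the idea of persisting `ota_progress_t` to NVM after each chunk are assumptions; on real flash the sector must be erased before it is written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical download state, persisted to NVM after each chunk. */
typedef struct {
    uint32_t image_size;    /* total bytes announced by the server */
    uint32_t bytes_written; /* contiguous bytes already stored in slot B */
} ota_progress_t;

/* Stores one chunk at the given offset. Returns 0 on success, or -1 if
 * the chunk is out of order or would overrun the announced image size. */
int ota_write_chunk(ota_progress_t *p, uint8_t *slot_b, uint32_t offset,
                    const uint8_t *chunk, uint32_t len)
{
    if (offset != p->bytes_written || len > p->image_size - offset) {
        return -1;
    }
    memcpy(slot_b + offset, chunk, len); /* flash program on real hardware */
    p->bytes_written += len;
    return 0;
}

/* Offset to request after reconnecting, e.g. in an HTTP Range header. */
uint32_t ota_resume_offset(const ota_progress_t *p)
{
    return p->bytes_written;
}
```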
3.2 OTA design rules
- Dual-slot storage: always. Single-slot OTA has too high a failure rate.
- Signature verification: a cryptographic signature on every image; reject unsigned or tampered images.
- Resumable download: network drops are common; resume from the last chunk.
- Atomic switch: the new image becomes active only after full download and verification.
- Rollback support: if the new image fails after boot, automatically roll back to the previous one.
- Confirmation: the device reports success or failure back to the OTA server.
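Rollback support is commonly built on a trial-boot counter: the bootloader boots a freshly installed image a limited number of times, and the application must confirm itself before the limit is reached. This mirrors the test-then-confirm flow of bootloaders such as MCUboot, but the record layout, the limit of three trial boots and all names here are assumptions.

```c
#include <stdint.h>

/* Hypothetical persistent boot-control record (kept in NVM). */
typedef enum { IMG_CONFIRMED, IMG_PENDING } img_state_t;

typedef struct {
    int active_slot;       /* slot the bootloader should start */
    int previous_slot;     /* known-good fallback */
    img_state_t state;
    uint8_t boot_attempts; /* trial boots of a PENDING image so far */
} boot_ctrl_t;

#define MAX_TRIAL_BOOTS 3u

/* Called by the bootloader on every reset. Returns the slot to boot. */
int bootloader_pick(boot_ctrl_t *bc)
{
    if (bc->state == IMG_PENDING) {
        if (bc->boot_attempts >= MAX_TRIAL_BOOTS) {
            /* The new image never confirmed itself: roll back. */
            bc->active_slot = bc->previous_slot;
            bc->state = IMG_CONFIRMED;
            bc->boot_attempts = 0;
        } else {
            bc->boot_attempts++;
        }
    }
    return bc->active_slot;
}

/* Called by the application once its post-boot self-test passes
 * (for example, after reconnecting to the cloud). */
void app_confirm_image(boot_ctrl_t *bc)
{
    bc->state = IMG_CONFIRMED;
    bc->boot_attempts = 0;
}
```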
3.3 OTA security
- Image signing: RSA-2048 or ECDSA P-256 signature over the image hash.
- Public key in device: burned into OTP or write-protected memory during MTM so that an attacker cannot replace it.
- Encrypted images (optional): for products with proprietary firmware.
- Anti-rollback: track the version and refuse to downgrade to known-vulnerable versions.
- Server-side authentication: the device proves its identity before receiving an image (mTLS or per-device tokens).
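One way to implement anti-rollback is a security version number kept separate from the release version: it is bumped only when a release fixes a vulnerability, and the device stores the highest value it has accepted in a monotonic counter (OTP fuses or protected NVM). This lets functional downgrades through while blocking vulnerable ones. The field and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical image header field. */
typedef struct {
    uint32_t security_version;
} image_header_t;

/* Accept an image only if it is not older than the device's floor. */
bool ota_version_allowed(const image_header_t *hdr, uint32_t device_min_secver)
{
    return hdr->security_version >= device_min_secver;
}

/* After the new image is confirmed, raise the floor so older, vulnerable
 * images are refused from then on. Never lowers the stored value. */
uint32_t raise_secver_floor(uint32_t current_floor, const image_header_t *hdr)
{
    return hdr->security_version > current_floor ? hdr->security_version
                                                 : current_floor;
}
```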
3.4 OTA frequency considerations
- Mandatory updates: security patches; installation is forced.
- Recommended updates: new features; prompt the user.
- Critical timing: don't update during user interaction (e.g., a timer running or music playing).
- Battery threshold: don't update below 30 % battery.
- Network conditions: pause on metered connections; resume on Wi-Fi.
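The conditions above can be folded into a single decision function that runs before each OTA step. This is a sketch under assumed inputs (`ota_conditions_t` is hypothetical); it does not model forced installation of mandatory security patches, which might relax some of these checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of device state gathered before each OTA step. */
typedef struct {
    uint8_t battery_pct;
    bool user_busy;        /* e.g. a timer running or music playing */
    bool network_metered;
    bool image_downloaded; /* staged and verified in slot B */
} ota_conditions_t;

typedef enum { OTA_WAIT, OTA_DOWNLOAD, OTA_INSTALL } ota_action_t;

ota_action_t ota_decide(const ota_conditions_t *c)
{
    if (!c->image_downloaded) {
        /* Pause on metered links; resume once back on Wi-Fi. */
        return c->network_metered ? OTA_WAIT : OTA_DOWNLOAD;
    }
    if (c->user_busy) {
        return OTA_WAIT; /* never reboot in the middle of user interaction */
    }
    if (c->battery_pct < 30u) {
        return OTA_WAIT; /* 30 % battery threshold */
    }
    return OTA_INSTALL;
}
```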
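3.5 OTA scale considerations
- Staged rollout: 1 % → 5 % → 50 % → 100 % over 2 weeks, which catches failed updates before full deployment.
- Telemetry feedback: monitor success rate, post-update crashes and changes in battery drain.
- Server load: for a million devices checking weekly, the check requests are small, but image downloads add up (a 1 MB image sent to a million devices is about 1 TB of transfer) and synchronised check times cause spikes. Serve images from a CDN and randomise check times.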
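A staged rollout needs a stable way to choose which devices are in the first 1 %, 5 % and so on. A common approach, sketched here under assumption rather than taken from any particular OTA service, hashes the device ID into one of 100 buckets; because the bucket never changes, raising the percentage only adds devices. FNV-1a is used for its simplicity; `in_rollout` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit FNV-1a: cheap and deterministic, so a given device ID always
 * falls into the same bucket. */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Server-side check: is this device part of the current rollout stage?
 * Each device maps to a bucket 0..99; raising rollout_pct
 * (1 -> 5 -> 50 -> 100) only ever adds devices, never removes them. */
bool in_rollout(const char *device_id, uint32_t rollout_pct)
{
    return fnv1a32(device_id) % 100u < rollout_pct;
}
```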
4. Security baseline
Embedded firmware security is increasingly mandatory: see the EU Cyber Resilience Act (Regulation (EU) 2024/2847) and the UK PSTI Act 2022, whose product security regime has applied since April 2024.
4.1 Security requirements (EU CRA, UK PSTI Act)
| Requirement | What it means |
|---|---|
| Unique credentials | No shared default passwords across units |
| Software update mechanism | Device can receive security patches |
| Vulnerability disclosure policy | Published way for anyone to report vulnerabilities |
| Security update period | State how long security updates will be provided, and patch within that period |
| Secure default config | Default settings minimise risk |
4.2 Hardware security features
- Secure Element (SE) or TPM: stores cryptographic keys in tamper-resistant hardware.
- Secure boot: cryptographic chain of trust from boot ROM to application.
- Read-out protection: prevents reading firmware via JTAG.
- Anti-rollback fuses: permanently prevent downgrades.
- Trusted Execution Environment (TEE): ARM TrustZone or similar; isolated execution.
4.3 Firmware security baseline
- No hardcoded credentials: per-device unique secrets, never shared across units.
- Encrypted storage: sensitive data (Wi-Fi credentials, tokens) encrypted at rest.
- Encrypted communication: TLS 1.3 for cloud; encrypted BLE pairing.
- Crypto library: a well-maintained library (mbedTLS, BoringSSL, wolfSSL) using NIST-approved algorithms; where certification is required, use a FIPS 140-3 validated module.
- Random number generation: use the hardware RNG; never reuse RNG state.
- Memory safety: avoid C buffer overflows; consider Rust for new firmware.
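The classic C overflow in connected firmware is trusting a length field from the network. A minimal sketch, assuming a hypothetical provisioning message of the form `[len][SSID bytes]`: the claimed length is checked against both the bytes actually received and the destination buffer (Wi-Fi SSIDs are at most 32 bytes) before copying.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SSID_MAX 32u

/* Copies the SSID into out as a NUL-terminated string. Returns 0 on
 * success, or -1 if the message is empty or the claimed length exceeds
 * the received data or the buffer. */
int parse_ssid(const uint8_t *msg, size_t msg_len, char out[SSID_MAX + 1])
{
    if (msg_len < 1u) {
        return -1;
    }
    size_t claimed = msg[0];
    if (claimed > msg_len - 1u || claimed > SSID_MAX) {
        return -1; /* reject rather than truncate or overrun */
    }
    memcpy(out, msg + 1, claimed);
    out[claimed] = '\0';
    return 0;
}
```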
4.4 Common firmware vulnerabilities (OWASP IoT Top 10)
1. Weak, guessable, hardcoded passwords
2. Insecure network services
3. Insecure ecosystem interfaces (cloud, mobile, web)
4. Lack of secure update mechanism
5. Use of insecure or outdated components
6. Insufficient privacy protection
7. Insecure data transfer and storage
8. Lack of device management
9. Insecure default settings
10. Lack of physical hardening
5. Version control + release
Firmware is software. Apply software-engineering discipline.
5.1 Repository structure
- One repo per product family: branches per variant if hardware varies.
- CI/CD pipeline: build, unit test and static analysis on every commit.
- Build reproducibility: same source → same binary (deterministic build flags).
- Code review required: no direct commits to the main branch.
5.2 Release tagging
- Semantic versioning: MAJOR.MINOR.PATCH (e.g., 1.2.3).
- Tag on production release: the git tag matches the version flashed in production.
- Release notes: per release, document changes, fixes and known issues.
- Reproducible build artifact: its hash matches the firmware in production.
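Comparing MAJOR.MINOR.PATCH versions as strings gets cases like 1.10.0 versus 1.2.3 wrong, so firmware and OTA tooling should compare the fields numerically. A minimal sketch with hypothetical names; it rejects pre-release and build suffixes rather than ordering them, and `%u` is lenient about leading whitespace, so it does not catch every malformed string.

```c
#include <stdio.h>

typedef struct {
    unsigned major, minor, patch;
} semver_t;

/* Parses "MAJOR.MINOR.PATCH". Returns 0 on success, -1 otherwise. */
int semver_parse(const char *s, semver_t *v)
{
    int consumed = 0;
    if (sscanf(s, "%u.%u.%u%n", &v->major, &v->minor, &v->patch,
               &consumed) != 3 || s[consumed] != '\0') {
        return -1; /* missing fields or trailing text such as "-rc1" */
    }
    return 0;
}

/* Returns <0, 0 or >0 like strcmp: major, then minor, then patch. */
int semver_cmp(const semver_t *a, const semver_t *b)
{
    if (a->major != b->major) return a->major < b->major ? -1 : 1;
    if (a->minor != b->minor) return a->minor < b->minor ? -1 : 1;
    if (a->patch != b->patch) return a->patch < b->patch ? -1 : 1;
    return 0;
}
```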
5.3 Release process
1. Feature complete: all planned features merged to the develop branch.
2. Code freeze: no new features; bug fixes only.
3. Regression testing: run the full test suite on hardware.
4. Release candidate (RC): tag and push to internal testers.
5. Field beta: a limited group of customers.
6. Production release: tag and push to the production server; MTM uses this image.
5.4 Build flags
Each build type produces a different firmware image:
| Build | Purpose | Differences |
|---|---|---|
| Production | Shipping firmware | MTM disabled, debug stripped, optimised |
| Debug | Engineering | MTM enabled, debug symbols, less optimisation |
| Manufacturing | Factory only | MTM enabled, identity-write enabled |
| Engineering | Internal testing | Verbose logging, all debug |
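The table can be enforced at compile time with preprocessor flags passed by the build system (for example `-DBUILD_MANUFACTURING`). The macro names and log levels below are hypothetical, and enabling MTM in the Engineering build is an assumption the table does not settle. Debug symbols and optimisation level belong in compiler flags rather than here. The build defaults to the production configuration, and an `#error` guard stops a production image from ever containing MTM or identity-write code.

```c
/* Default to the safe shipping configuration if no build type is given. */
#if !defined(BUILD_DEBUG) && !defined(BUILD_MANUFACTURING) && \
    !defined(BUILD_ENGINEERING)
#define BUILD_PRODUCTION 1
#endif

#if defined(BUILD_PRODUCTION)
#define FEATURE_MTM            0
#define FEATURE_IDENTITY_WRITE 0
#define LOG_LEVEL              1 /* errors only */
#elif defined(BUILD_MANUFACTURING)
#define FEATURE_MTM            1
#define FEATURE_IDENTITY_WRITE 1
#define LOG_LEVEL              2
#elif defined(BUILD_DEBUG)
#define FEATURE_MTM            1
#define FEATURE_IDENTITY_WRITE 0
#define LOG_LEVEL              3
#else /* BUILD_ENGINEERING */
#define FEATURE_MTM            1 /* assumption: not stated in the table */
#define FEATURE_IDENTITY_WRITE 0
#define LOG_LEVEL              4 /* verbose */
#endif

/* A shipping image must never contain MTM or the identity-write path. */
#if defined(BUILD_PRODUCTION) && (FEATURE_MTM || FEATURE_IDENTITY_WRITE)
#error "Production build must not enable MTM or identity writes"
#endif

/* Runtime query so shared code can check the build type. */
int mtm_available(void)
{
    return FEATURE_MTM;
}
```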