Performance Benchmarks¶
This page documents the dispatch throughput benchmark policy and keeps the older PlayMode benchmark tables for broad historical context.
See also: Performance optimizations for design details.
Dispatch Throughput Methodology¶
The throughput harness measures raw dispatch cost for the current runtime. It is intentionally narrow: warm up each scenario, measure a 1-second window, repeat five times, and compare the median. Each row records:
- Scenario name.
- Platform identity, including editor/player target, scripting backend, architecture, build configuration, Unity platform, and Unity version. Common cells include Editor Mono x64, Standalone Mono x64, and Standalone IL2CPP x64.
- Commit SHA.
- Run index.
- Emits per second for dispatch scenarios.
GC.GetAllocatedBytesForCurrentThread()delta.- Wall-clock milliseconds for registration scenarios.
Run the runtime benchmark category from a Unity 2021.3 LTS or newer editor checkout:
Use the project-specific Unity executable path if unity is not on PATH. Keep the editor version, scripting backend, CPU governor, and machine load stable across before/after runs. Close the Unity editor UI before batchmode runs so the benchmark process owns the editor session.
The dispatch throughput scenarios cover these paths:
| Scenario | What it measures |
|---|---|
UntargetedFlood_OneHandler | One untargeted handler on one message type. |
UntargetedFlood_FourHandlers_OnePriority | Four untargeted handlers sharing priority 0. |
UntargetedFlood_FourHandlers_FourPriorities | Four untargeted handlers across priorities 0-3. |
TargetedFlood_OneListener | One targeted listener on one target. |
TargetedFlood_SixteenListeners | Sixteen targeted listeners on one target. |
BroadcastFlood_OneHandler | One broadcast handler. |
InterceptorHeavy_FourInterceptors | Four interceptors plus one handler. |
PostProcessingHeavy_FourPostProcessors | Four post-processors plus one handler. |
RegistrationFlood_1000Types_FromColdBus | Registering 1000 distinct message types from a cold bus. |
Baseline Capture¶
Capture baselines into a local CSV file and keep the file path explicit in the commands that consume it. Do not put generated baseline CSVs in package documentation or rely on a dated filename. For CI or release comparison, publish the CSV as a workflow artifact or attach it to the pull request that records the measurement.
Recommended commit cells:
| Commit reference | Purpose |
|---|---|
| Chosen comparison commit | Accepted baseline for regression gates. |
| Previous optimization landing | Runtime after the last relevant change. |
HEAD | Current branch result. |
Required configuration cells:
| Configuration | Requirement |
|---|---|
| Editor Mono | Required. |
| Standalone Mono x64 | Required when a Mono build host is available. |
| Standalone IL2CPP x64 | Stretch, required when CI has an IL2CPP build host. |
For each commit and configuration:
- Keep the benchmark harness available; older runtime commits may not contain the benchmark files.
- Measure the older runtime with a harness-preserving flow. Use a throwaway branch that cherry-picks the current harness onto the measured runtime commit, or keep the harness branch checked out and swap only the runtime files being measured.
- Set
DX_PERF_COMMIT=<measured-runtime-commit>for every benchmark run so CSV rows identify the runtime commit under test.DX_PERF_COMMIToverrides CI'sGITHUB_SHAwhen both are present. - Run the PlayMode
PerfBenchcategory in batchmode. - Extract the benchmark rows from the Unity output and append them to the local baseline CSV.
- Record the exact commit, platform, Unity version, and scripting backend.
The benchmark harness writes both structured log lines and CSV-shaped rows. Prefer the extractor so the baseline file has one header and normalized rows:
DX_PERF_COMMIT=<measured-runtime-commit> \
bash scripts/unity/run-tests.sh \
--platform playmode \
--include-perf \
--filter 'DxMessaging.Tests.Runtime.Benchmarks.DispatchThroughputBenchmarks.*' \
--results <results.xml> \
2>&1 | tee <unity-log.txt>
node scripts/unity/extract-perf-baseline.js \
--input <unity-log.txt> \
--input <results.xml> \
--output <baseline.csv> \
--append
For the first cell, omit --append so the script creates the file with the CSV header. For later cells, use --append. Repeat the run for each comparison commit and for HEAD, changing DX_PERF_COMMIT and artifact filenames so each cell is traceable.
Do not mix methodology changes with baseline updates. If the harness changes, capture a new baseline and make the old/new methodology boundary explicit in the PR description.
Budget Interpretation¶
Dispatch budgets are interpreted in per-emit terms. Convert throughput to nanoseconds per emit with:
Compare both throughput and per-emit nanoseconds. Throughput is easier to scan, but per-emit nanoseconds makes fixed overhead visible. A 10 ns increase is material on handlers whose work is only 10-20 ns.
Allocation budgets are interpreted as bytes allocated during the measured window. Dispatch scenarios should stay at zero measured bytes after warmup. Any non-zero allocation delta on a hot-path dispatch scenario requires an explanation, a fix, or an explicit reviewer-approved exception.
The opt-in smoke gate reads DX_PERF_BASELINE, requires a row for the configured comparison commit on the current scenario and platform identity, and fails when a within-platform regression exceeds the configured threshold. Enable the gate with:
DX_PERF_GATE=1 \
DX_PERF_BASELINE=<baseline.csv> \
DX_PERF_BASELINE_COMMIT=<baseline-commit> \
unity -batchmode -runTests -testPlatform editmode -testCategory "PerfGate"
The smoke gate is an EditMode test category. If the gate is enabled without DX_PERF_BASELINE or DX_PERF_BASELINE_COMMIT, or if the configured CSV is missing, it reports an inconclusive skip instead of failing the suite for a missing local artifact.
Hot-Path PR Rule¶
Any pull request that touches one of these paths must include before/after throughput numbers in the PR description under ### Performance numbers:
Runtime/Core/MessageBus/MessageBus.csRuntime/Core/MessageHandler.csRuntime/Core/Pooling/**
Use this shape:
### Performance numbers
| Scenario | Baseline run | This PR | Delta |
| ---------------------------------------- | ---------------- | ---------------- | ----- |
| UntargetedFlood_OneHandler (Mono Editor) | X.XX M emits/sec | Y.YY M emits/sec | +Z.Z% |
The workflow accepts either the table shape above with at least one populated data row or one of these one-line N/A forms:
N/A - refactor only <one-line justification>
N/A - non-hot-path edit only <one-line description>
The justification or description must be on the same line as the N/A marker. Bare N/A, empty sections, and template-only comments do not satisfy the workflow gate.
Historical PlayMode Benchmarks¶
The sections below are auto-updated by the Unity PlayMode benchmark tests in the Performance PlayMode benchmark suite.
How it works:
- Run PlayMode tests locally in your Unity project that references this package.
- The benchmark test writes an OS-specific section below with a markdown table.
- CI runs skip writing to avoid noisy diffs.
Benchmark Methodology and Caveats¶
These older benchmarks measure raw message dispatch throughput using a simple counter-increment handler. Each test runs for 5 seconds, dispatching messages in batches of 10,000 operations per iteration with a pre-warm phase to avoid cold-start effects.
Important considerations¶
- Results will vary based on your hardware, Unity version, and runtime environment.
- The benchmarks test isolated message dispatch with minimal handler logic. Real-world performance depends heavily on what your handlers actually do.
- The "Unity" baseline uses
GameObject.SendMessage(), which performs string-based method lookup and allocates memory. Direct method calls would be faster than any messaging system. - "Allocations?" indicates whether the test detected GC allocations during message dispatch under test conditions.
Performance tradeoffs to be aware of¶
- Interceptors and post-processors add overhead. With 8 interceptors registered, throughput drops to roughly 45% of the no-interceptor baseline. With 8 post-processors, throughput drops to roughly 38%. This is an expected tradeoff for the additional flexibility these features provide.
- Reflexive messaging (dynamic method invocation) is slower than direct handler registration due to the reflection overhead.
You can run these benchmarks yourself to get results specific to your environment. The source code is available in the test suite linked above.
Windows¶
| Message Tech | Operations / Second | Allocations? |
|---|---|---|
| Unity | 2,568,246 | Yes |
| DxMessaging (GameObject) - Normal | 9,854,653 | No |
| DxMessaging (Component) - Normal | 9,902,081 | No |
| DxMessaging (GameObject) - No-Copy | 11,074,789 | No |
| DxMessaging (Component) - No-Copy | 8,424,722 | No |
| DxMessaging (Untargeted) - No-Copy | 16,976,111 | No |
| DxMessaging (Untargeted) - Interceptors | 7,232,103 | No |
| DxMessaging (Untargeted) - Post-Processors | 6,779,454 | No |
| Reflexive (One Argument) | 2,706,648 | No |
| Reflexive (Two Arguments) | 2,188,005 | No |
| Reflexive (Three Arguments) | 2,187,457 | No |
macOS¶
Run the PlayMode benchmarks on macOS to populate this section.
Linux¶
Run the PlayMode benchmarks on Linux to populate this section.
Memory footprint and reclamation¶
Dispatch state is stored per message type and, for targeted and broadcast paths, per InstanceId. Long-running sessions accumulate slots for every type or entity ever touched unless something reclaims them. The memory reclamation system caps that growth without changing dispatch semantics or allocating during emit.
Reclamation runs on two paths:
- An idle sweep that runs from emit-time clock samples and the Unity PlayerLoop, gated by
DxMessagingRuntimeSettings.EvictionEnabledandEvictionTickIntervalSeconds. Empty slots become eligible only after remaining empty for at leastIdleEvictionSecondsof wall time. - An explicit
IMessageBus.Trim(force)andMessageHandler.TrimAll(force)pair that runs synchronously at scene boundaries, in tests, or in maintenance windows. The master switchEnableTrimApicontrols whether the explicit calls perform work; idle sweeps remain controlled byEvictionEnabledindependently.
Active registrations are never reclaimed. Only empty slots and shared pool entries are touched. Sweep work runs outside the hot handler loop, so emit throughput is unaffected; the per-emit overhead is one branch that samples the wall clock.
For tuning recommendations, the public Trim and diagnostic-counter API surface, and worked examples (scene transitions, leak diagnosis, mobile caps, shipped-title configurations), see the Memory Reclamation guide. For the parameter reference, see the Runtime Settings reference.