Hardware Video Encoders Compared: NVENC vs AMF vs QuickSync vs Apple VideoToolbox
Four vendors. Four blocks of fixed-function silicon. One job — turn a frame of pixels into a few kilobytes of H.265 or AV1 in under five milliseconds, sixty times a second, without dropping any. Here is how NVIDIA, AMD, Intel, and Apple actually compare for low-latency streaming, and how Remio picks the right one at runtime.
Why hardware encoders matter for remote desktop
Encode a single 1080p frame with libx264 on a fast CPU core. On a modern Apple M-series or a Ryzen 7000, you can hit roughly 8 to 12 milliseconds for the ultrafast preset, longer for anything that tries to preserve quality. That sounds tolerable until you remember the budget: 16.7 milliseconds per frame at 60 FPS. Now add capture, packetisation, pacer queueing, network egress, decode, and render. CPU encoding does not just slow you down: it eats most of the latency budget on its own, and on anything beyond 1080p60 it drops frames.
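The budget arithmetic can be made explicit. This is a minimal sketch using the 8 to 12 ms software-encode range quoted above; the function names are illustrative and not part of any Remio code.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def remaining_after_encode_ms(fps: float, encode_ms: float) -> float:
    """Budget left for capture, packetisation, network, decode and render."""
    return frame_budget_ms(fps) - encode_ms


if __name__ == "__main__":
    budget = frame_budget_ms(60)
    for encode_ms in (8.0, 12.0):  # software-encode range quoted in the text
        left = remaining_after_encode_ms(60, encode_ms)
        print(f"encode {encode_ms:.0f} ms -> {left:.1f} ms left of {budget:.1f} ms")
```

At 60 FPS a 12 ms software encode leaves under 5 ms for every other stage in the pipeline.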
Hardware encoders solve this by dedicating fixed-function silicon to the same job. NVIDIA calls its block NVENC. AMD calls theirs AMF (more precisely the VCN engine that AMF drives). Intel calls theirs QuickSync (the Media Engine inside their iGPU). Apple calls theirs VideoToolbox (the API; the silicon is the Media Engine on M-series and A-series chips). All four exist for one reason: software encoding cannot meet the latency and frame-rate targets that modern screens, modern GPUs, and modern users now expect.
For Remio specifically, every millisecond saved at the encoder is a millisecond the user gets back at the pacer, the network, and the renderer. Our published latency budget targets 8 to 20 ms LAN glass-to-glass; encode time is the single largest variable. The choice of encoder, and the configuration of that encoder, drives everything downstream.
The four vendors at a glance
Before we go deep on each, it helps to understand what we are actually comparing. These are not interchangeable libraries — they are physical blocks on a die, each with its own ISA, its own driver stack, its own quirks, and its own multi-year roadmap. They differ in what codecs they support, what quality they produce, what latency they hit, how much power they draw, and on which platforms they exist at all.
- NVENC ships on every NVIDIA GeForce, Quadro, and RTX since 2012 (Kepler). Generations are tied to architecture families: 1st gen on Kepler, 8th gen on Ada Lovelace. Hopper datacenter parts have a different cadence.
- AMF exposes AMD's VCN (Video Core Next) engine, which replaced the older VCE. Available on every Vega, RDNA, RDNA2, and RDNA3 GPU, plus newer APUs.
- QuickSync has shipped in Intel iGPUs since Sandy Bridge in 2011 — making it the most broadly deployed hardware encoder on Earth. Arc discrete GPUs use the same architecture, scaled up.
- VideoToolbox is the Apple API; the silicon underneath is the Media Engine, baked into every Apple Silicon Mac, every iPhone since the A6, and every iPad. M-series chips ship one or two engines depending on the SKU.
NVENC — NVIDIA's eight generations of refinement
NVENC is the encoder most streaming software is benchmarked against, partly because NVIDIA has the marketing budget and partly because the silicon really has been excellent for a decade. The block has gone through eight generations:
- 1st-3rd gen (Kepler through Maxwell, 2012-2014): H.264 throughout, with H.265 encode arriving on second-generation Maxwell. Viable for streaming, but quality lagged libx264 by a clear margin.
- 4th gen (Pascal, GTX 10-series): added 10-bit H.265 encode. This is the generation where NVENC quality caught up to and arguably surpassed libx264 fast presets at the same bitrate.
- 5th gen (Volta): datacenter and Titan-class parts only, so it never reached mainstream consumer cards.
- 6th gen (Turing, RTX 20-series, GTX 16-series): the often-cited "Turing NVENC" leap. Quality at low bitrates improved enough that streamers stopped pretending CPU encoding was meaningfully better for live use cases.
- 7th gen (Ampere, RTX 30-series): refinements, broadly equivalent quality to Turing per watt.
- 8th gen (Ada Lovelace, RTX 40-series): first NVIDIA generation with hardware AV1 encode, plus dual encoder engines on higher-tier SKUs. Ada NVENC can do AV1 at quality competitive with libaom-av1 medium presets at a tiny fraction of the CPU cost.
For low-latency use, NVENC exposes three relevant rate-control modes. Of these, CBR with low-latency tuning and the p1 preset (fastest) is the canonical configuration: deterministic output rate, single-frame lookahead, no B-frames. Remio uses exactly this when running over hevc_nvenc on Windows. Encode time on Ada Lovelace at 1080p60 H.265 is consistently 1 to 3 ms.
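One way to write that configuration down is as FFmpeg options. This is a sketch under assumptions: hevc_nvenc is FFmpeg's name for the encoder, but the article does not say Remio drives NVENC through the FFmpeg command line. The function name and the 8 Mbps default are illustrative.

```python
def nvenc_low_latency_args(bitrate_bps: int = 8_000_000) -> list[str]:
    """FFmpeg output options for hevc_nvenc in a CBR low-latency setup (sketch)."""
    return [
        "-c:v", "hevc_nvenc",
        "-preset", "p1",           # fastest preset
        "-tune", "ull",            # ultra-low-latency tuning
        "-rc", "cbr",              # constant bitrate
        "-b:v", str(bitrate_bps),  # target bitrate in bits per second
        "-bf", "0",                # no B-frames
    ]


if __name__ == "__main__":
    print(" ".join(nvenc_low_latency_args()))
```

The `-preset p1` and `-tune ull` options need a reasonably recent FFmpeg build and NVIDIA driver; older builds use differently named presets.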
One historical gotcha: consumer NVENC was for years limited to three concurrent encode sessions via NVIDIA's driver. That limit was lifted in 2023 to eight, then effectively removed. If you are reading a stale forum thread that warns about it, the warning no longer applies.
AMF — AMD's underdog that finally caught up
AMD's AMF (Advanced Media Framework) is the API; the underlying engine has gone through two distinct architectures. The older VCE (Video Coding Engine) shipped on GCN-era cards from 2012 and was the perennial weak spot in head-to-head encoder comparisons — visibly worse quality than NVENC at matched bitrate, especially on motion-heavy content. AMD knew it, the streaming community knew it, and benchmarks confirmed it.
That story changed with VCN (Video Core Next), introduced on Vega and refined through every subsequent generation:
- VCN 1.0-2.0 (Vega, RDNA1) — H.264 and H.265 with quality that finally got within a hair of NVENC Pascal/Turing.
- VCN 3.0 (RDNA2, RX 6000 series) — significant quality improvements, particularly at low bitrates. The first AMD generation that I am comfortable recommending for game streaming.
- VCN 4.0 (RDNA3, RX 7000 series) — adds AV1 encode in parallel with NVIDIA Ada. Two encoders on higher SKUs for parallel sessions. AV1 quality is competitive with NVIDIA's AV1 in independent benchmarks.
For low-latency streaming, AMF exposes an ultralowlatency usage preset, a speed quality preset, and a cbr rate-control mode. Remio uses hevc_amf with all three, the same preset triplet documented in our streaming architecture notes. Encode time on RDNA3 at 1080p60 H.265 typically sits at 2 to 4 ms. RDNA2 is around 3 to 5 ms. Older VCN 1.x-era cards run closer to 5 to 7 ms, which is still inside the budget but leaves less headroom for keyframe spikes.
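The preset triplet maps directly onto FFmpeg's hevc_amf options. The sketch below assembles a full command line for illustration; the file names, the 8M bitrate, and the helper name are hypothetical, and the article does not confirm that Remio invokes the FFmpeg binary.

```python
import shlex

# The three settings named in the text, keyed by FFmpeg option name.
AMF_LOW_LATENCY = {
    "usage": "ultralowlatency",
    "quality": "speed",
    "rc": "cbr",
}


def amf_command(src: str, dst: str, bitrate: str = "8M") -> str:
    """Build an ffmpeg command string that encodes src to dst with hevc_amf."""
    opts = [tok for key, value in AMF_LOW_LATENCY.items() for tok in (f"-{key}", value)]
    cmd = ["ffmpeg", "-i", src, "-c:v", "hevc_amf", *opts, "-b:v", bitrate, dst]
    return shlex.join(cmd)


if __name__ == "__main__":
    print(amf_command("capture.mkv", "out.mkv"))
```

Keeping the triplet in one dictionary makes it easy to check that all three settings are applied together.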
For five years AMD's encoder was a punchline. RDNA2 fixed it. RDNA3 made it competitive on AV1. The streaming-encoder duopoly is now a triopoly, and that is good for everyone.
QuickSync — the silent ubiquity champion
QuickSync is the encoder you almost certainly already own. It ships in every Intel CPU with integrated graphics since Sandy Bridge in 2011 — that is over a decade of laptops, NUCs, and office desktops with hardware video encoding sitting unused because the user assumed they needed a discrete GPU. They did not.
The architecture has progressed in step with Intel's iGPU lineage:
- Sandy Bridge-Skylake (2011-2015) — H.264 only, middling quality, but class-leading latency and power for a laptop.
- Kaby Lake-Coffee Lake (2016-2018) — added H.265, 10-bit, 4K encode.
- Ice Lake onward (2019+) — Gen11/Gen12 Xe iGPU dramatically improved quality. Many benchmarks rank Gen12 QuickSync within a few percent of NVENC Turing.
- Arc discrete (2022+) — Intel's first discrete GPU lineup. Same Xe-HPG architecture scaled up, with AV1 encode. Arc was the first consumer GPU to ship hardware AV1 — beating NVIDIA Ada and AMD RDNA3 to market — and the quality is surprisingly excellent. An Intel Arc A380 remains an absurd bargain for pure AV1 throughput per dollar.
The thermal profile is QuickSync's quiet superpower. Because it lives inside an iGPU already on the SoC, the marginal cost of using it is the encode block alone — no waking a discrete GPU, no firing up GDDR6 memory controllers. On a ThinkPad or NUC, QuickSync delivers comparable quality to NVENC at roughly a quarter of the system-level power. For Remio's Windows host, QuickSync is the third-choice fallback after NVENC and AMF — not because it is worse, but because users with discrete GPUs usually want their iGPU free for other workloads.
VideoToolbox — Apple Silicon's unified Media Engine
Apple does not market a brand name for their hardware encoder the way NVIDIA does. The API is VideoToolbox; the silicon is the Media Engine, an Apple-designed block that lives inside every M-series Mac and every iPhone/iPad SoC since the A6.
The Media Engine ships in different counts depending on the SKU:
- M1, M2, M3 and their Pro variants: one video encode engine.
- M1 Max, M2 Max, M3 Max: two video encode engines.
- M1 Ultra, M2 Ultra: four video encode engines (two per fused die).
- M4 family: refreshed Media Engine with improved H.265 throughput.
Each Media Engine handles H.264, H.265, and ProRes encode/decode in hardware. ProRes encode is the headline feature for video editors — no other consumer chip ships it — but for streaming the relevant capabilities are H.265 main10 4:4:4 support (Remio's preferred mode for color-critical work) and astonishingly low encode latency.
The one notable gap: no AV1 encode on any shipping Apple Silicon. M-series chips have decoded AV1 in hardware since M3, but none can encode it. When streaming from a Mac, H.265 is the practical default for the foreseeable future.
Where Apple genuinely wins is power efficiency. On an M2 Pro encoding 4K60 H.265 at 30 Mbps, the entire Media Engine draws roughly 3 to 5 watts. The equivalent NVENC workload on an RTX 4070 draws closer to 35 watts of system-level power — most of which is the discrete GPU's idle floor, not the encoder block. For a Mac mini host that sits powered-on 24/7 waiting for a client, this is the difference between "barely noticeable on the electric bill" and "actively noticeable."
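To put the wattage figures in electric-bill terms, here is a small worked calculation. It assumes the quoted draw is sustained around the clock, which overstates usage for a host that mostly idles, so treat the results as upper bounds. The function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float) -> float:
    """Energy used in one year at a constant draw, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    for label, watts in (("Media Engine, upper figure", 5.0), ("RTX 4070 system", 35.0)):
        print(f"{label}: {annual_kwh(watts):.1f} kWh per year")
```

At a constant 5 W the yearly total is about 44 kWh, against roughly 307 kWh at 35 W.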
Side-by-side comparison
Here are the four engines on the same axes. Latency numbers are 1080p60 H.265 ultra-low-latency presets, B-frames disabled, single-frame VBV — the Remio configuration. Quality is a perceptual rough-rank at 8 Mbps CBR; absolute differences are small.
| Axis | NVENC (Ada) | AMF (RDNA3) | QuickSync (Arc / Gen12+) | VideoToolbox (M3+) |
|---|---|---|---|---|
| H.264 | Yes | Yes | Yes | Yes |
| H.265 (HEVC) | Yes | Yes | Yes | Yes (main10 4:4:4) |
| AV1 encode | Yes | Yes | Yes | No |
| ProRes encode | No | No | No | Yes |
| Max resolution | 8K H.265 | 8K H.265 | 8K (Arc), 4K (iGPU) | 8K H.265 |
| Max FPS at 1080p H.265 | ~480 | ~360 | ~360 | ~600 (M3 Max) |
| Encode time, 1080p60 H.265 | 1-3 ms | 2-4 ms | 2-4 ms | 2-3 ms |
| Power draw, 4K60 H.265 | ~35 W system | ~30 W system | ~8 W system (iGPU) | ~5 W |
| Quality rank @ 8 Mbps CBR | Tied 1st | Tied 2nd | Tied 2nd | Tied 1st |
| Platforms | Windows, Linux | Windows, Linux | Windows, Linux, macOS (Intel Macs) | macOS, iOS, iPadOS, tvOS |
| Concurrent sessions | Unlimited | Unlimited | Unlimited | Unlimited |
The honest summary: at the settings remote-desktop streaming actually uses (CBR, low-latency, no B-frames), all four are good enough by a comfortable margin. The differentiators are AV1 support (NVIDIA / AMD / Intel — not Apple), power efficiency (Apple, then Intel iGPU, then everyone else), and platform availability (you do not get to choose VideoToolbox on a Windows host or NVENC on a Mac).
What Remio picks at runtime
Remio probes the host's capabilities at startup and picks an encoder by deterministic priority. The actual selection lives in GpuCapabilities on the C++/WinRT host, mirrored in Swift on the Mac host. Both follow the same logic.
On Windows hosts:
- NVENC (hevc_nvenc/h264_nvenc): picked first when an NVIDIA GPU is present and the driver responds. Preferred because it is the most consistently low-latency encoder across the broadest range of hardware generations.
- AMF (hevc_amf/h264_amf): picked second when an AMD discrete GPU or APU with VCN is present. Configured with usage=ultralowlatency, quality=speed, rc=cbr.
- QuickSync (hevc_qsv/h264_qsv): picked third when an Intel iGPU is present. Functionally equivalent in our pipeline; we prefer leaving the iGPU free if a discrete GPU is available.
- libx264 CPU fallback: picked only when no hardware encoder responds. Capped at 720p60 or 1080p30 to stay inside the latency budget.
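The Windows priority order amounts to a first-match search over a fixed list. The sketch below is not the GpuCapabilities code, which the article does not show; the `responds` probe is a hypothetical stand-in for whatever check the host performs.

```python
from typing import Callable

# Priority order described above: NVENC, then AMF, then QuickSync.
WINDOWS_PRIORITY = ["hevc_nvenc", "hevc_amf", "hevc_qsv"]
CPU_FALLBACK = "libx264"


def pick_encoder(responds: Callable[[str], bool]) -> str:
    """Return the first hardware encoder whose probe succeeds, else the CPU fallback."""
    for encoder in WINDOWS_PRIORITY:
        if responds(encoder):
            return encoder
    return CPU_FALLBACK


if __name__ == "__main__":
    # A machine with an AMD GPU and an Intel iGPU, but no NVIDIA card.
    available = {"hevc_amf", "hevc_qsv"}
    print(pick_encoder(lambda name: name in available))
```

Because the list is fixed and searched in order, the same hardware always yields the same choice, which is what makes the selection deterministic.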
On Apple hosts: VideoToolbox is the only option, and the only one we want. The Media Engine inside every supported Mac (macOS 15+) handles H.265 main10 4:4:4 out of the box. No fallback path needed.
Codec preference is H.265 first, H.264 fallback, and AV1 only when both ends of the connection have hardware AV1 (Ada / RDNA3 / Arc encoder on the host; Apple Silicon M3+ / Snapdragon 8 Gen 2+ / RTX 40-series decoder on the client). For LAN sessions we keep H.265 even on AV1-capable pairs — encode latency is the bottleneck, not bitrate. AV1 only earns its keep on TURN-relayed or constrained WAN sessions where bandwidth is genuinely the limit.
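The codec rule can be read as a short decision function. This is one interpretation of the paragraph above, not Remio's actual negotiation code; the `Session` fields are hypothetical names for the capabilities the text mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    host_hw_av1_encode: bool    # e.g. Ada, RDNA3 or Arc on the host
    client_hw_av1_decode: bool  # e.g. M3+ or RTX 40-series on the client
    both_support_h265: bool
    constrained_wan: bool       # TURN-relayed or bandwidth-limited link


def pick_codec(s: Session) -> str:
    """AV1 only on constrained links with hardware on both ends, else H.265, else H.264."""
    if s.constrained_wan and s.host_hw_av1_encode and s.client_hw_av1_decode:
        return "av1"
    if s.both_support_h265:
        return "h265"
    return "h264"


if __name__ == "__main__":
    lan_pair = Session(True, True, True, constrained_wan=False)
    print(pick_codec(lan_pair))  # an AV1-capable pair on LAN stays on H.265
```

The bandwidth check comes first so that AV1 hardware alone never switches a LAN session away from H.265.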
Quality vs latency trade-offs
Every hardware encoder exposes the same handful of knobs, and every knob trades quality for latency in the same direction. The Remio configuration sits at the most aggressive end of the latency dial because remote desktop is, by definition, a real-time application where a frame older than 20 ms is worse than a frame slightly blurrier than ideal.
- CBR vs VBR. Variable bitrate produces better quality at the same average rate by spending bits where they help. But VBR also produces unpredictable per-frame size, which means unpredictable pacer queueing, which means unpredictable latency. Remio uses CBR everywhere. The quality cost is small; the latency consistency is huge.
- B-frames. B-frames reference both past and future frames, so the decoder must wait for the future frame to arrive before decoding the B-frame. That is fine for video-on-demand and fatal for real-time. Remio disables B-frames on every codec on every platform.
- GOP / keyframe interval. Longer GOPs (more frames between keyframes) compress better. Shorter GOPs recover faster from packet loss. Remio runs effectively infinite GOPs and relies on Picture Loss Indication (PLI) to request a fresh keyframe when the decoder reports loss. This matches our streaming philosophy of "skip lost frames, render latest only" rather than retransmit old data.
- Lookahead. NVENC, AMF, and QuickSync all support multi-frame lookahead for adaptive quantization. Each frame of lookahead adds one frame of latency. Remio runs zero or single-frame lookahead — every encoder's lowest setting.
- VBV buffer. The Video Buffering Verifier window is how many bits the encoder is allowed to "borrow forward" before settling back to the target rate. Larger VBV gives smoother quality. Smaller VBV gives tighter rate control and lower latency. Remio uses 1 frame of VBV — the bitrate divided by the frame rate — which is the smallest legal value.
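The single-frame VBV figure is easy to compute. The sketch below works it out for the 8 Mbps, 60 FPS case used elsewhere in this article and shows how it could be passed through FFmpeg's generic rate-control options. Using FFmpeg here is an assumption, and the function names are illustrative.

```python
def single_frame_vbv_bits(bitrate_bps: int, fps: int) -> int:
    """VBV window of one frame: the bitrate divided by the frame rate."""
    return bitrate_bps // fps


def vbv_args(bitrate_bps: int, fps: int) -> list[str]:
    """Generic FFmpeg rate-control options for CBR with a one-frame buffer."""
    return [
        "-b:v", str(bitrate_bps),
        "-maxrate", str(bitrate_bps),  # peak equals target for CBR
        "-bufsize", str(single_frame_vbv_bits(bitrate_bps, fps)),
    ]


if __name__ == "__main__":
    bits = single_frame_vbv_bits(8_000_000, 60)
    print(f"{bits} bits, about {bits / 8 / 1000:.1f} kB per frame")
    print(" ".join(vbv_args(8_000_000, 60)))
```

At 8 Mbps and 60 FPS the window is 133,333 bits, so every frame, keyframes included, has to fit in roughly 16.7 kB.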
The pattern is consistent across all four vendors. Real-time low-latency is a constrained optimisation: you give up the few percent of quality that lookahead and B-frames buy in offline encoding, in exchange for the predictable single-frame latency that real-time interaction demands. Every shipping streaming product — Remio, Parsec, NVIDIA GeForce NOW, Xbox Cloud Gaming — makes the same trade in the same direction.
What you (the user) actually need to care about
If you skipped to the end: any modern GPU's hardware encoder is fine for remote desktop.
The differences at real-time configurations are smaller than the noise floor of any reasonable benchmark. A Pascal GTX 1060 user gets the same Remio experience as an RTX 4090 user: both saturate the 60 FPS pipeline at sub-frame latency, and both produce indistinguishable output at the 8 Mbps that LAN sessions use.
The exceptions, in order of how often they matter:
- AV1-capable hardware helps on slow networks. On hotel Wi-Fi or a mobile hotspot, an Ada / RDNA3 / Arc host paired with an M3+ or RTX 40-series client buys ~25 percent more headroom at the same quality. On LAN, it buys nothing.
- Apple Silicon Macs are remarkably good hosts. The Media Engine's low latency, excellent H.265, and 3-5 W draw make a Mac mini or Studio one of the best always-on streaming sources you can buy. No AV1 encode is the only gap, irrelevant on LAN.
- An Intel Arc A380 is an unreasonable bargain for AV1. ~$120 buys an encoder block that beats anything else under $400.
- The libx264 CPU fallback is real. With no hardware encoder (an old laptop, a server without an iGPU), Remio still works. Just not at 4K, not at 120 FPS, and not without burning a CPU core.
Beyond those edge cases, the right answer is the simplest one: use the GPU you would buy anyway. Every modern vendor has converged on the same capabilities, the same low-latency presets, the same quality envelope. Remio picks the right one at runtime so you do not have to think about it. Get on with whatever you opened a remote desktop session for — code, edit a video, fix a server, push a fix from a coffee shop.
That is the whole point of fixed-function silicon. It just works, and you stop noticing it.