Which hardware encoder has the lowest latency?

All four (NVENC, AMF, QuickSync, and Apple VideoToolbox) can deliver under 5 milliseconds of encode time per frame at 1080p60 once configured for low-latency mode with B-frames disabled and a 1-frame VBV buffer. Apple VideoToolbox on M-series silicon and NVENC on Ada Lovelace are the tightest in practice, routinely hitting 2 to 3 milliseconds and 1 to 3 milliseconds respectively for 1080p60 H.265 in our measurements.

Is AV1 hardware encoding worth using yet?

AV1 hardware encoding gives roughly a 25 to 30 percent bitrate reduction at the same perceived quality versus H.265, and a 50 percent reduction versus H.264. On constrained networks (slow Wi-Fi, mobile data) it is meaningful. On LAN it makes no practical difference because the bottleneck is encode latency, not bandwidth. Remio ships hardware H.265 end to end today, with H.264 as the universal fallback — AV1 isn't in the pipeline yet, even on hardware that supports it.

Does Apple Silicon really compete with discrete GPU encoders?

Yes, and often wins on energy efficiency. Apple's unified Media Engine on M2 and later encodes 4K60 H.265 at roughly 3 to 5 watts, about a seventh or less of the roughly 35 watts of system-level power a discrete GPU draws doing the same work through NVENC. Quality is competitive with NVENC at matched bitrate, and latency is comparable. Apple does not yet ship AV1 encode, which is the main gap.

What encoder does Remio use on Windows?

Remio auto-detects in the order NVENC, then AMF, then QuickSync, then CPU fallback (libx264). It prefers H.265: hevc_nvenc with the p1 preset and low-latency tuning, or hevc_amf with the ultralowlatency usage preset. Either way B-frames are disabled, the VBV buffer is 1 frame, and rate control is CBR at 8 Mbps, adapting up to 30 Mbps on LAN. The same configuration philosophy applies to all three hardware paths; only the FFmpeg codec name and a few vendor-specific preset strings change.

Why does Remio prefer H.265 over H.264 by default?

H.265 reaches the same perceived quality at roughly half the bitrate of H.264, which matters for keyframe spikes and TURN-relayed sessions. Hardware H.265 encode and decode are now standard across every modern Apple, Intel, AMD, and NVIDIA chip Remio targets. H.264 remains the universal fallback for very old hardware and for the few WebRTC paths where H.265 negotiation fails.

Does it matter which hardware encoder I have for remote desktop?

For most users, no. Any hardware encoder shipped in the last six years — Pascal-era NVENC, VCN 2.0+ AMF, Skylake or newer QuickSync, any Apple Silicon Mac — will saturate Remio's 60 FPS pipeline at about 8 ms of processing on LAN. The differences are at the margins: AV1 on slow networks, energy draw on laptops, and quality at very low bitrates. Pick the GPU you would buy anyway and the encoder will be fine.

NVENC vs AMF vs QuickSync vs VideoToolbox — Remio Remote Desktop Blog

Four vendors. Four blocks of fixed-function silicon. One job — turn a frame of pixels into a few kilobytes of H.265 or AV1 in under five milliseconds, sixty times a second, without dropping any. Here is how NVIDIA, AMD, Intel, and Apple actually compare for low-latency streaming, and how Remio picks the right one at runtime.

Why hardware encoders matter for remote desktop

Encode a single 1080p frame with libx264 on a fast CPU core. On a modern Apple M-series or a Ryzen 7000, you can hit roughly 8 to 12 milliseconds for the ultrafast preset, longer for anything that tries to preserve quality. That sounds tolerable until you remember the budget: 16.7 milliseconds per frame at 60 FPS. Now add capture, packetisation, pacer queueing, network egress, decode, and render. CPU encoding does not just slow you down; it eats the entire latency budget, and on anything beyond 1080p60 it drops frames.
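The budget arithmetic above can be checked in a few lines. This is a minimal sketch, not Remio code: the function names are made up for illustration, and the encode times are the rough figures quoted in this article (12 ms as the slow end of libx264 ultrafast, 3 ms as a typical hardware encode).

```python
def frame_interval_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def remaining_budget_ms(fps: float, encode_ms: float) -> float:
    """Milliseconds left in one frame interval once encoding is done."""
    return frame_interval_ms(fps) - encode_ms


budget = frame_interval_ms(60)
# Encode times are the article's rough figures, not new measurements.
for label, encode_ms in [("libx264 ultrafast, slow end", 12.0),
                         ("hardware encoder", 3.0)]:
    left = remaining_budget_ms(60, encode_ms)
    print(f"{label}: {encode_ms:.1f} ms encode, "
          f"{left:.1f} ms left of {budget:.1f} ms")
```

Everything else in the pipeline (capture, packetisation, pacing, network, decode, render) has to fit into whatever is left over.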

Hardware encoders solve this by dedicating fixed-function silicon to the same job. NVIDIA calls its block NVENC. AMD calls theirs AMF (more precisely the VCN engine that AMF drives). Intel calls theirs QuickSync (the Media Engine inside their iGPU). Apple calls theirs VideoToolbox (the API; the silicon is the Media Engine on M-series and A-series chips). All four exist for one reason: software encoding cannot meet the latency and frame-rate targets that modern screens, modern GPUs, and modern users now expect.

For Remio specifically, every millisecond saved at the encoder is a millisecond the user gets back at the pacer, the network, and the renderer. Our published latency budget targets about 8 ms of processing on LAN (the rest is your display's own 60 Hz refresh cadence) — encode time is the single largest variable. The choice of encoder, and the configuration of that encoder, drives everything downstream.

The four vendors at a glance

Before we go deep on each, it helps to understand what we are actually comparing. These are not interchangeable libraries — they are physical blocks on a die, each with its own ISA, its own driver stack, its own quirks, and its own multi-year roadmap. They differ in what codecs they support, what quality they produce, what latency they hit, how much power they draw, and on which platforms they exist at all.

NVENC ships on every NVIDIA GeForce, Quadro, and RTX since 2012 (Kepler). Generations are tied to architecture families: 1st gen on Kepler, 8th gen on Ada Lovelace. Hopper datacenter parts have a different cadence.
AMF exposes AMD's VCN (Video Core Next) engine, which replaced the older VCE. Available on every Vega, RDNA, RDNA2, and RDNA3 GPU, plus newer APUs.
QuickSync has shipped in Intel iGPUs since Sandy Bridge in 2011 — making it the most broadly deployed hardware encoder on Earth. Arc discrete GPUs use the same architecture, scaled up.
VideoToolbox is the Apple API; the silicon underneath is the Media Engine, baked into every Apple Silicon Mac, every iPhone since the A6, and every iPad. M-series chips ship one or two engines depending on the SKU.

NVENC — NVIDIA's eight generations of refinement

NVENC is the encoder most streaming software is benchmarked against, partly because NVIDIA has the marketing budget and partly because the silicon really has been excellent for a decade. The block has gone through eight generations:

1st-3rd gen (Kepler through Maxwell, 2012-2014) — primarily H.264 (basic H.265 encode only arrived on second-generation Maxwell), viable for streaming but quality lagged libx264 by a clear margin.
4th gen (Pascal, GTX 10-series) — matured H.265 encode and added 10-bit support. This is the generation where NVENC quality caught up to and arguably surpassed libx264 fast presets at the same bitrate.
6th gen (Turing, RTX 20-series, GTX 16-series) — the often-cited "Turing NVENC" leap. Quality at low bitrates improved enough that streamers stopped pretending CPU encoding was meaningfully better for live use cases.
7th gen (Ampere, RTX 30-series) — refinements, broadly equivalent quality to Turing per watt.
8th gen (Ada Lovelace, RTX 40-series) — first NVIDIA generation with hardware AV1 encode, plus dual encoder engines on higher-tier SKUs. Ada NVENC can do AV1 at quality competitive with libaom-av1 medium presets at a tiny fraction of the CPU cost.

For low-latency use, NVENC exposes three relevant rate-control modes: constant QP, VBR, and CBR. CBR with low-latency tuning and the p1 preset (fastest) is the canonical configuration: deterministic output rate, single-frame lookahead, no B-frames. Remio uses exactly this when running over hevc_nvenc on Windows. Encode time on Ada Lovelace at 1080p60 H.265 is consistently 1 to 3 ms.
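For readers who want to try a comparable configuration through the FFmpeg command line, here is a hedged sketch of the argument list. The option spellings (`-preset p1`, `-tune ll`, `-rc cbr`, `-bf`, `-bufsize`) are FFmpeg's hevc_nvenc options. The function name is hypothetical, and the assumption that "low-latency tuning" maps to `-tune ll` (rather than `ull`) is ours; Remio's real invocation is not shown in this article.

```python
def nvenc_low_latency_args(bitrate_bps: int = 8_000_000,
                           fps: int = 60) -> list[str]:
    """Build FFmpeg arguments for low-latency CBR H.265 on NVENC."""
    vbv_bits = bitrate_bps // fps  # one frame's worth of bits
    return [
        "-c:v", "hevc_nvenc",
        "-preset", "p1",            # fastest preset
        "-tune", "ll",              # low-latency tuning (assumed mapping)
        "-rc", "cbr",               # constant bitrate
        "-b:v", str(bitrate_bps),
        "-maxrate", str(bitrate_bps),
        "-bufsize", str(vbv_bits),  # 1-frame VBV buffer
        "-bf", "0",                 # no B-frames
    ]


print(" ".join(nvenc_low_latency_args()))
```

The list only describes the encoder side; input, capture, and output arguments depend on the host and are left out.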

One historical gotcha: consumer NVENC was for years limited to three concurrent encode sessions via NVIDIA's driver. That limit was lifted in 2023 to eight, then effectively removed. If you are reading a stale forum thread that warns about it, the warning no longer applies.

AMF — AMD's underdog that finally caught up

AMD's AMF (Advanced Media Framework) is the API; the underlying engine has gone through two distinct architectures. The older VCE (Video Coding Engine) shipped on GCN-era cards from 2012 and was the perennial weak spot in head-to-head encoder comparisons — visibly worse quality than NVENC at matched bitrate, especially on motion-heavy content. AMD knew it, the streaming community knew it, and benchmarks confirmed it.

That story changed with VCN (Video Core Next), introduced on Vega and refined through every subsequent generation:

VCN 1.0-2.0 (Vega, RDNA1) — H.264 and H.265 with quality that finally got within a hair of NVENC Pascal/Turing.
VCN 3.0 (RDNA2, RX 6000 series) — significant quality improvements, particularly at low bitrates. The first AMD generation that we are comfortable recommending for game streaming.
VCN 4.0 (RDNA3, RX 7000 series) — adds AV1 encode in parallel with NVIDIA Ada. Two encoders on higher SKUs for parallel sessions. AV1 quality is competitive with NVIDIA's AV1 in independent benchmarks.

For low-latency streaming, AMF exposes an ultralowlatency usage preset, a speed quality preset, and a cbr rate-control mode. Remio uses hevc_amf with all three, the same preset triplet documented in our streaming architecture notes. Encode time on RDNA3 at 1080p60 H.265 typically sits at 2 to 4 ms. RDNA2 is around 3 to 5 ms. Older VCN 1.x-era cards run closer to 5 to 7 ms, which is still inside the budget but leaves less headroom for keyframe spikes.
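The preset triplet translates directly into FFmpeg's hevc_amf options (`-usage`, `-quality`, `-rc`). The sketch below shows one way to assemble them; the dictionary and function names are illustrative only, and the 8 Mbps default is the LAN bitrate quoted elsewhere in this article.

```python
# The three AMF settings named above, keyed by FFmpeg option name.
AMF_LOW_LATENCY = {
    "usage": "ultralowlatency",
    "quality": "speed",
    "rc": "cbr",
}


def amf_args(codec: str = "hevc_amf",
             bitrate_bps: int = 8_000_000) -> list[str]:
    """Flatten the preset triplet into an FFmpeg argument list."""
    args = ["-c:v", codec]
    for option, value in AMF_LOW_LATENCY.items():
        args += [f"-{option}", value]
    args += ["-b:v", str(bitrate_bps)]
    return args


print(" ".join(amf_args()))
```

Passing `codec="h264_amf"` reuses the same triplet for the H.264 fallback.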

For five years AMD's encoder was a punchline. RDNA2 fixed it. RDNA3 made it competitive on AV1. The streaming-encoder duopoly is now a triopoly, and that is good for everyone.

QuickSync — the silent ubiquity champion

QuickSync is the encoder you almost certainly already own. It ships in every Intel CPU with integrated graphics since Sandy Bridge in 2011 — that is over a decade of laptops, NUCs, and office desktops with hardware video encoding sitting unused because the user assumed they needed a discrete GPU. They did not.

The architecture has progressed in step with Intel's iGPU lineage:

Sandy Bridge-Broadwell (2011-2014) — H.264 only, middling quality, but class-leading latency and power for a laptop.
Skylake-Coffee Lake (2015-2018) — added H.265 (8-bit on Skylake, 10-bit from Kaby Lake) and 4K encode.
Ice Lake onward (2019+) — Gen11/Gen12 Xe iGPU dramatically improved quality. Many benchmarks rank Gen12 QuickSync within a few percent of NVENC Turing.
Arc discrete (2022+) — Intel's first discrete GPU lineup. Same Xe-HPG architecture scaled up, with AV1 encode. Arc was the first consumer GPU to ship hardware AV1 — beating NVIDIA Ada and AMD RDNA3 to market — and the quality is surprisingly excellent. An Intel Arc A380 remains an absurd bargain for pure AV1 throughput per dollar.

The thermal profile is QuickSync's quiet superpower. Because it lives inside an iGPU already on the SoC, the marginal cost of using it is the encode block alone — no waking a discrete GPU, no firing up GDDR6 memory controllers. On a ThinkPad or NUC, QuickSync delivers comparable quality to NVENC at roughly a quarter of the system-level power. For Remio's Windows host, QuickSync is the third-choice fallback after NVENC and AMF — not because it is worse, but because users with discrete GPUs usually want their iGPU free for other workloads.

VideoToolbox — Apple Silicon's unified Media Engine

Apple does not market a brand name for their hardware encoder the way NVIDIA does. The API is VideoToolbox; the silicon is the Media Engine, an Apple-designed block that lives inside every M-series Mac and every iPhone/iPad SoC since the A6.

The Media Engine ships in different counts depending on the SKU:

M1, M2, M3 — one Media Engine.
M1 Pro/Max, M2 Pro/Max, M3 Pro/Max — two Media Engines.
M1 Ultra, M2 Ultra — four Media Engines (two per fused die).
M4 family — refreshed Media Engine with improved H.265 throughput.

Each Media Engine handles H.264, H.265, and ProRes encode/decode in hardware. ProRes encode is the headline feature for video editors — no other consumer chip ships it — but for streaming the relevant capabilities are efficient H.265 main10 encode and astonishingly low encode latency.

The one notable gap: no AV1 encode on any shipping Apple Silicon. M-series chips have decoded AV1 in hardware since M3 but cannot encode it. For streaming from a Mac, H.265 is the practical default for the foreseeable future.

Where Apple genuinely wins is power efficiency. On an M2 Pro encoding 4K60 H.265 at 30 Mbps, the entire Media Engine draws roughly 3 to 5 watts. The equivalent NVENC workload on an RTX 4070 draws closer to 35 watts of system-level power — most of which is the discrete GPU's idle floor, not the encoder block. For a Mac mini host that sits powered-on 24/7 waiting for a client, this is the difference between "barely noticeable on the electric bill" and "actively noticeable."
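To put the always-on comparison in energy terms, the quick calculation below converts the two power figures into kilowatt-hours per year. It assumes the host encodes continuously all year, which overstates real use, so treat the results as upper bounds. The 5 W and 35 W inputs are the figures quoted above; the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_kwh(watts: float) -> float:
    """Energy used in a year of continuous draw, in kWh."""
    return watts * HOURS_PER_YEAR / 1000.0


for label, watts in [("Media Engine, upper figure", 5.0),
                     ("RTX 4070, system-level", 35.0)]:
    print(f"{label}: {annual_kwh(watts):.1f} kWh/year")
```

Multiply by your local electricity price to turn either figure into a cost.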

Side-by-side comparison

Here are the four engines on the same axes. Latency numbers are 1080p60 H.265 ultra-low-latency presets, B-frames disabled, single-frame VBV — the Remio configuration. Quality is a perceptual rough-rank at 8 Mbps CBR; absolute differences are small.

Axis	NVENC (Ada)	AMF (RDNA3)	QuickSync (Arc / Gen12+)	VideoToolbox (M3+)
H.264	Yes	Yes	Yes	Yes
H.265 (HEVC)	Yes	Yes	Yes	Yes (main10)
AV1 encode	Yes	Yes	Yes	No
ProRes encode	No	No	No	Yes
Max resolution	8K H.265	8K H.265	8K (Arc), 4K (iGPU)	8K H.265
Max FPS at 1080p H.265	~480	~360	~360	~600 (M3 Max)
Encode time, 1080p60 H.265	1-3 ms	2-4 ms	2-4 ms	2-3 ms
Power draw, 4K60 H.265	~35 W system	~30 W system	~8 W system (iGPU)	~5 W
Quality rank @ 8 Mbps CBR	Tied 1st	Tied 2nd	Tied 2nd	Tied 1st
Platforms	Windows, Linux	Windows, Linux	Windows, Linux, macOS (Intel Macs)	macOS, iOS, iPadOS, tvOS
Concurrent sessions	Unlimited	Unlimited	Unlimited	Unlimited

The honest summary: at the settings remote-desktop streaming actually uses (CBR, low-latency, no B-frames), all four are good enough by a comfortable margin. The differentiators are AV1 support (NVIDIA / AMD / Intel — not Apple), power efficiency (Apple, then Intel iGPU, then everyone else), and platform availability (you do not get to choose VideoToolbox on a Windows host or NVENC on a Mac).

What Remio picks at runtime

Remio probes the host's capabilities at startup and picks an encoder by deterministic priority. The actual selection lives in GpuCapabilities on the C++/WinRT host, mirrored in Swift on the Mac host. Both follow the same logic.

On Windows hosts:

NVENC (hevc_nvenc / h264_nvenc) — picked first when an NVIDIA GPU is present and the driver responds. Preferred because it is the most consistently low-latency encoder across the broadest range of hardware generations.
AMF (hevc_amf / h264_amf) — picked second when an AMD discrete GPU or APU with VCN is present. Configured with usage=ultralowlatency, quality=speed, rc=cbr.
QuickSync (hevc_qsv / h264_qsv) — picked third when an Intel iGPU is present. Functionally equivalent in our pipeline; we prefer leaving the iGPU free if a discrete GPU is available.
libx264 CPU fallback — picked only when no hardware encoder responds. Capped at 720p60 or 1080p30 to stay inside the latency budget.
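The Windows priority order can be sketched as a small lookup. This is an illustration of the ordering described above, not the actual GpuCapabilities code (which is C++/WinRT and not shown here). The function name, the vendor keys, and the `hevc_ok` flag are all hypothetical, and the probing that would fill `available` is omitted.

```python
from typing import Set

# Priority order from the list above: NVENC, then AMF, then QuickSync.
PRIORITY = ["nvenc", "amf", "qsv"]

# (H.265 encoder, H.264 encoder) FFmpeg codec names per vendor.
CODEC_NAMES = {
    "nvenc": ("hevc_nvenc", "h264_nvenc"),
    "amf": ("hevc_amf", "h264_amf"),
    "qsv": ("hevc_qsv", "h264_qsv"),
}


def pick_encoder(available: Set[str], hevc_ok: bool = True) -> str:
    """Return the first encoder in priority order, else the CPU fallback."""
    for vendor in PRIORITY:
        if vendor in available:
            hevc, h264 = CODEC_NAMES[vendor]
            return hevc if hevc_ok else h264
    return "libx264"  # no hardware encoder responded


print(pick_encoder({"amf", "qsv"}))
print(pick_encoder(set()))
```

A host with both an AMD GPU and an Intel iGPU resolves to hevc_amf here, which leaves the iGPU free as described above.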

On Apple hosts: VideoToolbox is the only option, and the only one we want. The Media Engine inside every supported Mac (macOS 15+) handles H.265 main10 out of the box. No fallback path needed.

Codec preference is hardware H.265 first, H.264 as the universal fallback — end to end, on every platform. AV1 support across the whole matrix (host encode, client decode, WebRTC negotiation) is on the roadmap, not in the pipeline today; Remio holds to H.265 even on hardware that could technically do AV1, because encode latency is the bottleneck on LAN, not bitrate.

Quality vs latency trade-offs

Every hardware encoder exposes the same handful of knobs, and every knob trades quality for latency in the same direction. The Remio configuration sits at the most aggressive end of the latency dial because remote desktop is, by definition, a real-time application where a frame older than 20 ms is worse than a frame slightly blurrier than ideal.

CBR vs VBR. Variable bitrate produces better quality at the same average rate by spending bits where they help. But VBR also produces unpredictable per-frame size, which means unpredictable pacer queueing, which means unpredictable latency. Remio uses CBR everywhere. The quality cost is small; the latency consistency is huge.
B-frames. B-frames reference both past and future frames, so the decoder must wait for the future frame to arrive before decoding the B-frame. That is fine for video-on-demand and fatal for real-time. Remio disables B-frames on every codec on every platform.
GOP / keyframe interval. Longer GOPs (more frames between keyframes) compress better. Shorter GOPs recover faster from packet loss. Remio runs effectively infinite GOPs and relies on Picture Loss Indication (PLI) to request a fresh keyframe when the decoder reports loss. This matches our streaming philosophy of "skip lost frames, render latest only" rather than retransmit old data.
Lookahead. NVENC, AMF, and QuickSync all support multi-frame lookahead for adaptive quantization. Each frame of lookahead adds a full frame of delay. Remio runs zero or single-frame lookahead — every encoder's lowest setting.
VBV buffer. The Video Buffering Verifier window is how many bits the encoder is allowed to "borrow forward" before settling back to the target rate. Larger VBV gives smoother quality. Smaller VBV gives tighter rate control and lower latency. Remio uses 1 frame of VBV — the bitrate divided by the frame rate — which is the smallest legal value.
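As a worked example of the 1-frame VBV rule (bitrate divided by frame rate), the snippet below computes the buffer size at the two LAN bitrates mentioned in this article, 8 Mbps and 30 Mbps, at 60 FPS. The function name is illustrative.

```python
def vbv_bits_per_frame(bitrate_bps: int, fps: int) -> float:
    """Size of a 1-frame VBV buffer, in bits."""
    return bitrate_bps / fps


for mbps in (8, 30):
    bits = vbv_bits_per_frame(mbps * 1_000_000, 60)
    kilobytes = bits / 8 / 1000
    print(f"{mbps} Mbps at 60 FPS: {bits:,.0f} bits "
          f"(~{kilobytes:.1f} kB) per frame")
```

At 8 Mbps that is roughly 133,000 bits, or under 17 kB per frame. That small average is why keyframes, which are far larger than typical frames, show up as spikes under such a tight buffer.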

The pattern is consistent across all four vendors. Real-time low-latency is a constrained optimisation: you give up the few percent of quality that lookahead and B-frames buy in offline encoding, in exchange for the predictable single-frame latency that real-time interaction demands. Every shipping streaming product — Remio, Parsec, NVIDIA GeForce NOW, Xbox Cloud Gaming — makes the same trade in the same direction.

What you (the user) actually need to care about

If you skipped to the end: any modern GPU's hardware encoder is fine for remote desktop.

The differences at real-time configurations are smaller than the noise floor of any reasonable benchmark. A Pascal GTX 1060 user gets the same Remio experience as an RTX 4090 user: both saturate the 60 FPS pipeline at about 8 ms of processing latency, and both produce indistinguishable output at the 8 Mbps that Remio uses on LAN.

The exceptions, in order of how often they matter:

AV1-capable hardware helps on slow networks, generally. On hotel Wi-Fi or a mobile hotspot, an Ada / RDNA3 / Arc host paired with an M3+ or RTX 40-series client buys ~25 percent more headroom at the same quality in tools that use AV1. On LAN it buys nothing. Remio doesn't run AV1 yet (H.265 end to end is the whole pipeline), so today this is a vendor advantage, not a Remio one.
Apple Silicon Macs are remarkably good hosts. The Media Engine's low latency, excellent H.265, and 3-5 W draw make a Mac mini or Studio one of the best always-on streaming sources you can buy. The lack of AV1 encode is the only gap, and it is irrelevant on LAN.
An Intel Arc A380 is an unreasonable bargain for AV1. ~$120 buys an encoder block that beats anything else under $400.
The libx264 CPU fallback is real. With no hardware encoder (an old laptop, a server without an iGPU), Remio still works, just not at 4K60 and not without burning a CPU core.

Beyond those edge cases, the right answer is the simplest one: use the GPU you would buy anyway. Every modern vendor has converged on the same capabilities, the same low-latency presets, the same quality envelope. Remio picks the right one at runtime so you do not have to think about it. Get on with whatever you opened a remote desktop session for — code, edit a video, fix a server, push a fix from a coffee shop.

That is the whole point of fixed-function silicon. It just works, and you stop noticing it.
