When the host renders at a higher resolution than your client display, or when the network envelope clamps the stream down, the host can ship fewer pixels and let your device rebuild them. Remio's client runs a neural super-resolution model on the way to the display — text edges stay crisp, code stays readable, and UI chrome stops looking like a JPEG. The model executes on the device's NPU: Apple Neural Engine via Core ML on Apple Silicon, NNAPI on Android. Not in the cloud. Not on a remote server. On your device, frame by frame.
Classical video encoders are tuned for motion, not for the kind of fine, high-contrast edges that text and UI chrome are made of. When bandwidth tightens, those edges are the first thing to go.
A remote-desktop stream takes two compression hits before it reaches your eyes. First, the host downscales when the source resolution exceeds what your client can display: an M4 MacBook Pro 14" hosting at 3024 by 1964 and streaming to an iPad mini at 2266 by 1488 has already given up roughly 43 percent of its pixels before the encoder sees the frame. Second, the encoder itself applies lossy compression to fit the bitrate the network can carry. Both steps throw away high-frequency detail, and that detail is exactly what separates a crisp character glyph from a smudgy blob.
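The arithmetic behind the downscale in that example can be checked directly. The resolutions come from the paragraph above; everything else is plain multiplication.

```python
# Pixel counts for the host/client pair used as the example in the text.
host_w, host_h = 3024, 1964      # M4 MacBook Pro 14" host
client_w, client_h = 2266, 1488  # iPad mini client

host_pixels = host_w * host_h
client_pixels = client_w * client_h

# Fraction of the host's pixels that cannot survive the downscale.
discarded = 1 - client_pixels / host_pixels

print(f"host:   {host_pixels:,} px")      # host:   5,939,136 px
print(f"client: {client_pixels:,} px")    # client: 3,371,808 px
print(f"discarded before encoding: {discarded:.1%}")  # 43.2%
```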
Neural super-resolution is a quality-recovery pass that runs on the client after decode. It looks at the decompressed frame and predicts the high-frequency content the original frame had before it was downscaled and squeezed through the encoder. Unlike classical bilinear or bicubic upscaling — which can only interpolate from the pixels already there — a trained neural network has learned what edges, letterforms, and UI shapes look like at full resolution. It puts the high-frequency information back.
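To make the limitation of classical upscaling concrete, here is a minimal sketch of linear interpolation on a single row of pixel intensities. It is an illustration only, not Remio's code: the function name and the sample row are made up for this example. Every output value is a weighted average of two existing neighbours, so a hard edge can only turn into a ramp.

```python
def upscale_linear(row, factor):
    """Upscale a 1-D row of pixel intensities by linear interpolation."""
    n = len(row)
    out_len = n * factor
    out = []
    for i in range(out_len):
        # Map the output index back to a fractional position in the source row.
        pos = i * (n - 1) / (out_len - 1)
        left = int(pos)
        right = min(left + 1, n - 1)
        t = pos - left
        # Blend the two neighbouring source pixels; nothing new can appear.
        out.append(row[left] + t * (row[right] - row[left]))
    return out

# A hard black-to-white edge, like the side of a letter stroke.
edge = [0, 0, 255, 255]
upscaled = upscale_linear(edge, 2)
print([round(v) for v in upscaled])  # [0, 0, 0, 73, 182, 255, 255, 255]
```

The two grey values in the middle are the blur. A learned upscaler is trained to replace that ramp with a plausible sharp edge instead.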
The effect is most visible exactly where you need it most: 11-point body text, code in a monospaced font, hairlines in a Figma file, button outlines in a sidebar. These are the cases where a few smudged pixels mean the difference between "readable" and "I have to lean in." A motion-tuned encoder cannot win on these surfaces because they violate its core assumption that detail can be discarded between frames. A neural upscaler can, because it reconstructs from learned priors, not from neighbouring pixels.
The result is a stream that looks sharper than its bitrate would suggest. You can run a session at half the bandwidth and still read everything — useful on hotel Wi-Fi, on a tethered phone connection, on a saturated office LAN, or just on any network where you want headroom for other things.
Every frame, every pixel, every inference step stays inside the client app on your device. Nothing leaves for upscaling. Nothing is uploaded for "processing." The model weights ship in the app bundle.
On Apple Silicon — every M-series Mac, iPad Pro, iPad Air, and recent iPhone Pro — the model runs on the Neural Engine through Core ML. The Neural Engine is a dedicated coprocessor sitting next to the CPU and GPU on the same die. It is built for the small, repetitive matrix multiplications that neural networks spend most of their time on, and it does that work at a fraction of the wattage the GPU would use for the same operation. Core ML routes the compiled model onto the Neural Engine automatically; the client app simply hands a frame in and receives the upscaled frame back.
On older iOS devices that lack a Neural Engine fast enough for real-time work (pre-A12 chips), the model falls back to GPU compute through Metal Performance Shaders. The output is the same; the energy cost is higher. In practice, though, most users on devices old enough to fall back to GPU will see the client decode the stream at 1:1 and skip the super-resolution pass entirely, because there is no quality win to be had on a display that already has fewer pixels than the stream.
On Android, the model runs through NNAPI on devices with a dedicated NPU: Pixel devices with Google Tensor (first generation and newer), Galaxy phones with Snapdragon 8 Gen 2 or newer, MediaTek Dimensity 9000 and newer. NNAPI is Android's neural-network abstraction layer; on supported devices it routes onto the hardware NPU automatically. On Android devices without a fast NPU, the client decodes 1:1 and skips the AI pass, the same 1:1 fallback described for iOS.
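The platform rules above can be summarised as a small decision function. This is a hedged sketch: `Backend`, `choose_backend`, and the `has_fast_npu` flag are hypothetical names invented here, and the treatment of any other platform (returning `SKIP`) is an assumption, since the text only describes Apple and Android devices.

```python
from enum import Enum

class Backend(Enum):
    NEURAL_ENGINE = "Core ML on the Apple Neural Engine"
    METAL_GPU = "Metal Performance Shaders on the GPU"
    NNAPI_NPU = "NNAPI on the hardware NPU"
    SKIP = "decode 1:1, no super-resolution pass"

def choose_backend(platform: str, has_fast_npu: bool) -> Backend:
    """Pick where the model would run, following the rules in the text."""
    if platform == "apple":
        # Pre-A12 devices fall back to GPU compute.
        return Backend.NEURAL_ENGINE if has_fast_npu else Backend.METAL_GPU
    if platform == "android":
        # No fast NPU means the pass is skipped entirely.
        return Backend.NNAPI_NPU if has_fast_npu else Backend.SKIP
    # Assumption: platforms the text does not cover skip the pass.
    return Backend.SKIP

print(choose_backend("apple", True).value)
print(choose_backend("android", False).value)
```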
The point of running everything on-device is not philosophy. It is latency. A cloud round-trip for upscaling would add a minimum of 30 ms to glass-to-glass latency, and more if the path is congested. That alone would use up the entire latency budget for a remote-desktop session. Putting the upscaler on the device removes that hop entirely and keeps the inference inside the same process that owns the frame buffer. It is the only way the math works.
Super-resolution is a quality-recovery pass — running it on a stream that does not need it would burn battery for no visible benefit. The client decides per session whether the pass is worth running.
The first trigger is source-versus-display resolution. If the host is rendering at a meaningfully higher resolution than your client display can show — an M4 14" host at 3024 by 1964 streamed to an iPad mini at 2266 by 1488, or a 27" 5K iMac host streamed to a MacBook Air — the host has to downscale before encoding, and the downscaling discards detail that the upscaler can usefully reconstruct. The client checks this at session start and decides whether to enable the pass.
The second trigger is the WAN bitrate envelope. When you are on a LAN with the host on the same network, Remio runs the stream at the full bitrate the link can carry. The encoder has not had to discard detail to hit a bitrate target, so there is nothing for this trigger to recover and it stays off. When you are on a wide-area connection (tethered, hotel Wi-Fi, cross-region), the envelope clamps the stream to fit the network's sustained throughput. As sustained throughput drops, the encoder gives up high-frequency detail to hit its bitrate target, and the upscaler turns on to put that detail back.
The third trigger is manual: a "Crisp text" mode in the Streaming preferences panel. Default is Auto, which combines the two automatic triggers above. You can force it Always On if you want maximum sharpness on every frame regardless of source, or Always Off if you want minimum battery and trust the raw stream. The toggle takes effect on the next decoded frame — no session restart, no reconnect.
When none of the three triggers fires, the client decodes the stream 1:1 and skips the AI pass entirely. The Neural Engine stays idle, and the pass costs no battery at all. This is the common case on a LAN with a host display smaller than or equal to the client display, which is most of what a desk-bound user does in a day.
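The three triggers combine into a single yes-or-no decision. The sketch below shows one way to express it. All names are hypothetical, and the 10 percent margin standing in for "meaningfully higher" is an assumed value, not a figure from Remio.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class CrispText(Enum):
    AUTO = "auto"
    ALWAYS_ON = "always_on"
    ALWAYS_OFF = "always_off"

@dataclass
class Session:
    host_res: Tuple[int, int]      # host render resolution (width, height)
    client_res: Tuple[int, int]    # client display resolution (width, height)
    bitrate_clamped: bool          # True when the WAN envelope limits the encoder
    mode: CrispText = CrispText.AUTO

# Assumed threshold for "meaningfully higher"; the real value is not documented.
RESOLUTION_MARGIN = 1.10

def host_exceeds_display(s: Session) -> bool:
    host_px = s.host_res[0] * s.host_res[1]
    client_px = s.client_res[0] * s.client_res[1]
    return host_px > client_px * RESOLUTION_MARGIN

def should_upscale(s: Session) -> bool:
    # The manual setting overrides both automatic triggers.
    if s.mode is CrispText.ALWAYS_ON:
        return True
    if s.mode is CrispText.ALWAYS_OFF:
        return False
    # Auto: either automatic trigger is enough to enable the pass.
    return host_exceeds_display(s) or s.bitrate_clamped

lan = Session((1920, 1080), (2266, 1488), bitrate_clamped=False)
wan = Session((1920, 1080), (2266, 1488), bitrate_clamped=True)
print(should_upscale(lan), should_upscale(wan))  # False True
```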
Glass-to-glass latency is the entire game in remote desktop. The super-resolution pass has to fit inside the budget that hardware decode and present leave behind — it does, with room to spare.
The upscaling pass runs in parallel with hardware decode rather than after it. The decoder hands frame N+1 to the GPU while the Neural Engine is still working on frame N's upscale. Because decode and inference run on different silicon, neither blocks the other. The only serialisation point is the final composite into the display surface, which takes microseconds.
On a current-generation M-series Apple Silicon device, total NPU time per upscaled frame stays under 4 ms. On an iPad Pro M2 specifically, the measured cost sits around 2.5 ms per frame at 60 Hz, leaving the Neural Engine idle for the remaining 14 ms or so of each 16.7 ms frame interval. That means the super-resolution pass adds essentially nothing to glass-to-glass latency: it is hidden inside time the GPU and display were already going to spend.
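The headroom is easy to work out from the figures quoted above (60 Hz refresh, about 2.5 ms of NPU time per frame on an iPad Pro M2):

```python
refresh_hz = 60
npu_ms = 2.5  # per-frame NPU time quoted for the iPad Pro M2

frame_interval_ms = 1000 / refresh_hz       # time available per frame
idle_ms = frame_interval_ms - npu_ms        # NPU idle time per frame
utilisation = npu_ms / frame_interval_ms    # share of the interval spent upscaling

print(f"frame interval: {frame_interval_ms:.2f} ms")  # 16.67 ms
print(f"NPU idle:       {idle_ms:.2f} ms")            # 14.17 ms
print(f"NPU busy:       {utilisation:.1%}")           # 15.0%
```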
On Android NPU devices, the budget varies by chip generation. A Pixel 8 with Tensor G3 lands close to the iPad Pro figure; a Galaxy S22 with Snapdragon 8 Gen 1 runs longer but still inside the 16.7 ms per-frame budget at 60 Hz. On devices where the NPU cannot finish a frame inside the budget, the client backs off the pass automatically rather than dropping frames or stalling the display.
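The text does not say how the back-off is implemented. One plausible shape, sketched here under stated assumptions, is a small governor that watches recent inference times and disables the pass after repeated overruns. The class name, the 30-frame window, the three-overrun limit, and the sample timings are all invented for illustration, and this sketch never re-enables the pass once it has backed off.

```python
from collections import deque

FRAME_BUDGET_MS = 1000 / 60  # per-frame budget at 60 Hz

class UpscaleGovernor:
    """Disable the upscale pass when inference keeps missing the frame budget."""

    def __init__(self, budget_ms=FRAME_BUDGET_MS, window=30, max_overruns=3):
        self.budget_ms = budget_ms
        self.max_overruns = max_overruns
        self.overruns = deque(maxlen=window)  # recent over-budget flags
        self.enabled = True

    def record(self, inference_ms):
        """Record one frame's inference time; return whether the pass stays on."""
        if not self.enabled:
            return False
        self.overruns.append(inference_ms > self.budget_ms)
        if sum(self.overruns) >= self.max_overruns:
            self.enabled = False
        return self.enabled

governor = UpscaleGovernor()
timings_ms = [9.0, 12.5, 18.0, 11.0, 19.5, 21.0, 10.0]  # made-up samples
decisions = [governor.record(t) for t in timings_ms]
print(decisions)  # [True, True, True, True, True, False, False]
```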
Generative AI marketing has set the expectation that "AI enhancement" means inventing detail. Remio's super-resolution model does not invent. It reconstructs. The distinction matters for trust.
It does not hallucinate missing UI. If a button is cut off at the edge of the stream because the encoder dropped a region, the upscaler does not invent the rest of the button. It works on the pixels that were already encoded and reconstructs the high-frequency detail those pixels imply — not pixels that were never there. A remote-desktop user needs to be able to trust that what they see on their iPad matches what is on the host. A generative model that fills in missing content would break that trust.
It does not increase frame rate. The stream still runs at 60 fps; the upscaler produces one upscaled frame per decoded frame. There is no temporal interpolation, no frame insertion, no motion smoothing. Frame rate is set by the host capture rate and the network's ability to deliver frames on time, and the upscaler does not change either.
It does not "AI-enhance" video playback that is running on the host. If you are watching a movie on your remote Mac, what gets streamed to your client is whatever the host's video decoder rendered into its window — at the host's resolution. The super-resolution pass on the client recovers detail that the streaming encoder discarded, but it does not improve on the source the host produced. For sharper movie playback, play the movie on your client device, not on the host.
It does not replace a good network. A LAN P2P connection with a fast direct link is always sharper than a clamped WAN connection upscaled by a neural net — super-resolution closes most of the gap but not all of it. The right way to think about the pass is as quality insurance: when the network is great, you do not need it and the client turns it off; when the network is bad, it keeps the session usable instead of letting it degrade into unreadable mush.
Install Remio on the computer you want to reach and on the device you want to reach it from. Super-resolution is on by default in Auto mode — it activates when your session benefits and stays out of the way the rest of the time. No accounts. No telemetry. No frame ever leaves your device for upscaling.
macOS, iOS, iPadOS, Windows, and Android. AI runs on your device.