NAT Traversal Explained: How P2P Remote Desktop Reaches Through Your Router
You have one public IP and a dozen devices behind your router. Your iPad is on cellular, three networks away. Somehow, a remote desktop app opens a direct connection from one to the other in under a second — usually with no relay in the middle. This is how that magic actually works: ICE, STUN, TURN, hole punching, and the awkward truth about symmetric NAT.
Why this matters
Picture a Tuesday morning. Your Mac mini is at home behind a router with a single public IPv4 address. Your iPad is on the train, on a carrier's mobile network with its own layers of address translation. At least two NATs sit between the two devices, probably three. Neither machine is reachable from the outside Internet by default.
And yet you open Remio on the iPad, tap the Mac, and a second later you are looking at your desktop. The video and input flow directly between the two devices — no cloud relay, no proxy in the middle re-encoding your screen.
The technique that makes this possible has a name — NAT traversal — and a family of well-defined protocols behind it: STUN, TURN, and ICE. These are the same protocols WebRTC uses for browser video calls, and they are why any modern P2P remote desktop can avoid forcing your screen frames through a third party's data center. This article walks through what each one does, why it exists, and how they combine to find the lowest-latency path between two devices hiding behind routers.
The NAT problem in 60 seconds
The Internet was designed so every device had its own publicly routable address. There are about 4.3 billion IPv4 addresses in the entire space — plenty in 1981, very obviously not enough by the late 1990s.
The workaround was Network Address Translation. Your router gets one public IPv4 address from your ISP. Inside your home, every device gets a private address from a reserved range (typically 192.168.x.x or 10.x.x.x). When your laptop sends a packet out, the router rewrites the source address to its own public IP, remembers the mapping, and forwards. When the reply comes back, the router looks up the mapping and rewrites the destination back to your laptop.
This is exactly why incoming connections don't work by default. From outside, your laptop has no address; only the router does. A stranger's packet to your-router-public-ip:5060 hits a router with no mapping for it and gets dropped.
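The translation step described above can be sketched as a toy model. This is a minimal illustration, not how any particular router is implemented: the class name `SimpleNat`, the starting port 40000, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass
class SimpleNat:
    """Toy NAT: rewrites outbound source addresses and remembers the mapping."""
    public_ip: str
    next_port: int = 40000  # hypothetical first public port
    # public port -> (private endpoint, remote endpoint the mapping was opened towards)
    table: Dict[int, Tuple[Endpoint, Endpoint]] = field(default_factory=dict)

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Translate an outgoing packet and return its public source endpoint."""
        for port, (private, remote) in self.table.items():
            if private == src and remote == dst:
                return (self.public_ip, port)  # reuse the existing mapping
        port = self.next_port
        self.next_port += 1
        self.table[port] = (src, dst)
        return (self.public_ip, port)

    def inbound(self, src: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Return the private endpoint to forward to, or None if the packet is dropped."""
        entry = self.table.get(public_port)
        if entry is None:
            return None  # no mapping: unsolicited traffic is dropped
        private, remote = entry
        return private if remote == src else None


nat = SimpleNat(public_ip="203.0.113.42")
laptop = ("192.168.1.42", 51000)
server = ("198.51.100.7", 443)

public = nat.outbound(laptop, server)              # laptop talks to the server first
reply_target = nat.inbound(server, public[1])      # the reply finds its way back
stranger = nat.inbound(("192.0.2.99", 4444), 5060) # unsolicited packet has no mapping
```

Here `reply_target` is the laptop's private endpoint and `stranger` is `None`, which is the "dropped by default" behaviour the paragraph describes.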
For a P2P app, this is the entire game. Both peers sit behind routers that only forward packets when they remember sending something outbound first. Neither side can simply "call" the other — they have to cooperate with their routers to open a temporary path.
Hole punching: making the firewall let you back in
Here is the trick. Your router's NAT table is keyed on a five-tuple: source IP, source port, destination IP, destination port, and protocol. When your device sends an outbound UDP packet to 1.2.3.4:5678, the router creates a temporary entry saying "if any UDP packet comes back from 1.2.3.4:5678 to my public IP on port X, forward it to this device on port Y."
That entry exists for a window of time (typically 30 to 60 seconds) and then expires unless something refreshes it. During that window, packets arriving from that destination address and port are accepted and forwarded to your device.
This is the keyhole P2P apps use. If both peers know each other's public address, they can each start sending outbound UDP packets to the other at the same time. Each side's router opens an outbound mapping. Each side's first packet may well be dropped, because the other router hasn't opened its mapping yet, but the next packet has somewhere to go. From that moment on, both routers think they are forwarding a normal outbound conversation, and the two devices are talking directly.
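The sequence of dropped and accepted packets is easier to see in a small simulation. The sketch below assumes both routers behave like a port-restricted cone NAT (one public port per device, inbound allowed only from endpoints the device has already sent to). `ConeNat`, `deliver`, the `rendezvous` server, and every address are made up for illustration.

```python
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class ConeNat:
    """Toy port-restricted cone NAT."""

    def __init__(self, public_ip: str, first_port: int) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.port_for: Dict[Endpoint, int] = {}      # private endpoint -> public port
        self.owner_of: Dict[int, Endpoint] = {}      # public port -> private endpoint
        self.allowed: Dict[int, Set[Endpoint]] = {}  # public port -> remotes already contacted

    def send(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Outbound packet: create or refresh the mapping, return the public source."""
        if src not in self.port_for:
            self.port_for[src] = self.next_port
            self.owner_of[self.next_port] = src
            self.allowed[self.next_port] = set()
            self.next_port += 1
        port = self.port_for[src]
        self.allowed[port].add(dst)
        return (self.public_ip, port)

    def receive(self, src: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Inbound packet: forward only if we previously sent to this exact remote."""
        if src in self.allowed.get(public_port, set()):
            return self.owner_of[public_port]
        return None


def deliver(sender_nat: ConeNat, receiver_nat: ConeNat,
            src_private: Endpoint, dst_public: Endpoint) -> bool:
    """Send one packet through both NATs; True if it reaches the far device."""
    public_src = sender_nat.send(src_private, dst_public)
    return receiver_nat.receive(public_src, dst_public[1]) is not None


rendezvous = ("198.51.100.1", 3478)  # hypothetical server both peers can reach
nat_a, nat_b = ConeNat("203.0.113.42", 48291), ConeNat("192.0.2.77", 51000)
a_private, b_private = ("192.168.1.42", 50000), ("10.0.0.5", 50000)

# Each peer first learns its own public endpoint by contacting the rendezvous server.
a_public = nat_a.send(a_private, rendezvous)
b_public = nat_b.send(b_private, rendezvous)

first = deliver(nat_a, nat_b, a_private, b_public)   # B has not sent to A yet
second = deliver(nat_b, nat_a, b_private, a_public)  # A has already sent to B
third = deliver(nat_a, nat_b, a_private, b_public)   # B has now sent to A
```

In this model `first` is dropped while `second` and `third` get through: A's first packet is lost at B's router, but it opens the mapping that lets B's packet in, and B's packet in turn opens the path for everything A sends afterwards.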
This is called UDP hole punching. It is mildly miraculous, well-documented (RFC 5128), and works for the vast majority of consumer NATs in the wild. The remaining cases — symmetric NAT, CGNAT, corporate firewalls — are why TURN exists at all.
Hole punching is not a hack. It is the explicit contract NAT was designed around: outbound packet creates inbound permission. P2P apps just send the outbound packet on purpose.
STUN: "what's my public address?"
Hole punching needs one prerequisite: each peer has to know its own public-facing address so it can tell the other side where to send packets. That address is not visible from inside the device. Your iPad sees its local Wi-Fi address (192.168.1.42). What the rest of the Internet sees after the router rewrites the packet is something completely different.
STUN (Session Traversal Utilities for NAT, defined in RFC 5389 and since revised as RFC 8489) is the tiny protocol that solves this. The flow is almost embarrassingly simple:
- The peer sends a UDP packet to a public STUN server. (Remio uses both Cloudflare's and Google's public STUN servers for redundancy.)
- The STUN server reads the source IP and port off the wire — which is, by definition, the peer's NAT-translated public address.
- The STUN server replies with that address as the payload: "As far as I can see, you are 203.0.113.42:48291."
- The peer now knows what address to advertise to the other side.
That public-facing address is called a server-reflexive candidate in ICE terminology. It costs essentially nothing: one round trip during connection setup, and zero bandwidth or latency overhead after the connection is established. The STUN server's only job is to tell each peer its own public address so the peers can be introduced to each other; once they are talking, it has no further role.
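The request and reply in the steps above are small enough to build by hand. The sketch below follows the STUN message layout (20-byte header with the magic cookie 0x2112A442, and the XOR-MAPPED-ADDRESS attribute) using only Python's standard library. The function names are invented for this example, only IPv4 is handled, and this is not a description of Remio's own client code.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request() -> Tuple[bytes, bytes]:
    """Return (packet, transaction_id) for a Binding request with no attributes."""
    txn_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txn_id, txn_id


def parse_binding_response(packet: bytes, txn_id: bytes) -> Optional[Tuple[str, int]]:
    """Extract the IPv4 XOR-MAPPED-ADDRESS from a Binding success response."""
    if len(packet) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or packet[8:20] != txn_id:
        return None
    offset, end = 20, min(len(packet), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)  # port is XORed with the cookie's top 16 bits
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return addr, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def query_stun(host: str, port: int = 3478, timeout: float = 2.0) -> Optional[Tuple[str, int]]:
    """Ask a STUN server for our public address. Needs network access."""
    request, txn_id = build_binding_request()
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(request, (host, port))
            data, _ = sock.recvfrom(2048)
    except OSError:  # timeout, DNS failure, or UDP blocked
        return None
    return parse_binding_response(data, txn_id)
```

Calling `query_stun` with a public server, for example `query_stun("stun.l.google.com", 19302)`, should return the address and port your NAT presented to that server. The hostname is a commonly used public endpoint chosen for illustration; the article does not say which hostnames Remio actually queries.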
TURN: when hole punching fails
Hole punching works for most NATs, but not all. The failure cases are real, and a P2P app that pretends they don't exist will simply fail to connect for a meaningful slice of users.
The main villains:
- Symmetric NAT. A NAT that assigns a different public port for every distinct destination. STUN tells your peer you are on port 48291 (because that's the port the STUN server saw), but when you send a packet to your peer's address, your router picks a different port — say 48317. Your peer's router rejects the packet because it's coming from an unexpected port. Hole punching breaks.
- Carrier-grade NAT (CGNAT). Cellular and many ISP networks layer a second NAT at the carrier level. You might be behind two NATs without knowing it, and both have to cooperate. CGNAT is often symmetric, and you have no admin access to it.
- Restrictive corporate firewalls. Some enterprise networks block all outbound UDP, or only allow it to specific destinations. Hole punching needs a UDP path; if there is none, it has no chance.
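The symmetric NAT failure comes down to how the router chooses the public port. The sketch below contrasts the two mapping policies; `MappingNat` and the sequential port allocation are simplifications invented for this example (real symmetric NATs often pick ports less predictably).

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class MappingNat:
    """Toy NAT that only models which public port is assigned to a flow."""

    def __init__(self, symmetric: bool, first_port: int = 48291) -> None:
        self.symmetric = symmetric
        self.next_port = first_port
        self.mappings: Dict[tuple, int] = {}

    def public_port(self, src: Endpoint, dst: Endpoint) -> int:
        # A cone NAT keys the mapping on the private endpoint only.
        # A symmetric NAT keys it on the (private endpoint, destination) pair,
        # so every new destination gets a fresh public port.
        key = (src, dst) if self.symmetric else (src,)
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        return self.mappings[key]


def stun_address_is_usable(nat: MappingNat) -> bool:
    """True if the port the STUN server saw is the port the peer will see."""
    device = ("192.168.1.42", 50000)
    stun_server = ("198.51.100.1", 3478)
    peer = ("192.0.2.77", 51000)
    seen_by_stun = nat.public_port(device, stun_server)
    seen_by_peer = nat.public_port(device, peer)
    return seen_by_stun == seen_by_peer


cone_ok = stun_address_is_usable(MappingNat(symmetric=False))
symmetric_ok = stun_address_is_usable(MappingNat(symmetric=True))
```

With the cone policy the address learned from STUN stays valid for the peer; with the symmetric policy it does not, which is why the advertised server-reflexive candidate points at the wrong port.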
For these cases, the fallback is TURN (Traversal Using Relays around NAT, defined in RFC 5766 and since revised as RFC 8656). A TURN server is a relay that both peers can reach because it has a public address that always accepts inbound traffic. Each peer makes a normal outbound connection to the TURN server, which any NAT will allow (UDP by default, with TCP or TLS as fallbacks when UDP is blocked), and the server forwards bytes between the two sides.
TURN is a real cost. Each packet of video has to traverse an extra network hop. On a regional TURN relay, that adds about 30 to 80 milliseconds of round-trip latency. Cross-region — say a relay in Frankfurt for a peer in Tokyo and a peer in São Paulo — can be 80 to 200 milliseconds. The TURN operator also pays for bandwidth: every bit of your stream goes through their pipes.
This is why a well-designed app uses TURN only when hole punching has demonstrably failed. ICE handles that decision automatically.
ICE: trying every candidate in parallel
Interactive Connectivity Establishment, RFC 8445, is the choreography that ties all of this together. ICE doesn't pick a strategy upfront. It gathers every possible candidate address for each peer, exchanges the full list with the other side, then has both stacks probe every candidate pair in parallel — racing to find the first one that works.
For each peer, ICE typically gathers:
- Host candidates — every local IP address on every network interface (Ethernet, Wi-Fi, virtual interfaces, IPv4 and IPv6).
- Server-reflexive candidates — public addresses learned from STUN servers.
- Relay candidates — addresses allocated on a TURN server.
The two peers exchange these lists through a signaling channel — for Remio, that's a small WebSocket connection to relay.remio.net that carries the ICE candidates and the WebRTC session description (the SDP), and then has nothing further to do once media starts flowing.
Once both lists are in hand, ICE starts connectivity checks. Each peer sends a STUN Binding request along every candidate pair, in priority order. The priorities are designed to prefer the lowest-latency path:
- Host to host — both peers on the same local network. This is the dream case: a few hundred microseconds of latency, no router involved.
- Host to server-reflexive, or server-reflexive to server-reflexive — peers on different networks, hole-punched through their respective NATs. This is what "P2P over the Internet" usually means in practice.
- Anything involving a relay — last resort, used only when the higher-priority paths all fail their connectivity checks.
The first candidate pair where both directions succeed is nominated as the winner, and media starts flowing along that path. The whole process typically completes in well under a second.
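The priority order above falls out of two formulas in RFC 8445: one for a single candidate and one for a candidate pair. The sketch below uses the type preferences the RFC recommends (host 126, peer-reflexive 110, server-reflexive 100, relay 0). The `Candidate` class and `check_order` helper are illustrative names, and real ICE agents apply extra rules (pairing only matching components and address families, pruning redundant pairs) that are omitted here.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple

# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str                      # "host", "prflx", "srflx", or "relay"
    address: str
    port: int
    local_preference: int = 65535  # highest value: a single preferred interface
    component: int = 1

    @property
    def priority(self) -> int:
        """Candidate priority: 2^24 * type pref + 2^8 * local pref + (256 - component)."""
        return ((TYPE_PREFERENCE[self.kind] << 24)
                + (self.local_preference << 8)
                + (256 - self.component))


def pair_priority(controlling: Candidate, controlled: Candidate) -> int:
    """Pair priority: 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling.priority, controlled.priority
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def check_order(local: Sequence[Candidate],
                remote: Sequence[Candidate]) -> List[Tuple[Candidate, Candidate]]:
    """All candidate pairs, highest priority first (local side is controlling)."""
    pairs = list(product(local, remote))
    return sorted(pairs, key=lambda pair: pair_priority(*pair), reverse=True)


# Hypothetical candidates for two peers.
mac = [Candidate("host", "192.168.1.10", 50000),
       Candidate("srflx", "203.0.113.42", 48291),
       Candidate("relay", "198.51.100.9", 60000)]
ipad = [Candidate("host", "10.20.30.40", 50001),
        Candidate("srflx", "192.0.2.77", 51000),
        Candidate("relay", "198.51.100.9", 60001)]

for a, b in check_order(mac, ipad):
    print(f"{a.kind:>5} -> {b.kind:<5} priority={pair_priority(a, b)}")
```

Because the pair formula is dominated by the lower of the two candidate priorities, any pair that includes a relay candidate sorts below every pair that does not, which matches the "last resort" position of relays in the list above.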
ICE also gathers candidates continually. If a better path appears mid-session — say, your phone leaves cellular and joins Wi-Fi — ICE can renominate to the new path without dropping the call.
Why Remio prefers P2P, hard
Remio is a remote desktop app, not a video call. The latency budget is brutal. For a session to feel like you are sitting at the host machine, the entire pipeline — capture, encode, network, decode, render — has to fit inside about 16 milliseconds (one frame at 60 FPS). Every extra hop is a sin.
On a direct LAN P2P connection, the network portion of that budget is about 1 to 5 milliseconds round-trip. There is no relay; the bytes go straight from one device to the other. With Remio's pipeline this lets us hit sub-5 ms glass-to-glass latency end-to-end. That is below the threshold humans can perceive — the remote machine feels indistinguishable from a local one.
On a TURN-relayed connection — even a regional one — the network portion jumps to 30 to 80 milliseconds. The total user-perceived latency is now 40 to 100 ms. Still usable for most work, but you can feel it on a fast drag or a rapid click sequence. The trackpad doesn't quite feel like yours anymore.
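One way to read these numbers is in units of frames. The helper below is a back-of-the-envelope calculation with made-up function names, using only the 60 FPS figure and the latency ranges quoted in this section.

```python
def frame_time_ms(fps: float = 60.0) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def frames_of_delay(latency_ms: float, fps: float = 60.0) -> float:
    """How many frame intervals a given latency spans."""
    return latency_ms / frame_time_ms(fps)


for label, latency in [("direct LAN, 5 ms", 5.0),
                       ("relay, 40 ms total", 40.0),
                       ("relay, 100 ms total", 100.0)]:
    print(f"{label}: {frames_of_delay(latency):.1f} frames at 60 FPS")
```

At 60 FPS one frame lasts about 16.7 ms, so 5 ms is a fraction of a frame, while 40 ms spans roughly 2.4 frames and 100 ms about 6.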
That delta is why we go to so much trouble to keep sessions on P2P. Remio's WebRTC stack gathers both IPv4 and IPv6 candidates, talks to multiple STUN servers in parallel, and aggressively prefers direct paths during ICE's candidate-pair selection. Internally we even use the measured connection RTT to classify the network: under 15 ms we treat it as LAN, set the encoder to 8 Mbps CBR with up to 30 Mbps headroom for keyframes, and seed the bandwidth estimator high. Over 15 ms is treated as WAN and tuned more conservatively.
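The RTT-based classification can be sketched in a few lines. This is a guess at the shape of the logic, not Remio's code: `EncoderProfile` and `classify` are hypothetical names, the LAN numbers come from the paragraph above, and the WAN settings are left unset because the article does not give them. Treating an RTT of exactly 15 ms as WAN is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

LAN_RTT_THRESHOLD_MS = 15.0  # threshold quoted in the article


@dataclass(frozen=True)
class EncoderProfile:
    network: str
    target_bitrate_mbps: Optional[float]
    keyframe_headroom_mbps: Optional[float]


# LAN values as stated in the article: 8 Mbps CBR, up to 30 Mbps for keyframes.
LAN_PROFILE = EncoderProfile("LAN", 8.0, 30.0)
# The article only says WAN is "tuned more conservatively", so no numbers are guessed here.
WAN_PROFILE = EncoderProfile("WAN", None, None)


def classify(rtt_ms: float) -> EncoderProfile:
    """Pick an encoder profile from the measured connection RTT."""
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    return LAN_PROFILE if rtt_ms < LAN_RTT_THRESHOLD_MS else WAN_PROFILE


for rtt in (3.0, 12.0, 45.0):
    print(rtt, classify(rtt).network)
```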
And — this is the part competitors often quietly omit — the connection is end-to-end encrypted regardless of the path. AES-256-GCM keys are negotiated via ECDHE over Curve25519 directly between the two peers. Whether the bytes flow P2P or through TURN, neither Remio's signaling server nor the TURN operator can read them. P2P just means there is no third party at all on the data path.
When relay is unavoidable (and that is OK)
Sometimes the network gods say no. Symmetric NAT on both ends, double-CGNAT on a cellular carrier with a strict ISP, a corporate firewall that allows only outbound TLS to known cloud providers — these are real, common configurations. In those cases the session simply has to go through a relay or it doesn't happen at all.
Remio's TURN backend is Cloudflare Calls, which gives us a global network of relays close to most users. When ICE selects a relay path, the experience degrades from "feels local" to "feels like Splashtop" — still very usable, but you can tell. The added latency lands at 30 to 80 ms for a regional relay; cross-region can push 200 ms.
The important part: the data is still end-to-end encrypted. Cloudflare sees encrypted UDP packets going one way and encrypted UDP packets going the other way. It can count bytes for billing. It cannot see your screen, your keystrokes, or any of your session content. There is no Remio-side decryption point either, because there is no Remio cloud in the pipeline — we operate the TURN service through Cloudflare and never touch the payloads.
This is the architecture our security whitepaper describes in more detail. The TL;DR: P2P or relay, the threat model is the same.
The IPv6 escape hatch
IPv6 has 340 undecillion addresses (3.4 × 10^38), enough to give every grain of sand on Earth its own public IP, with a few trillion to spare. In an IPv6-only world, NAT is largely unnecessary. Two IPv6-capable devices can just send packets to each other's addresses, and the only obstacle is the host firewall on either end.
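The address-space sizes are easy to check with Python's standard `ipaddress` module. This is just arithmetic on the two address widths (32 and 128 bits); the variable names are arbitrary.

```python
import ipaddress

# Size of the entire IPv4 and IPv6 address spaces.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: {ipv4_total:,} addresses (2^32)")
print(f"IPv6: {ipv6_total:.1e} addresses (2^128)")
print(f"IPv6 addresses per IPv4 address: {ipv6_total // ipv4_total:.1e}")
```

The IPv4 total is 4,294,967,296, the "about 4.3 billion" that forced NAT in the first place, and the IPv6 total is 2^128, or roughly 3.4 × 10^38.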
If you have IPv6 at home and on your cellular carrier, you may have noticed that some apps connect faster when both ends are IPv6 — that's because the whole "gather candidates, probe pairs" dance has fewer obstacles. There is no NAT translation to discover, no STUN trip required, just direct addresses.
But here's the catch: in 2026 the global rollout is still incomplete. Many home routers do IPv6 poorly or not at all. Many corporate networks block inbound IPv6 by default. Some carriers offer IPv6 on the radio side but bridge to IPv4 elsewhere. Pure IPv6-to-IPv6 paths exist but cannot be relied on as the only mechanism.
So Remio — like every serious P2P stack — gathers both IPv4 and IPv6 candidates, and lets ICE race them. If a direct IPv6 path exists and works, it usually wins on latency. If not, IPv4 with hole punching takes over. If both fail, relay. The user never sees any of this; the connection just establishes.
How to know which path your session is using
Curious which path your session ended up on? Remio's connection indicator shows the type in plain language — "Direct" for P2P, "Relay" for TURN. The developer stats panel exposes the full WebRTC state, including the nominated pair's candidate types (host, srflx, relay) and measured RTT.
A few quick interpretations:
- RTT under 5 ms, Direct — you're on the same LAN as the host. Best case. Sub-5 ms glass-to-glass is achievable.
- RTT 5 to 15 ms, Direct — same Wi-Fi network but with an extra access-point or mesh hop. Still excellent.
- RTT 15 to 80 ms, Direct — you're on a different network from the host, hole punching succeeded. Internet P2P is working.
- RTT 30 to 80 ms, Relay (regional) — symmetric NAT or carrier-grade NAT defeated the hole punch. Cloudflare TURN nearby took over.
- RTT 80 to 200 ms, Relay — TURN over a longer geographic distance, or you and the host are in very different regions.
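The interpretations above amount to a small lookup on the nominated pair's candidate type and RTT. The function below is an illustrative sketch, not part of Remio or of any WebRTC API: the name `describe_path` and the wording of the messages are invented, ranges are treated as half-open, `prflx` (peer-reflexive) is counted as a direct type, and cases the list does not cover are flagged as such.

```python
def describe_path(candidate_type: str, rtt_ms: float) -> str:
    """Map a nominated candidate type and measured RTT to a plain-language reading."""
    if candidate_type == "relay":
        # The list gives 30 to 80 ms for a regional relay; anything under 80 ms
        # is treated as regional here, which is an assumption.
        if rtt_ms < 80:
            return "Relay: regional TURN relay, hole punching was defeated"
        return "Relay: TURN over a longer geographic distance"
    if candidate_type not in ("host", "srflx", "prflx"):
        raise ValueError(f"unknown candidate type: {candidate_type}")
    if rtt_ms < 5:
        return "Direct: same LAN as the host"
    if rtt_ms < 15:
        return "Direct: same network with an extra access-point or mesh hop"
    if rtt_ms < 80:
        return "Direct: different network, hole punching succeeded"
    return "Direct: high RTT, outside the ranges listed above"


for kind, rtt in [("host", 2.0), ("host", 9.0), ("srflx", 42.0),
                  ("relay", 55.0), ("relay", 140.0)]:
    print(f"{kind:>5} {rtt:>6.1f} ms  {describe_path(kind, rtt)}")
```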
If you reliably fall back to relay even at home, the usual suspects are an ISP modem doing symmetric NAT, IPv4 CGNAT on your home connection (more common than you'd think on fiber resellers), or an over-zealous firewall blocking inbound UDP. Our guide to reducing remote desktop lag walks through the common culprits, and the direct LAN connection page is the fast path if both devices are on your network.
Real-world performance numbers
Putting it all together, here are the latency figures we measure in production across the different paths a Remio session can take:
- Direct LAN P2P: 1–5 ms network RTT. End-to-end (input event to pixel update) under 5 ms with the rest of the pipeline cooperating.
- Wi-Fi same network with extra hops: 5–10 ms network RTT.
- TURN relay, regional: 30–80 ms network RTT.
- TURN relay, cross-region: 80–200 ms network RTT.
- 4G/5G cellular, hole-punched P2P: 40–100 ms network RTT.
These match the figures in Remio's benchmarks and explain why we set the LAN-vs-WAN threshold at 15 ms RTT: it cleanly separates same-network paths (10 ms or less) from any relay (30 ms minimum) or cellular path (40 ms minimum), with very few false positives.
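The separation claim can be checked directly against the ranges listed above. The dictionary below copies those figures; the key names and the `split_by_threshold` helper are invented for this check.

```python
from typing import Dict, List, Tuple

# Network RTT ranges in milliseconds, copied from the list above.
MEASURED_RTT_MS: Dict[str, Tuple[float, float]] = {
    "direct_lan_p2p": (1, 5),
    "wifi_extra_hops": (5, 10),
    "turn_regional": (30, 80),
    "turn_cross_region": (80, 200),
    "cellular_p2p": (40, 100),
}

THRESHOLD_MS = 15.0


def split_by_threshold(ranges: Dict[str, Tuple[float, float]],
                       threshold: float) -> Tuple[List[str], List[str], List[str]]:
    """Return (entirely below, entirely at or above, straddling) the threshold."""
    below = [name for name, (lo, hi) in ranges.items() if hi < threshold]
    above = [name for name, (lo, hi) in ranges.items() if lo >= threshold]
    straddling = [name for name in ranges if name not in below and name not in above]
    return below, above, straddling


lan_like, wan_like, ambiguous = split_by_threshold(MEASURED_RTT_MS, THRESHOLD_MS)
print("LAN side:", lan_like)
print("WAN side:", wan_like)
print("Straddling the threshold:", ambiguous)
```

With these figures no range straddles 15 ms: the two same-network cases fall entirely below it and the relay and cellular cases entirely above.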
NAT traversal is one of those subjects where the protocols are simple individually but the combinatorics are gnarly. Most users will never need to know ICE, STUN, and TURN exist — they will just experience an app that connects fast from anywhere to anywhere. The protocols are doing their job when you don't notice them. When you do notice — a session takes a beat too long, or "Direct" turns into "Relay" — you now have the mental model.
For more on Remio's native pipeline and the choices that turn 1–5 ms of network latency into sub-5 ms glass-to-glass, see why native matters. For the threat model on relay paths, see the security whitepaper. Or just browse the features and download Remio to feel it work.