Actually putting the net in netplay
Today’s post is about the networking part of netplay and not the playing part.
TCP and UDP and you
TCP: In order, stream-oriented, reliable delivery. Great for anything that resembles a stream, horrible for anything that doesn’t. The built-in reliability is great, but comes at a very high cost: if packets A and B are in flight and A gets lost, the application can’t read B until A’s retransmission arrives, even if B is already sitting in the receive buffer (a.k.a. head-of-line blocking). It also has some gotcha behavior that’s on by default and hurts real-time applications, like Nagle’s algorithm, which delays small writes so it can batch them together.
TCP is a bit of a non-starter: we want to send packets at a constant 60 Hz, and head-of-line blocking can be catastrophic. A single lost packet stalls delivery of everything behind it until the retransmission makes it all the way across, so one drop costs at least a whole extra round trip of input delay. Not great.
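To make the Nagle gotcha concrete, here’s a minimal sketch in TypeScript on Node (the host and port are made up) of what you’d have to remember if you insisted on TCP anyway:

```ts
import * as net from "node:net";

// Nagle's algorithm holds back small writes until the previous segment is
// ACKed, batching them together. Great for throughput, terrible for tiny
// 16-byte packets sent at 60 Hz. If you were stuck with TCP, you'd at
// least want it off:
const socket = net.connect({ host: "peer.example", port: 12345 }, () => {
  socket.setNoDelay(true); // disable Nagle's algorithm
  socket.write(Buffer.alloc(16)); // one tick's worth of input data
});
```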
UDP: Out of order, message-oriented, unreliable delivery. Pretty good for realtime data if you can afford to lose it; otherwise you’ll have to build your own reliability mechanism on top. Once you’ve done all that, it’s basically the gold standard for implementing a real-time netplay application protocol on top of. ENet is a popular library that handles the fiddly reliability and ordering bits.
UDP is pretty good, and we can put some elbow grease in to build our own reliability mechanism or just crib off ENet. A great option through and through, but in the years that have passed since the advent of ENet…
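For a flavor of that elbow grease, here’s a hypothetical, stripped-down reliability layer over UDP in TypeScript on Node. The wire format and names are made up, and a real library like ENet also handles ordering, fragmentation, and congestion control:

```ts
import * as dgram from "node:dgram";

const RESEND_MS = 50; // naive fixed retransmission timer

function makeReliableSocket(port: number, peerHost: string, peerPort: number) {
  const sock = dgram.createSocket("udp4");
  const pending = new Map<number, Buffer>(); // seq -> not-yet-ACKed packet
  let nextSeq = 0;

  sock.on("message", (msg) => {
    if (msg[0] === 1) {
      // ACK: the peer got this sequence number, stop retransmitting it.
      pending.delete(msg.readUInt32BE(1));
    } else {
      // DATA: ACK it back, then hand the payload to the game.
      const seq = msg.readUInt32BE(1);
      const ack = Buffer.alloc(5);
      ack[0] = 1; // ACK
      ack.writeUInt32BE(seq, 1);
      sock.send(ack, peerPort, peerHost);
      // ... deliver msg.subarray(5) to the game, deduplicating by seq ...
    }
  });

  // Retransmit anything that hasn't been ACKed yet.
  setInterval(() => {
    for (const pkt of pending.values()) sock.send(pkt, peerPort, peerHost);
  }, RESEND_MS);

  sock.bind(port);

  return {
    send(payload: Buffer) {
      const pkt = Buffer.alloc(5 + payload.length);
      pkt[0] = 0; // DATA
      pkt.writeUInt32BE(nextSeq, 1);
      payload.copy(pkt, 5);
      pending.set(nextSeq++, pkt);
      sock.send(pkt, peerPort, peerHost);
    },
  };
}
```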
Friendship ended with UDP now SCTP is my best friend
Ah, SCTP. It’s in order. It’s message-oriented. It has customizable reliable delivery semantics. The absolute goldilocks protocol for our use case. While it delivers messages in order, it doesn’t have to suffer from head-of-line blocking: messages on independent streams don’t stall each other, and you can keep sending messages up to the buffer size while the other side buffers them, rather than the whole connection pausing on one lost packet. It has built-in reliable delivery so you don’t have to build your own.
It’s perfect. While it’s a relatively uncommon protocol with only a few use cases, there’s one major technology it goes together with like peanut butter and jelly.
WebRTC is my other best friend
WebRTC is a browser technology that actually combines a whole bunch of protocols to let two clients connect peer-to-peer with each other to e.g. stream video and audio, or send arbitrary data.
STUN/TURN/ICE: The magic sauce that tries to punch holes through NAT, basically removing the need to port forward 90% of the time. For the remaining 10%, TURN relays the connection through a server. Tailscale has a fantastic article about how all this works: give it a read!
SCTP over DTLS (over UDP): The holy grail networking protocol we want to use (we’ll sketch setting it up after this list). Ironically, at the bottom it’s all UDP, since a lot of networking hardware will drop SCTP running directly over IP, but that’s OK because we want the rest of WebRTC anyway.
… and some other stuff that we don’t happen to care about (mostly around video/audio streaming).
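Here’s roughly what that wiring looks like with the standard browser WebRTC API (a sketch: the STUN URL is Google’s public server, and the signaling step that ferries the offer/answer between the two peers is omitted entirely):

```ts
// A sketch: one peer connection with NAT traversal via STUN, plus a data
// channel riding on SCTP over DTLS over UDP. Signaling (exchanging the
// offer/answer and ICE candidates between the peers) is left out here.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// SCTP's delivery semantics surface per channel: the default is ordered +
// reliable, and options like `ordered: false` or `maxRetransmits` opt into
// unordered or partially-reliable delivery instead.
const channel = pc.createDataChannel("inputs", { ordered: true });
channel.binaryType = "arraybuffer";

channel.onopen = () => channel.send(new Uint8Array(16)); // we're in business
```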
Peer-to-peer or client-server?
A lot of people have misconceptions about the nature of peer-to-peer vs client-server. I’ve chosen to go with a peer-to-peer model, which is a much better experience for players than client-server: let’s go through some examples to see why.
Let’s buy a dirt cheap instance from EC2 in one of AWS’s cheapest regions, us-east-2.
Detroit Daniel and Atlanta Anna want to play against each other. Detroit Daniel has ~10 ms latency to us-east-2 and Atlanta Anna has ~24 ms latency. So far so good: the packets end up traveling through scenic Dublin, Ohio and continue on their merry way to either Atlanta or Detroit. Great.
Now, Sacramento Sharon and Los Angeles Liam want to play against each other. The round trip to Dublin is ~65 ms from both Sacramento and Los Angeles, so relaying through the server means a player-to-player round trip time of ~130 ms. In contrast, a direct connection from Sacramento to Los Angeles is ~20 ms. Yikes. We can buy an instance in San Francisco, I guess, but the price is getting up there.
Finally, Auckland Adam and Wellington Wanda want to play against each other. The round trip to Dublin is a whopping ~250 ms from Auckland and ~300 ms from Wellington, for a combined ~550 ms player-to-player round trip time. That’s ~275 ms each way, or about 17 ticks at 60 Hz that we’d need to simulate ahead to avoid input delay: completely unplayable! In contrast, a direct connection from Wellington to Auckland is ~1 ms: almost instantaneous. Now we’re having to buy an instance in New Zealand to avoid the whopping ~550 ms penalty!
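To spell out the arithmetic behind that tick count:

```ts
const TICK_MS = 1000 / 60; // ~16.7 ms per tick at 60 Hz

// One-way legs, taking half of each city's ~250/~300 ms round trip to Dublin:
const oneWay = 250 / 2 + 300 / 2;               // Auckland -> Dublin -> Wellington: ~275 ms
const playerRtt = oneWay * 2;                   // ~550 ms between the two players
const ticksAhead = Math.ceil(oneWay / TICK_MS); // ~17 ticks to simulate ahead
```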
Instead of all this client-server business, we can rely on WebRTC’s peer-to-peer connectivity to just connect players directly to each other in most cases: it’s a better experience for players and a better experience for my wallet, because I don’t have to proxy their traffic or run those instances! For the cases where the connection is behind some kind of cursed firewall, we’ll need TURN relay servers to take care of it. Fortunately, getting managed TURN servers is a lot easier than hosting game servers in multiple locations: both Twilio and Xirsys offer plans for this.
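Hooking a TURN relay in is just more ICE configuration. A sketch (the URLs and credentials here are placeholders; a managed provider hands you real, short-lived ones through its API):

```ts
// With a TURN server configured, ICE still tries the direct peer-to-peer
// path first and only falls back to relaying traffic through the TURN
// server when hole-punching fails.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    {
      urls: "turn:turn.example.com:3478",
      username: "generated-username",   // placeholder credentials
      credential: "generated-password", // issued by the TURN provider
    },
  ],
});
```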
The cheese under the sauce
Now that we know how to connect our clients and send packets, what do we put in the packets? From our previous exploration of the game code, we’ve found that every tick the game sends 16 bytes of outgoing input data to the other side and consumes 16 bytes of incoming input data.
And that’s it! We send those bytes every tick and wait for the other side’s input to come in, going back to the committed state every so often as required. From what we talked about last time, we already know what to do with this input data once it arrives!
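Put together, the per-tick exchange is tiny. A sketch of what it might look like over the data channel (localInputFor and handleRemoteInput are illustrative stand-ins for the game’s input hooks, not Tango’s actual API):

```ts
// Illustrative stand-ins for the game's input hooks, not Tango's real API:
declare function localInputFor(tick: number): Uint8Array;    // 16 bytes out
declare function handleRemoteInput(input: Uint8Array): void; // 16 bytes in

function runNetplay(channel: RTCDataChannel) {
  channel.binaryType = "arraybuffer";

  let tick = 0;
  setInterval(() => {
    channel.send(localInputFor(tick++)); // 16 bytes of input, every tick
  }, 1000 / 60); // ~60 Hz

  channel.onmessage = (ev) => {
    // Remote input arriving: hand it to the rollback machinery from the
    // previous post (predict ahead, then go back to the committed state
    // and resimulate once the real input shows up).
    handleRemoteInput(new Uint8Array(ev.data as ArrayBuffer));
  };
}
```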
This ends this series on how Tango works. There’s been a lot of handwaving and the devil is truly in the details, but I hope you’ve managed to get a good feel for how the magic happens under the hood. Thanks for reading!