Two traffic types
Discord carries text chat and live voice at the same time, and the two have different needs. Chat needs reliable, ordered delivery, while voice is latency-sensitive and can tolerate an occasional lost packet.
Voice through a media server
Rather than every participant sending audio to every other participant, each client sends one stream to a selective forwarding media server, which relays it to the others. Each client therefore uploads a single stream no matter how many people are in the room.
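The upload saving can be checked with a little arithmetic. The sketch below compares a full mesh, where every client sends to every other client, with the forwarding-server layout described above. The function names are hypothetical and chosen only for illustration; they are not taken from any Discord code.

```python
def mesh_uploads_per_client(participants: int) -> int:
    """Full mesh: each client sends its audio to every other participant."""
    if participants < 1:
        raise ValueError("a room needs at least one participant")
    return participants - 1


def sfu_uploads_per_client(participants: int) -> int:
    """Forwarding server: each client sends one stream, whatever the room size."""
    if participants < 1:
        raise ValueError("a room needs at least one participant")
    return 1


def sfu_server_forwards(participants: int) -> int:
    """Streams the server relays: each incoming stream goes to all other members."""
    if participants < 1:
        raise ValueError("a room needs at least one participant")
    return participants * (participants - 1)


if __name__ == "__main__":
    for n in (2, 5, 10, 25):
        print(n, mesh_uploads_per_client(n), sfu_uploads_per_client(n), sfu_server_forwards(n))
```

The fan-out work does not disappear. It moves from the clients, whose uplinks are usually the tightest constraint, to the server.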
- Chat flows over a gateway WebSocket, which is reliable and ordered
- Voice flows to a media server that forwards each stream to the other participants
- Voice uses a real-time transport that tolerates packet loss
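The forwarding step in the list above can be sketched as a toy in-memory relay. This is a minimal illustration under stated assumptions: `ForwardingRoom`, `join`, and `publish` are hypothetical names, packets are plain bytes, and a real media server would also handle networking, encryption, and codecs.

```python
class ForwardingRoom:
    """Toy forwarding server: relays each packet to every member except its sender."""

    def __init__(self) -> None:
        # member id -> list of (sender id, payload) pairs delivered to that member
        self.inboxes: dict[str, list[tuple[str, bytes]]] = {}
        # member id -> number of packets that member has uploaded
        self.uploads: dict[str, int] = {}

    def join(self, member: str) -> None:
        self.inboxes[member] = []
        self.uploads[member] = 0

    def publish(self, sender: str, payload: bytes) -> None:
        # The sender uploads once; the server does the fan-out.
        self.uploads[sender] += 1
        for member, inbox in self.inboxes.items():
            if member != sender:
                inbox.append((sender, payload))


if __name__ == "__main__":
    room = ForwardingRoom()
    for name in ("alice", "bob", "carol"):
        room.join(name)
    room.publish("alice", b"frame-1")
    print(room.uploads["alice"], len(room.inboxes["bob"]), len(room.inboxes["alice"]))
```

The server does not mix or decode audio here. It only copies each packet to the other members, which is the sense in which it forwards rather than processes.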
Why separate them
Chat must not drop messages, so it uses reliable delivery. Voice must arrive fast, so it uses a protocol that prefers freshness over retransmission. Mixing them on one channel would force a bad tradeoff: either voice stalls while lost packets are retransmitted, or chat loses messages.
Each path is tuned to its own tradeoff: reliability for chat, freshness for voice.
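The difference between the two delivery styles shows up when packets arrive out of order or go missing. The sketch below is a simplified model, not a real transport: `OrderedReceiver` and `FreshestReceiver` are hypothetical names, sequence numbers are assumed to start at 0, and retransmission itself is left out.

```python
class OrderedReceiver:
    """Chat-style delivery: hold back later messages until every gap is filled."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: dict[int, str] = {}
        self.delivered: list[str] = []

    def receive(self, seq: int, msg: str) -> None:
        if seq < self.next_seq:
            return  # duplicate of something already delivered
        self.pending[seq] = msg
        # Deliver the longest in-order run now available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


class FreshestReceiver:
    """Voice-style delivery: play what arrives, never wait, drop anything late."""

    def __init__(self) -> None:
        self.last_seq = -1
        self.played: list[str] = []

    def receive(self, seq: int, frame: str) -> None:
        if seq <= self.last_seq:
            return  # late or duplicate frame: stale audio is not worth playing
        self.last_seq = seq
        self.played.append(frame)


if __name__ == "__main__":
    arrivals = [(0, "a"), (2, "c"), (1, "b")]  # packet 1 arrives late
    chat, voice = OrderedReceiver(), FreshestReceiver()
    for seq, data in arrivals:
        chat.receive(seq, data)
        voice.receive(seq, data)
    print(chat.delivered, voice.played)
```

With packet 1 delayed, the ordered receiver waits and eventually delivers everything in order, while the freshness-first receiver keeps playing and discards the late frame. If packet 1 never arrived, the ordered receiver would stay stuck after the first message until a retransmission filled the gap.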
Key idea
Route reliable, ordered chat through gateway sockets and latency-sensitive voice through a forwarding media server, so each traffic type gets the delivery guarantees it needs.