Want to stream a program in HD, with just that program's sound and your mic, to a small group of friends, with low latency and nothing but open-source software? Here's a work-in-progress solution that works on Linux.
TL;DR: Use WebRTC in a mesh/P2P configuration, share your screen, and use PulseAudio's modules to route the program audio and your mic into a single virtual source.
Before I start, here are a few terms:
WebRTC mesh
All peers know of each other and are directly connected to each other in a P2P fashion. As expected, this is the most expensive setup for the peers, since each one has to send its stream to every other peer and also needs enough bandwidth to receive the streams from all of them.
WebRTC Multipoint Control Unit (MCU) and Selective Forwarding Unit (SFU)
In an MCU setting, all peers stream to a server, which knits the images and audio together and then sends the composite out to the peers. Each peer has one input and one output.
In an SFU setting, all peers stream to the server, and the server replicates the incoming streams to those participants that want them. Each peer has multiple inputs and one output.
See webrtc.ventures for a good description.
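To make the trade-offs concrete, here's a back-of-the-envelope comparison of per-peer stream counts in each topology. The peer count and per-stream bitrate below are made-up example numbers, not measurements:

```shell
#!/bin/bash
peers=5   # example: you plus four friends
mbps=2    # assumed bitrate of a single 1080p stream

# Mesh: every peer uploads to and downloads from all N-1 others
echo "mesh: $(( peers - 1 )) up, $(( peers - 1 )) down -> $(( (peers - 1) * mbps )) Mb/s upstream per peer"

# MCU: one composite stream each way
echo "mcu:  1 up, 1 down -> $mbps Mb/s upstream per peer"

# SFU: one upload; the server fans it out, you download N-1 streams
echo "sfu:  1 up, $(( peers - 1 )) down -> $mbps Mb/s upstream per peer"
```

With these numbers a mesh costs each peer 8 Mb/s upstream, which is why it only works well if everyone has decent upload bandwidth.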
Now, from the beginning. You can skip to the solution at the bottom.
Along the road to this, there were many attempts that I'll just document here. Hopefully not in long-form... let's see.
Share screen to browser
So, this is the most obvious solution, right? Open up Jitsi or Signal, or some closed-source solution like Google Meet, Microsoft Teams (🤮), Zoom, etc., share your screen or game window aaaand - nope.
The input sure is 1080p, but what comes out on the other side? 720p or worse; especially when there's movement.
The reason for this is that they use WebRTC in an MCU or SFU setting. That's expensive for the server, and especially if you aren't a paying customer, it makes sense to limit the quality to save on bandwidth and to give you an incentive to pay for HD streaming.
But quality isn't the only problem; it's also a strain on your machine. If you have a weak GPU, which is highly likely if you just have a standard Mac or an equivalent mid-range laptop with an onboard GPU, you'll probably drop frames, your whole PC will struggle, the fans will spin up, and the viewer experience will be awful.
Get an HDMI capture card and stream that through aforementioned services
Soo, yeah. I didn't know at the time that the services were throttling or limiting quality, so of course this attempt was bound to fail as well.
Use HDMI capture card to stream through Nextcloud Talk
Nextcloud Talk uses a WebRTC mesh setting, meaning all peers are connected P2P!
It seemed like the best solution, but if you're running it on Docker (like I am), with nginx (like I am), don't know the intricacies of PHP (guess who!), and believe the "just install" marketing (oh yeah, me)... you're in for a great time.
Let me introduce you to "Multiple polling of new messages API calls slows down the whole nextcloud instance" #6738. No disrespect to the developers of Nextcloud Talk, but honestly, this really shouldn't be an issue. Long-polling a chatroom to see who's in it shouldn't require advanced configuration and in-depth optimization of your server.
After a single peer joined, Nextcloud nearly ground to a halt and couldn't add any more participants.
Stream to Owncast, call through another app
Owncast is a nice streaming solution if you just want viewers and async feedback through a text chat. It uses the omnipresent protocol behind the most popular streaming services (Twitch, YouTube, Facebook, Periscope, et al.):
Real-Time Messaging Protocol (RTMP)
It uses the term "Real-Time" veeery loosely, as latencies between 5 and 20 seconds are perfectly normal. Wowza has a great article on protocols and latency.
After one attempt, it was clear that this was not the answer. Listening to me react to an event and seeing the event 5 seconds later is not acceptable.
Not to beat around the bush: https://p2p.chat (Github) solved the video streaming problem for me. Under the hood it's a WebRTC mesh. Since I have good upstream bandwidth, streaming to multiple people in HD isn't a problem (2Mb/s per person).
Since p2p.chat runs in a browser, it doesn't have advanced access to audio input (just like Jitsi, Teams, Zoom, etc.). That means, if I want only the game audio plus my mic, I can't just let it access my desktop audio, because then my friends would hear themselves.
PulseAudio has "modules", which are basically extensions that one can load or unload dynamically. Sources are inputs and sinks are outputs.
- null-sink is basically a faux sink (a virtual, fake speaker)
- combine-sink acts as a sink that forwards the audio to other sinks
- remap-source creates a source from a sink
- loopback simply duplicates the packets from a source and sends them to a sink - no extra program necessary
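Modules load and unload at runtime with pactl, so you can experiment without touching any config files. A quick sanity check looks like this (the index in the last comment is just an example; use whatever number `load-module` printed for you):

```shell
# Load a throwaway virtual speaker; pactl prints the new module's index
pactl load-module module-null-sink sink_name=Test-Sink

# List all loaded modules with their indices
pactl list short modules

# Unload it again using the printed index, e.g.:
# pactl unload-module 27
```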
With these four modules we can:
- Route the audio of as many mics as needed into a null-sink using loopback
- Create a new source out of the null-sink's monitor using remap-source
- Use the new source as the input to the browser
- Send the game audio to both the null-sink and the physical speakers using a combine-sink
A quick overview of what I'm aiming for.
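As a text-only sketch, the routing looks roughly like this (the sink and source names are the ones used in the script in this post):

```
mic(s) ──module-loopback──────────────────────────────┐
                                                      ▼
game audio ──► Splitter (combine-sink) ──┬──► Virtual-Speaker (null-sink)
                                         └──► physical speakers
                                                      │ .monitor
                                                      ▼
                                         Remap-Source ──► browser "mic" input
```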
As a script, it looks like this:

```shell
#!/bin/bash
# Find the name of your physical speaker with `pactl list short sinks`

# A virtual speaker that will collect mic + game audio
pactl load-module module-null-sink sink_name=Virtual-Speaker sink_properties=device.description=Virtual-Speaker

# One loopback per mic; add source=<mic-name> to pick a specific source
pactl load-module module-loopback sink=Virtual-Speaker
pactl load-module module-loopback sink=Virtual-Speaker

# Turn the virtual speaker's monitor into a source the browser can use
pactl load-module module-remap-source source_name=Remap-Source master=Virtual-Speaker.monitor

# Send game audio to both the physical speakers and the virtual speaker
pactl load-module module-combine-sink sink_name=Splitter slaves=alsa_output.pci-0000_0e_00.6.analog-stereo,Virtual-Speaker
```
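To undo everything after the call, the modules can be unloaded again. Newer PulseAudio versions let `pactl unload-module` take a module name and unload every instance of it; on older versions, pass the indices from `pactl list short modules` instead:

```shell
#!/bin/bash
# Tear down the audio routing set up above
pactl unload-module module-combine-sink
pactl unload-module module-remap-source
pactl unload-module module-loopback
pactl unload-module module-null-sink
```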
It would be good if there were a program that allowed making this audio graph easily and visualizing the flow of audio. A project for another time....