Lowering (→ducking) Spotify/… volume when somebody speaks on Zoom/Discord/…

Question

Do you have any idea how I could achieve the aforementioned result with a reasonable amount of time and effort?

I've spent the past few hours researching and trying different approaches.

1. module-role-ducking seemed like a hot candidate. However, Discord does not specify the media.role property, so if I can't set that myself, it won't work. Even if adding the property did work, from my understanding, my music would be permanently ducked, as these voice chat clients don't pause/disconnect when nobody speaks.
2. I had a look at spectrumyzer to see if I could intercept the sound output of the voice chat application, so I could do it all by myself (calculate voice chat volume, set music volume accordingly), but I'm not up to that task – I didn't find any helpful guides. This approach would, AFAIK, also involve creating another sink for just the voice chat application.
3. I could follow this approach to pipe the voice chat audio to a custom script that does the things mentioned in 2. But how would I process this raw audio information?
4. pulseeffects is great, but it doesn't allow such interactions between applications.
5. Mumble VoIP does the actual ducking-of-other-applications part, so that is absolutely doable.

Nicolai Weitkemper · Answer 1 · 2020-10-27T19:47:26.727

It's still in an early phase, but I made a working prototype of what I wanted!

https://github.com/NicoWeio/PulseDucking

I'll paste the "How it's done" section from my README, for future reference.

For each currently running trigger application, a new thread is started.
In each thread, parec --monitor-stream=<STREAM_INDEX> is called. It streams the raw audio of an application.
Silence versus noise is detected by simply checking the raw bytes for 0x00.
pacmd set-sink-input-volume <SINK_INPUT_INDEX> <VOLUME> is then run for each application that should be ducked.
A loop ensures that new applications are taken into account as well.
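
A minimal Python sketch of these steps, not the repository's actual code. The function names, the chunk size, and the two volume constants are assumptions made for illustration; pacmd takes volumes as integers, where 65536 (0x10000) corresponds to 100%.

```python
import subprocess
from typing import List

NORMAL_VOLUME = 0x10000  # 100% in pacmd's integer scale
DUCKED_VOLUME = 0x4000   # assumption: a quarter of the 100% value


def is_silent(chunk: bytes) -> bool:
    """True if every byte is 0x00, i.e. digital silence in signed PCM."""
    return not any(chunk)


def parec_command(stream_index: int) -> List[str]:
    """Command that streams the raw audio of a single sink input."""
    return ["parec", f"--monitor-stream={stream_index}"]


def volume_command(sink_input_index: int, volume: int) -> List[str]:
    """Command that sets the volume of a single sink input."""
    return ["pacmd", "set-sink-input-volume", str(sink_input_index), str(volume)]


def watch(trigger_index: int, ducked_indices: List[int], chunk_size: int = 4096) -> None:
    """Duck the given sink inputs while the trigger stream is not silent."""
    proc = subprocess.Popen(parec_command(trigger_index), stdout=subprocess.PIPE)
    ducked = False
    while True:
        chunk = proc.stdout.read(chunk_size)
        if not chunk:  # parec exited, e.g. because the stream went away
            break
        speaking = not is_silent(chunk)
        if speaking != ducked:  # only call pacmd when the state changes
            volume = DUCKED_VOLUME if speaking else NORMAL_VOLUME
            for index in ducked_indices:
                subprocess.run(volume_command(index, volume), check=False)
            ducked = speaking
```

Running `watch` in one thread per trigger application, with the indices taken from `pacmd list-sink-inputs`, would correspond to the per-application threads described above.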

So, as it turns out, detecting absolute silence in raw audio streams is not difficult at all, and getting the raw audio from a single application does not involve creating sinks/loopbacks/…, thanks to parec --monitor-stream=<STREAM_INDEX>.
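
If the voice chat client never outputs exact zeros (for example because of background noise), a level threshold could replace the all-zero check. The following is a sketch of that alternative, not something the prototype is known to do. It assumes parec was started with `--format=s16le`, and the names `rms_level` and `is_quiet` as well as the threshold value are hypothetical.

```python
import math
import struct


def rms_level(chunk: bytes) -> float:
    """RMS of signed 16-bit little-endian samples, scaled to 0.0..1.0."""
    count = len(chunk) // 2  # ignore a trailing odd byte
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: 2 * count])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def is_quiet(chunk: bytes, threshold: float = 0.01) -> bool:
    """True if the chunk's RMS level is below the (assumed) threshold."""
    return rms_level(chunk) < threshold
```

The same RMS value could also be used to scale the music volume gradually instead of switching between two fixed levels.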

Regarding 1., manually setting the stream's properties is possible using pacmd update-sink-input-proplist <INDEX> media.role="…". I have yet to check whether module-role-ducking works once the property is set.
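
For scripting, the same pacmd call can be built from Python. This is only a sketch: the helper name `media_role_command` is hypothetical, and the role value is left to the caller.

```python
import subprocess
from typing import List


def media_role_command(sink_input_index: int, role: str) -> List[str]:
    """Build the pacmd call that sets media.role on one sink input."""
    return [
        "pacmd",
        "update-sink-input-proplist",
        str(sink_input_index),
        f"media.role={role}",
    ]


def set_media_role(sink_input_index: int, role: str) -> None:
    """Run the command; requires a running PulseAudio daemon."""
    subprocess.run(media_role_command(sink_input_index, role), check=True)
```

The role passed in has to match whatever trigger role module-role-ducking is configured with; as stated above, whether the module then behaves as hoped is untested.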

I wrote a little script to manually duck any process working as an input for a PulseAudio sink – Pablo Bianchi, Aug 15 '22 at 21:52
