Assume that my microphone audio is being live-streamed to Zoom (or, more generally, to any application).
I want to run this audio through an executable (a filter) before it is sent to the application. Naively, the piping mechanism would be: raw input | filter | application.
I need to maintain a buffer in my filter, around 50 ms long (because the audio preprocessing is dynamic), and only then flush it to the application, the way buffered I/O works in general.
Let's say I am coding this in C++ (although Python would be preferred, since a bit of ML is involved in the audio preprocessing).
(In effect, the application would receive my audio with a 50 ms delay, but in practice this does not matter to me.)
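To make the buffering part concrete, here is a minimal sketch of the kind of filter I have in mind. It assumes raw 16-bit mono PCM at 16 kHz arriving on a byte stream; the sample format, the names `run_filter` and `process`, and the pass-through processing are all placeholders of mine, not anything a real tool prescribes. It only covers the middle stage, not the capture or the delivery to the application.

```python
# Assumed audio format (hypothetical; the real device may differ).
SAMPLE_RATE = 16_000   # frames per second
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
CHANNELS = 1           # mono
BUFFER_MS = 50         # how much audio to collect before processing

# 50 ms at 16 kHz is 800 frames, i.e. 1600 bytes for 16-bit mono.
CHUNK_BYTES = SAMPLE_RATE * BUFFER_MS // 1000 * SAMPLE_WIDTH * CHANNELS


def process(chunk: bytes) -> bytes:
    """Placeholder for the real preprocessing; returns the audio unchanged."""
    return chunk


def run_filter(src, dst, chunk_bytes=CHUNK_BYTES):
    """Read raw PCM from src, process it in fixed-size chunks, write to dst."""
    pending = b""
    while True:
        # Ask only for what is still missing from the current chunk,
        # so short reads are handled as well.
        data = src.read(chunk_bytes - len(pending))
        if not data:           # end of stream
            break
        pending += data
        if len(pending) == chunk_bytes:
            dst.write(process(pending))
            dst.flush()        # hand the finished 50 ms block downstream
            pending = b""
    if pending:                # flush a final, shorter block
        dst.write(process(pending))
        dst.flush()
```

Called as `run_filter(sys.stdin.buffer, sys.stdout.buffer)`, this would be the middle stage of the pipe above. What I am missing is how to feed it from the microphone and how to present its output to the application as an audio input.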
How can I achieve this? Since this is quite a broad field, I specifically want to know how to do the following:
- capture raw audio data (from, say, a headset),
- process the raw audio using a C++ executable (preferably Python, though) that holds it in a 50 ms buffer, and then
- flush the post-processed buffer to a live-streaming application (such as Zoom).
I have seen the post on noise removal with PulseAudio, but the audio processing I want is much more general than using a built-in PulseAudio module or profiling noise with SoX.