The smaller the buffer, the lower the latency, and
- the higher the chance of buffer issues for a single clock + PC processing,
- or the shorter the interval (a 100% certainty, only a question of when) between buffer issues for two clock domains without adaptive resampling (rough numbers sketched below).
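Just to put rough numbers on the latency side (48 kHz and the buffer sizes here are made-up examples):

```python
# Buffering latency for a few example buffer sizes; 48 kHz is assumed.
RATE_HZ = 48_000

for buffer_frames in (64, 256, 1024, 4096):
    latency_ms = 1000.0 * buffer_frames / RATE_HZ
    print(f"{buffer_frames:5d} frames -> {latency_ms:6.2f} ms of buffering latency")
```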
A typical chain would be:
alsa capture device -> capture buffer (CB) -> DSP (one or more threads) -> playback buffer (PB) -> alsa playback device
My 2 cents: the WiiM devices have the same architecture, as it's pretty much the only logical setup in Linux.
It's important to keep in mind that the alsa device sets the pace of the client processing, be it capture or playback. The kernel driver wakes up the user-space process when fresh samples are available for reading by userspace (capture), or when the output buffer has gained enough room for new samples to be written by userspace (playback).
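Purely as a structural sketch (the capture_pcm/playback_pcm objects and their read/write methods are placeholders standing in for real alsa devices, not actual alsa-lib calls), the chain above looks roughly like this:

```python
import queue

# Each loop below runs in its own thread; capture_pcm, playback_pcm and
# process() are placeholders for the alsa devices and the DSP.
capture_buffer = queue.Queue(maxsize=8)    # CB, holding chunks of frames
playback_buffer = queue.Queue(maxsize=8)   # PB, holding chunks of frames

def capture_loop(capture_pcm, chunk_frames):
    while True:
        # Blocks until the capture device has chunk_frames ready:
        # the alsa capture device paces this loop.
        chunk = capture_pcm.read(chunk_frames)
        capture_buffer.put(chunk)          # blocks if CB is full

def dsp_loop(process):
    while True:
        chunk = capture_buffer.get()       # blocks while CB is empty
        playback_buffer.put(process(chunk))

def playback_loop(playback_pcm):
    while True:
        chunk = playback_buffer.get()      # blocks while PB is empty
        # Blocks until the playback device has room for the chunk:
        # the alsa playback device paces this loop and, through the
        # queues, the whole chain.
        playback_pcm.write(chunk)
```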
For analog input, the capture device (ADC) and playback device (DAC or SPDIF out) are clocked by the same clock. Low-cost devices use clock signals generated by internal PLLs of the SoC (i.e. the SoC I2S interfaces run in master mode), while more expensive devices have external clock circuits and run the SoC I2S interfaces in slave mode.
If I were to design a low-cost device with that linkplay A98 module, which has two I2S interfaces, I would probably do:
- ADC as slave -> I2S_A input (alsa capture device A) as master
- I2S_A output (alsa playback device A) as master -> DAC / SPDIF_OUT as slaves
- SPDIF_IN as master (the SPDIF stream always carries the master clock) -> I2S_B input (alsa capture device B) as slave
This setup needs no external clock and no clock switching.
Now for analog input -> DSP -> output there is only one clock involved (the I2S master clock generated by the SoC). Both alsa capture and playback devices A, being clocked by that same clock, produce/consume samples at exactly the same rate; buffers CB and PB stay happy and can be kept quite small.
When switching from the ADC analog input to SPDIF, the DSP must start capturing from the SPDIF capture device B, which is clocked by the SPDIF clock entering I2S_B. Now the two alsa devices no longer run at the same speed: the SPDIF_IN capture device provides data at a slightly different rate than the playback device consumes it, and buffers CB and PB will eventually run into trouble.
Typically the processing chains are pulled, i.e. it's the playback device which sets the pace of the whole DSP. If so, then (rough time scale estimated below the two cases):
- if playback is faster than capture, CB will eventually underflow (missing capture samples)
- if playback is slower than capture, CB will eventually overflow (dropped capture samples)
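With made-up but plausible numbers (48 kHz, CB of 1024 frames starting half full, 100 ppm difference between the two clocks) the time scale is easy to estimate:

```python
# Assumed numbers only: 48 kHz capture clock, CB of 1024 frames starting
# half full, 100 ppm difference between the two clock domains.
def seconds_until_cb_trouble(cb_frames=1024, capture_hz=48_000.0, ppm=100.0):
    headroom = cb_frames / 2                       # frames until empty or full
    drift_frames_per_s = capture_hz * ppm / 1e6    # net fill-level drift
    return headroom / drift_frames_per_s

print(f"CB under- or overflows after ~{seconds_until_cb_trouble():.0f} s")
```

Larger buffers only stretch that interval; they never remove the problem.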
To avoid that, CamillaDSP, for example, puts an adaptive resampler between CB and the DSP thread which consumes samples at the rate of capture device B and produces samples at the rate of playback device A. Of course, determining the correct current resampling ratio is crucial and not simple, especially if the overall latency is to be small: the buffers must be kept small, so the extra room in CB for compensating the rate inequality is small too.
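To illustrate the idea only (this is not CamillaDSP's actual rate-adjust algorithm; the class, the gains and the control law are made up): one crude approach is to watch a heavily smoothed CB fill level and nudge the ratio so that the fill stays at a target value.

```python
# Toy ratio estimator, purely illustrative: not CamillaDSP's algorithm.
class RatioEstimator:
    def __init__(self, target_fill_frames, gain=1e-7, smoothing=0.01):
        self.target = target_fill_frames   # desired average CB fill level
        self.gain = gain                   # correction per frame of fill error
        self.smoothing = smoothing         # low-pass factor for fill readings
        self.avg_fill = float(target_fill_frames)
        self.ratio = 1.0                   # capture frames consumed per playback frame

    def update(self, current_fill_frames):
        # The instantaneous fill jumps by a whole chunk at every wakeup,
        # only a heavily smoothed value reveals the slow clock drift.
        self.avg_fill += self.smoothing * (current_fill_frames - self.avg_fill)
        # CB filling above target means capture runs faster than playback:
        # consume a bit more capture data per playback frame (ratio > 1),
        # and vice versa when CB drains below target.
        self.ratio = 1.0 + self.gain * (self.avg_fill - self.target)
        return self.ratio
```

The catch mentioned above sits exactly here: with small buffers the fill readings are noisy and the allowed correction range is tiny, so the smoothing and gain are hard to get right.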
Typically CB and PB work together, because the DSP runs in a separate thread and a delay on the capture side (the DSP waiting for new samples to process) will delay delivery of the DSP'd samples to playback too.
Due to the chunked processing, these computer-based chains are quite difficult to tune right and to run reliably at very small latencies. Therefore a HW DSP, i.e. dedicated hardware which processes samples continuously (not in chunks), is much more robust and capable of smaller latencies.
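A rough feel for the latency floor of the chunked chain (chunk size and buffer depths are assumed, not measured on any particular device):

```python
# Assumed: 48 kHz, 256-frame chunks, CB and PB each buffering two chunks.
RATE_HZ = 48_000
CHUNK_FRAMES = 256

chunk_ms = 1000.0 * CHUNK_FRAMES / RATE_HZ
total_ms = 2 * chunk_ms + chunk_ms + 2 * chunk_ms   # CB + chunk in the DSP + PB
print(f"one chunk = {chunk_ms:.2f} ms, chain latency floor ~ {total_ms:.1f} ms")
# A HW DSP working sample by sample has no such chunk-sized floor;
# its latency comes from the filters themselves, not from buffering.
```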