@GXAlan,
Very good effort; we certainly need more tests with this level of thoroughness and precision.
Alas, in my experience, one can never achieve the best possible resolution and robustness (and, most importantly, a reliable verification) of such difference tests with un-synced recording. A sample-synced record-while-playback setup (i.e. one integrated ADC+DAC device, with both sections running from one single clock) is required to really expose the fine-grained differences. As good as the Cosmos ADC is in terms of standalone performance, its lack of any means of syncing to a source is a big drawback (and the reason I didn't buy one). For the same reason, a standalone source such as an SACD player is not the best possible option.
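To put a rough number on why un-synced capture hurts, here is a minimal sketch of the simplest case: a pure sine played on one clock, captured on a clock that is off by a constant number of ppm, and subtracted sample by sample with no drift correction at all. The function name `null_depth_db` and the constant-offset clock model are assumptions for illustration only; real clocks also wander over time, and tools like DeltaWave exist precisely to correct part of this error.

```python
import math


def null_depth_db(freq_hz, fs_hz, ppm, n_samples):
    """Residual-to-signal power ratio (dB) of a naive sample-by-sample null.

    Hypothetical model: the reference sine runs on the playback clock, the
    capture runs on a clock that is off by a constant `ppm` parts per million,
    and no drift or delay correction is applied before subtracting.
    """
    ratio = 1.0 + ppm * 1e-6
    sig_pow = 0.0
    res_pow = 0.0
    for n in range(n_samples):
        ref = math.sin(2 * math.pi * freq_hz * n / fs_hz)
        cap = math.sin(2 * math.pi * freq_hz * n * ratio / fs_hz)
        sig_pow += ref * ref
        res_pow += (cap - ref) ** 2
    if res_pow == 0.0:
        return float("-inf")  # perfectly shared clock: the null is exact
    return 10 * math.log10(res_pow / sig_pow)


# One second of a 1 kHz tone at 48 kHz with various clock mismatches.
for ppm in (0, 1, 10):
    print(ppm, "ppm:", null_depth_db(1000.0, 48000.0, ppm, 48000))
```

Under these assumptions, a 10 ppm mismatch over one second leaves a residual only about 29 dB below the signal, and even 1 ppm leaves it at about -49 dB. The phase error grows linearly with time, so longer captures and higher frequencies make it worse; a single shared clock removes this error term entirely.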
While DeltaWave is quite good at eliminating the dynamic differences caused by clock mismatch and drift, it can only do so much.
Notably, the difference file is not very clean and is thus not directly usable for a verify test, in which we "eliminate" the difference by adding (resp. subtracting) it as a pre-correction to the input signal of one (resp. the other) amp. Basically, you emulate one amp with the other by pre-conditioning its input signal so that its output is forced to match that of the other amp.
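The verify test described above can be sketched with two toy amplifier models. This is a minimal illustration under loud assumptions: `amp_a` and `amp_b` are hypothetical memoryless, unity-gain (i.e. already gain-normalized) models, with A having a mild cubic compression and B being ideal, and the signals are perfectly sample-aligned as they would be with a sample-synced rig. The difference d = A - B is subtracted at A's input so A should emulate B, and added at B's input so B should emulate A.

```python
import math


def amp_a(x):
    # Hypothetical amp A: unity gain with mild cubic compression.
    return x - 0.05 * x ** 3


def amp_b(x):
    # Hypothetical amp B: ideal unity-gain amplifier.
    return x


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def verify_test(n=4800, fs=48000.0, f=1000.0, amplitude=0.5):
    """Return (rms of A-B difference, residual of A emulating B, residual of B emulating A)."""
    x = [amplitude * math.sin(2 * math.pi * f * k / fs) for k in range(n)]
    y_a = [amp_a(v) for v in x]
    y_b = [amp_b(v) for v in x]
    d = [a - b for a, b in zip(y_a, y_b)]  # difference signal A - B

    # Pre-correction: subtract d at A's input, add d at B's input.
    y_a_as_b = [amp_a(v - e) for v, e in zip(x, d)]
    y_b_as_a = [amp_b(v + e) for v, e in zip(x, d)]

    before = rms(d)
    after_a = rms([p - q for p, q in zip(y_a_as_b, y_b)])
    after_b = rms([p - q for p, q in zip(y_b_as_a, y_a)])
    return before, after_a, after_b


print(verify_test())
```

With these models, pre-correcting the linear amp B reproduces A's output exactly, while pre-correcting the compressing amp A shrinks the residual by roughly a factor of 30, because the correction itself passes through A's nonlinearity and only cancels the difference to first order. This is also why the method needs a clean, well-aligned difference signal: any artifacts in d get injected straight into the amp's input.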
IMHO, and in my extensive experience, the verify test is the most important test for making sure that the difference found is really responsible for the observed changes, and it is absolutely mandatory for technically substantiating the claims. It is also the only way to make such tests 100% repeatable.
It can even be extended to check the influence of linear differences (frequency response changes in magnitude and phase) and of non-linear effects like distortion in isolation, separated from each other. Uncorrected linear differences are often the dominant differences, but in the end they are trivial because they can be inverted. Non-linear effects like the compression you seem to have found cannot be inverted. In many tests I've made, the linear differences dominated the results even when they seemed irrelevant at first glance, whereas striking differences in distortion often turned out to be inaudible.
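A small sketch of the linear vs. non-linear distinction, under assumptions of my own choosing: `linear_diff` is a hypothetical linear difference y[n] = x[n] + 0.3·x[n-1] (a gentle high-frequency roll-off), and `compress` is a hypothetical memoryless compression. The linear one has an exact, stable inverse filter that works at any level. For the compression, the best linear correction (here simply a least-squares gain) fitted at one level no longer fits at another, because the effect depends on level and also adds harmonics.

```python
import math


def tone(level, n=4800, f=1000.0, fs=48000.0):
    return [level * math.sin(2 * math.pi * f * k / fs) for k in range(n)]


def linear_diff(x, a=0.3):
    """Hypothetical linear difference: y[n] = x[n] + a*x[n-1]."""
    out, prev = [], 0.0
    for v in x:
        out.append(v + a * prev)
        prev = v
    return out


def inverse_linear_diff(y, a=0.3):
    """Exact inverse filter x[n] = y[n] - a*x[n-1]; stable because |a| < 1."""
    out, prev = [], 0.0
    for v in y:
        prev = v - a * prev
        out.append(prev)
    return out


def compress(x, k=0.05):
    """Hypothetical memoryless compression (soft saturation)."""
    return [v - k * v ** 3 for v in x]


def best_gain(reference, signal):
    """Least-squares scalar gain g minimising sum((g*signal - reference)**2)."""
    return sum(r * s for r, s in zip(reference, signal)) / sum(s * s for s in signal)


def rel_residual(reference, estimate):
    """RMS of (estimate - reference) relative to the RMS of reference."""
    err = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(err / sum(r * r for r in reference))


def compare(low=0.2, high=0.8):
    x_low, x_high = tone(low), tone(high)
    # Linear case: the same inverse filter restores the input at any level.
    lin = rel_residual(x_high, inverse_linear_diff(linear_diff(x_high)))
    # Non-linear case: a gain correction fitted at the low level ...
    g = best_gain(x_low, compress(x_low))
    # ... leaves a residual when the level changes.
    nonlin = rel_residual(x_high, [g * v for v in compress(x_high)])
    return lin, nonlin


print(compare())
```

With these toy models the inverse filter recovers the input to floating-point precision, while the gain correction fitted at level 0.2 leaves a residual of roughly 2.4% (about -32 dB) at level 0.8, part of it a 3rd harmonic that no simple gain or EQ fitted for one level can remove in general.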
With the sample-synced technique, using the difference signal directly for pre-correction, amp A should measure and sound close to identical to B (and vice versa). Further, one can actually listen to the difference signal, as it is not disturbed by processing artifacts that would give false clues. Finally, a sample-synced process allows for easy time-domain averaging, which is always welcome to reduce noise, hum, buzz and any other components not strongly correlated with the signal.
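Time-domain averaging of sample-synced captures can be sketched as a plain sample-by-sample mean. The sketch assumes (hypothetically) that every capture contains the identical, sample-aligned signal plus independent Gaussian noise; components that are not phase-locked to the signal average down by about 1/sqrt(N) in amplitude, i.e. roughly 10·log10(N) dB, while the correlated signal and its difference residual stay put. Without sample sync, the captures would first have to be re-aligned, and any alignment error would smear the average.

```python
import math
import random


def synced_average(captures):
    """Sample-by-sample mean of equal-length, sample-synced captures."""
    n = len(captures)
    return [sum(column) / n for column in zip(*captures)]


def averaging_demo(num_captures=16, length=4800, noise_rms=0.01, seed=1):
    """Return (noise rms of one capture, noise rms after averaging)."""
    rng = random.Random(seed)
    clean = [0.5 * math.sin(2 * math.pi * 1000 * k / 48000) for k in range(length)]
    # Hypothetical captures: same signal, independent noise each time.
    captures = [[c + rng.gauss(0.0, noise_rms) for c in clean]
                for _ in range(num_captures)]
    averaged = synced_average(captures)

    def noise_rms_of(sig):
        return math.sqrt(sum((s - c) ** 2 for s, c in zip(sig, clean)) / length)

    return noise_rms_of(captures[0]), noise_rms_of(averaged)


single, averaged = averaging_demo()
print(single, averaged, 20 * math.log10(single / averaged))
```

With 16 captures the noise drops by about a factor of 4 (about 12 dB), in line with the 1/sqrt(N) rule.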
EDIT: Link to the outcome of some in-depth difference tests I made:
Hi, over the course of the last weeks I managed to set up and stabilize a procedure that allows me to expose the error residual of null tests à la DeltaWave. Actually, I'm using DeltaWave for the final stage of display and analysis, but it only does level-matching fine-tuning here; the rest is done...
www.audiosciencereview.com