In audio, the current state of the art (as it always has been) is the linear system. Two channels are accepted, by convention or by deliberate design, as sufficient. Is there a better method that listening tests could reveal? I don't think so. As such, listening tests are probably just confirmation of what we already know - and much less sensitive than measuring instruments - and possibly less sensitive than a relaxed person listening for enjoyment (controversial! but we cannot know the answer).
Yes, two-channel stereo was adopted as a conventional standard in the 1950s, and it remains the widely accepted paradigm. I know of no formal, published scientific listening tests showing that test subjects prefer multichannel music reproduction from discretely recorded multichannel sources over stereo. As you say, such a test would just confirm what many already know to be obvious. That is why such tests are not published: no one has an interest in rigorously attempting to prove the obvious. Such tests would be superfluous, laughably so.
At the advent of color TV, I doubt there were scientific studies to "prove" conclusively that people statistically preferred color to black and white. I'm not saying the case for multichannel audio is quite as obvious as that, but I think you catch my drift. Maybe formal listener preference studies of stereo vs. mono were published, but I doubt it, for the same reasons.
Where rigorous scientific listener preference studies of sound reproduction are useful is in identifying the many smaller, less obvious differences that were not previously "known" or that were controversial. Indeed, many of the things we now routinely deem "known" might not be known were it not for such earlier studies on human test subjects.
We often forget, or are oblivious to, the painstaking research and testing that underlies what we now deem self-evident. If you take the long view, some things in audio were not so obvious at one time, and many were not routinely accepted until adequate scientific testing on human subjects was done.
And there are many mysteries in the realm of psychoacoustics that instruments cannot measure, but which can have a significant impact, including on audio system design. There are still many mysteries to be solved, I believe. Newer research has even shown some accepted conventions to be inferior in terms of listener preference.
As Toole says, two ears and a brain are not the same as an omni mic. Again, I highly recommend his latest book, Sound Reproduction - 3rd Edition, for many useful insights into the research on acoustics and psychoacoustics, his own and others'.