@Bluemootwo I've now gone through that meta-study in more detail and read each of the studies it cites in which a statistically significant result among trained listeners was obtained (these are summarised in table form in the meta-study).
Here are my study-by-study comments:
Jackson 2014 and 2016
I couldn't find the 2016 study anywhere; it may not have been published, perhaps after failing peer review. The 2014 study did find that subjects were able to distinguish high-res content from content that had been downsampled to 44.1kHz/48kHz and truncated or quantised with rectangular dither to 16 bit. Unfortunately, the study conditions are unrealistic: it is not standard practice among mastering engineers to truncate or to quantise with rectangular dither when mixing down to 16/44, as this results in unnecessarily high levels of noise and/or distortion compared with the industry-standard practice of using noise-shaped dither.
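For anyone who wants to see the distinction concretely, here is a minimal numpy sketch (the test tone and parameters are my own assumptions, not the study's) contrasting truncation, rectangular (RPDF) dither and triangular (TPDF) dither at 16 bit:

```python
# Hypothetical illustration (my own parameters, not the study's):
# reducing a signal to 16 bit by truncation vs RPDF vs TPDF dither.
import numpy as np

rng = np.random.default_rng(0)
fs = 44100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)       # 1 kHz tone at -6 dBFS
scale = 2.0 ** 15                            # 16-bit full scale

# Truncation: error is correlated with the signal -> harmonic distortion.
truncated = np.floor(x * scale) / scale

# Rectangular (RPDF) dither, +/-0.5 LSB: decorrelates the mean error only,
# so audible noise modulation remains.
rpdf = rng.random(len(x)) - 0.5
rect_dithered = np.round(x * scale + rpdf) / scale

# Triangular (TPDF) dither, +/-1 LSB: fully decorrelates the error, leaving
# a steady, signal-independent noise floor.
tpdf = rng.random(len(x)) - rng.random(len(x))
tpdf_dithered = np.round(x * scale + tpdf) / scale

for name, y in [("truncated", truncated), ("RPDF", rect_dithered), ("TPDF", tpdf_dithered)]:
    print(f"{name}: error RMS = {np.sqrt(np.mean((y - x) ** 2)):.2e}")
```

Truncation leaves the error correlated with the signal (heard as distortion); TPDF dither trades that for a benign, signal-independent noise floor, which noise shaping can then move away from the ear's most sensitive band.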
Kanetada 2013A and 2013B
Couldn't find the papers. They were conference papers presented here, and as such were not published or peer-reviewed, to my knowledge.
Mizumachi 2015
Available here. The authors took a high-resolution recording and truncated/downsampled it to 16/48 in Matlab. As per Jackson 2014, since proper dithering was not applied when the audio was downmixed, conclusions regarding the audibility of artifacts in the "quasi-CD quality" versions used in the study cannot be extrapolated to normal CD/redbook audio.
Yoshikawa 1995
Available here. The authors tested subjects' ability to differentiate between pulse trains that had been low-pass filtered at 20kHz and pulse trains that had been low-pass filtered at 40kHz. No statistically significant ability to differentiate was found under any test condition other than one, namely where the pulse train was of 0.125s duration. This finding is summarised in a box plot in the paper.
I can't fault this experiment. With pulse trains of 0.125s duration, subjects were able to discriminate between signals low-pass filtered at 20kHz and signals low-pass filtered at 40kHz, albeit only just crossing the threshold of statistical significance.
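To make the stimuli concrete, here is a rough sketch of how such test signals might be generated (the repetition rate, filter type and sample rate are my assumptions; the paper's exact setup may differ):

```python
# Rough sketch of Yoshikawa-style stimuli: a 0.125 s pulse train band-limited
# at 20 kHz vs 40 kHz. Repetition rate and filter order are assumptions.
import numpy as np
from scipy import signal

fs = 192000                       # must comfortably carry content up to 40 kHz
dur = 0.125
pulses = np.zeros(int(fs * dur))
pulses[::fs // 500] = 1.0         # assumed 500 Hz pulse repetition rate

def bandlimit(x, cutoff_hz):
    sos = signal.butter(8, cutoff_hz, btype="low", fs=fs, output="sos")
    return signal.sosfilt(sos, x)

lp20 = bandlimit(pulses, 20000)   # "CD-like" bandwidth
lp40 = bandlimit(pulses, 40000)   # "high-res" bandwidth
```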
Theiss 1997
Available here. The authors tested the hypothesis that subjects would be able to localise sound signals bandlimited to 48kHz (corresponding to a sampling rate of 96kHz) more accurately than signals bandlimited to 24kHz (corresponding to a sampling rate of 48kHz). A number of stimuli were used:
- Impulse trains - results were not statistically significant.
- White noise - results were not statistically significant.
- Music - results were not statistically significant.
And the authors conclude: "The hypothesis that an increase in sampling rate will result in an increase in spatial resolution can be rejected from the experimental data given above."
An additional experiment was then carried out with three subjects to test the hypothesis that a music signal (a Brahms piano recording) sounded different at 96kHz/24bit than at 48kHz/16bit (downmixed using triangular non-noise-shaped dither). For two subjects, results were not statistically significant. One subject, however, was able to discern correctly in 16 out of 17 trials (P=0.014%). Further investigations were attempted to separate out whether the high success rate was attributable to the lower sample rate or the lower bit depth, but the equipment apparently failed, so the results were inconclusive.
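That quoted p-value checks out if the trials are modelled as fair coin flips; a quick scipy sanity check (assuming a one-sided binomial test with chance performance at 50%):

```python
# 16 correct out of 17 two-alternative trials, chance = 0.5, one-sided test.
from scipy.stats import binomtest

result = binomtest(16, n=17, p=0.5, alternative="greater")
print(f"{result.pvalue:.4%}")   # ~0.0137%, consistent with the quoted P=0.014%
```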
Again, this finding cannot IMHO be extrapolated to standard mastering practices, because non-noise-shaped triangular dither was used, which results in a perceptually higher noise floor than the noise-shaped dither used in standard mastering practice.
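For anyone unfamiliar with what noise shaping actually does, here is a toy first-order error-feedback quantiser (real mastering-grade shapers use higher-order, psychoacoustically weighted filters, so treat this purely as a sketch):

```python
# Toy first-order noise-shaped quantiser: the previous sample's quantisation
# error is fed back, giving the error spectrum a (1 - z^-1) high-pass shape
# that moves noise away from the ear's most sensitive region.
import numpy as np

def noise_shaped_quantise(x, bits=16, seed=0):
    rng = np.random.default_rng(seed)
    scale = 2.0 ** (bits - 1)
    y = np.empty_like(x)
    e = 0.0                                  # previous quantisation error (LSBs)
    for i, s in enumerate(x * scale):        # work in LSB units
        v = s - e                            # subtract the fed-back error
        q = np.round(v + (rng.random() - rng.random()))  # TPDF dither + round
        e = q - v                            # error to feed back next sample
        y[i] = q / scale
    return y
```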
My 2c
Firstly, the 60% figure is clearly grossly misleading: 60% of trained listeners were able to discern differences in the subset of tests in which trained listeners were able to discern any difference at all. There were many studies/tests in which trained listeners were not able to discern any difference, and these are strangely not accounted for in the meta-study authors' 60% figure.
Secondly, in those studies in which statistically significant results were obtained, confounding choices to use simple truncation or non-noise-shaped dither were made, rendering it difficult to extrapolate from these to real-life cases in which redbook audio is dithered with noise-shaping. In the only study in which this issue was not present, the stimulus was not music, but rather 0.125s pulse trains, and even in that case, the result was only just statistically significant.
Of course, there is no substitute for reading these studies yourself instead of simply relying on my analysis/interpretation.
My takeaway would be that these studies demonstrate that redbook audio is skating very close to thresholds at which noise or other artefacts may become (marginally) audible in specific listening conditions, although they do not quite establish that properly-processed redbook crosses these thresholds with music as the program. Having said that, I do think 24bit/48kHz would be a more prudent standard, as it would provide a greater buffer and would, in particular, insure against substandard recording/mixing/mastering practices.
However, assuming industry-standard practices are followed, I see no direct evidence that 16/44.1 fails, if only just, to keep any degradation inaudible.
Of course, for recording/mixing/mastering, higher sample rates and bit depths should always be used, as these provide digital "headroom" for processing at the various stages before final mixdown.
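A toy illustration of why that headroom matters (the 40dB gain rides are my example; real DAWs process in 32/64-bit float internally for exactly this reason):

```python
# Attenuate by 40 dB and then restore the gain: quantising to 16 bit between
# the two stages amplifies the quantisation noise 100x, while staying in
# float preserves the signal essentially exactly.
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = 0.9 * np.sin(2 * np.pi * 1000 * t)

def to_16bit(v):
    return np.round(v * 32768.0) / 32768.0

gain = 10 ** (-40 / 20)                            # -40 dB
via_16bit = to_16bit(to_16bit(x) * gain) / gain    # quantised between stages
via_float = (x * gain) / gain                      # processed in float

for name, y in [("16-bit chain", via_16bit), ("float chain", via_float)]:
    print(f"{name}: error RMS = {np.sqrt(np.mean((y - x) ** 2)):.2e}")
```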