The paper really doesn't explain anything of the sort. It needs to be read carefully. First and foremost it's a conceptual piece, something of a manifesto, and it provides no direct experimental evidence. The way it presents long and short listening sessions is also not intuitive, nor does it say that these sessions let one hear all kinds of differences equally well.
View attachment 103974
I highlight this passage: "slow listening could take as long as it would take for the subject to learn a new language, maybe more." I will come back to this claim later. For now, let's add some substance to it. Earlier on, there's a sort of industry criticism:
View attachment 103975
The claim around "anyone interested in sound today" is weak, or at least poorly written. I could generally agree, allowing a lot of slack, since the cognitive aspects of hearing are well known, as is the rising standard for "life-like sound" seen in historical commentary on various recording and reproduction technologies (a small technological leap gets people saying that the music is there with them in the room or that the recording has come to life, and later on, after the enthusiasm fades, the deficiencies are better appreciated). That's not the same thing as claiming that psychoacoustic encoding is ineffective, or that older codecs are worse than newer ones. This is where the paper should have looked for and quoted research on this issue specifically. It's too bad it didn't.
Regardless, let's take the mention of the 400 ms grey zone at the end of that passage and find a more detailed description:
View attachment 103976
"Phoneme discrimination" comes from linguistics. A phoneme is a unit of uttered language, kind of like a syllable, but related to speaking and hearing rather than written or grammatical language. The difficulty described is like learning to recognize the tonal components of languages like Vietnamese, or to hear the differences in pronunciation of various vowels across regions and accents. One of the standard fields in linguistics is the physiology of language and which parts of the mouth, tongue, throat and nose are used during speech (this research was used to establish minimum standards for intelligibility in communications systems in the 1940s and earlier by Fletcher, e.g., the frequency content of speech). This then used to represent acoustic differences using notation like
IPA and technical vocabulary.
Cognitive linguistics steps a bit further back and notes that acoustic differences of vocalization are often not enough to help listeners recognize and pick out phonemes (from the linked paper: "Listeners, who are misinformed about a speaker’s (socio-)linguistic background, are more inclined to perceive the incoming stimulus according to their sociolinguistic expectations than to the acoustic characteristics of the stimulus."). Listeners often need context, like what to listen for, or where the speaker is coming from. It's like trying to understand the accent, grammar and vocabulary of Jamaican or Scottish English if you hail from elsewhere. It's not easy, but after a while in the country it becomes second nature. Same goes for sound and music.
Let's come back to this statement: "Slow listening could take as long as it would take for the subject to learn a new language, maybe more." So, all in all, the paper's emphasis is on learning: the idea that certain perceptions may not be accessible to a listener immediately, but may be easily recognizable later. That's straight out of psychology (note that psychoacoustics is considered a branch of psychology, not its own field), and well-designed experiments record not only subjects' responses but how those responses change over long periods. (My favorite study on memory, by Luria, took place over 30 years!) Note that the paper does not define the timescale beyond this general claim.
As such, the paper really doesn't focus on long listening sessions per se, but on what it has taken, historically speaking, to recognize what are now known as commonplace problems in audio. There is really no basis for concluding that long-term reviews and impressions-type publications are in the right, or have any validity beyond the accidental or circumstantial. Using this paper to defend those kinds of uncontrolled listening comparisons is a simple misreading.
This is the paper's conclusion:
View attachment 103992
View attachment 103993
The final sentence is the key. It says: don't take shortcuts in recording, reproduction or manufacturing technologies based on a simple idea of psychoacoustics, like accepting lenient standards for lossy compression or distortion or loudspeaker design. The implication that there are potential issues and differences between gear that we are not completely aware of is a pretty fair conclusion. But note that the paper nowhere says, or supports the idea, that those who claim to hear differences are in the right. All it says is not to take the easy way out when it is possible to do better, especially if the current research does not have all the answers about what is acceptable and what isn't. Note this, for example:
View attachment 104004
Seems to be pretty clear cut. Using new knowledge, rigorously test existing industry standards and see if they hold up to the science. If you have to use short listening tests, make sure that:
View attachment 104007
Which means that, as a manufacturer or researcher, you can't conclude that your listeners' reports are reliable until you take into account their frame of reference and abilities, as well as how you might bias the results by having too narrow a focus when designing the experiment.
The paper is mostly a demand for better mindfulness and higher quality research from an industry that tends to value extremely specialized knowledge and an economically small-minded sort of practicality. An engineer is more likely to be able to quote you Newton's laws of motion than to have read any of his writings (@andreasmaaan this is what I meant before when I said that textbooks don't help knowledge: most rip the idea, formula or fact out of the context of its invention and present it as is, without any acknowledgement of what it took to come upon it. Sorry I didn't reply before).