I think there have been some repeated misunderstandings/misrepresentations of each side of the debate by the other on here. These are the two opposing hypotheses/claims under debate:
- Claim 1: Experienced/trained listeners are no less susceptible to sighted bias than average
- Claim 2: Experienced/trained listeners are less susceptible to sighted bias than average
Drs Toole and Olive’s study has been cited in support of claim 1. Below are the results from the paper that can be used to compare experienced listeners' preference ratings against the average for experienced and inexperienced listeners as a whole, for both blind and sighted tests. Note: the speaker ratings are likely naturally compressed due to listeners' contraction bias (a reluctance to use the extreme ends of the rating scale), which is common in subjective evaluations. Rescaling the rating axis from 4/5 to 8/9 simply makes the data more readable and visually corrects for this contraction bias, so there's no conspiracy there.
Average for experienced and inexperienced listeners (same data as the first graph in Sean Olive’s blog often reproduced on here, but in a different format):
Experienced listeners (the more pertinent graph to this discussion, which I don't think has been discussed yet):
So for experienced and inexperienced listeners as a whole, shown in the first graph, on average only the preference order of speakers S and T changed places between sighted and blind listening. For experienced listeners only, however, the preference order completely changed: when listening sighted it was D, G, T, S, whereas during blind listening it was S, D, T, G. The difference between the blind and sighted ratings given to the same speaker by the experienced listeners is also larger on average than this difference for all listeners. This suggests the experienced listeners were affected by sighted bias at least as much as (if not more than) the average across all types of listeners.
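To put a number on "completely changed", we can compare the two preference orders with a rank correlation. A minimal sketch below, using Kendall's tau (my choice of statistic, not one used in the paper) on the two orderings quoted above; the single-swap comparison order is hypothetical, since the paper's full all-listener ranking isn't reproduced here:

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two preference orders
    (lists of the same items, most preferred first).
    +1 = identical order, -1 = fully reversed."""
    rank_a = {item: i for i, item in enumerate(order_a)}
    rank_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant if both orders rank x and y the same way round
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(order_a) * (len(order_a) - 1) / 2
    return (concordant - discordant) / n_pairs

# Experienced listeners: the sighted vs blind orders quoted above
sighted = ["D", "G", "T", "S"]
blind = ["S", "D", "T", "G"]
print(kendall_tau(sighted, blind))  # -0.333..., i.e. closer to reversed than preserved

# For comparison, a single swap of two adjacent speakers (hypothetical
# orders; only the fact that S and T swapped is from the paper)
print(kendall_tau(["D", "G", "S", "T"], ["D", "G", "T", "S"]))  # 0.666...
```

A tau of about -0.33 for the experienced group versus about +0.67 for a lone adjacent swap illustrates how much more the experienced listeners' ranking moved between conditions.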
The study also compared how sensitive the listeners were to changing acoustic variables in sighted and blind listening, in this case two speaker positions, 1 and 2.
Average for experienced and inexperienced listeners:
Experienced listeners:
Both graphs show speaker location had a strong influence on preference when blind, yet little effect when sighted, again showing that experienced listeners are just as affected by sighted bias as listeners overall. In this case it deafens ('blinds') them to actual acoustic changes caused by speaker positioning, changes they recognised fine during actual blind listening. All these results support claim 1 above.
Now, from what I can tell, the two main objections to the study seem to be:
(a) The listeners’ bias is not representative of, and much greater than, Amir’s possible bias, since they were Harman employees and three of the four speakers were Harman brands
(b) The study's definition of an experienced/trained listener is too inclusive
Starting with objection (a), I think @preload made some great points here. Simply investing a large amount of money in, owning, and very much liking a brand’s products and design philosophy can in itself foster a subconscious brand loyalty, and so a cognitive bias. Granted, this would likely not be as strong as the bias the Harman employees had for their own speakers, but all the other possible biases @Sean Olive mentioned are still on the table, and are common to all sighted listening tests. Even if objection (a) is valid, and the extreme position is maintained that the only valid results are those for the non-Harman speaker ‘T’, the last graph above showing experienced listener ratings still shows a significant change in the rating given for this speaker in ‘position 2’ between sighted and blind listening, shifting it from third place sighted to last place blind. Notably, this runs counter to any possible bias against speaker T as a rival brand, suggesting the remaining biases, common to all sighted listening tests, play a relatively large role. The graph above for all listeners shows the same shift in ranking of speaker T in position 2 from sighted to blind, and a similar (though smaller) change in rating, echoing the results from the first two graphs of this post: the experienced listeners were at least equally, if not more, affected by sighted bias, even when listening to a speaker they had no vested interest in.
So what about objection (b)? Here's how experienced/inexperienced listeners are defined in the study (my emphasis):
In these tests, listeners were considered to be inexperienced if they had no previous experience in controlled listening tests. Other definitions are possible, which might include persons with no critical listening experience whatsoever.
The bolded parts imply an experienced listener is one who has had at least critical listening experience and controlled listening test experience. This doesn't sound too inclusive to me. And even if it is, and doesn't meet the requirements of a 'highly experienced/trained' listener (whatever those are), it makes sense that this experience is a continuum of ability, which would mean that at worst the study is suggestive evidence that even highly experienced listeners are no less susceptible to sighted bias than others (claim 1). What scientific research is there in evidence of the opposing claim 2 at the beginning of this post, that experienced listeners are less susceptible to sighted bias? If there is none, then claim 1 is on stronger ground. If you take the extreme (and I'd say irrational) view that this study contains zero evidence for claim 1, then the two claims are on equal footing, and you should remain agnostic. The fact remains, however, that claim 2 is a claim of exception that goes against not only this study but cognitive science as well: I'm not aware of any scientific studies showing sighted biases can be noticeably reduced through knowledge of them and training. In fact, this would be a prime example of the (ridiculously named, but very real) G.I. Joe fallacy. When it comes to cognitive biases, knowing really isn't half the battle; in fact it's not even close:
It should be noted that, as Sean said here, Harman now have a more exacting definition of a trained listener: passing level 8 or higher in their How to Listen software, with normal audiometric hearing, and showing good discrimination and consistency in their sound ratings. I believe Amir has said he reached level 5/6 (still much better than the audio dealers, who only passed level 3), and I presume ‘normal’ hearing precludes people with notable presbycusis, which can start to become significant in terms of sound-judgement variability after around age 50 (as Floyd Toole has humbly described with reference to his own hearing, and as I mentioned in this post). 'Normal hearing' would obviously also preclude those with notable NIHL, which could occur due to such activities as, ahem, routinely listening to headphones at ‘earlobe resonating' volumes. Of course, Amir has specific training in identifying small lossy digital compression artefacts (I believe primarily via IEMs/headphones, speakers being notoriously harder to hear sound imperfections with), but the relevance of this specific skill to discerning differences in speakers’ acoustic attributes at normal listening volumes and distances, and the extent (if any) to which it could balance out the high stipulations for a Harman trained listener above, is debatable.
But the bigger picture here is that sighted bias is just the tip of the iceberg in terms of the nuisance variables that need to be controlled for listening tests to be useful in drawing reliable conclusions. Some of these have been controlled for here, but there are major exceptions in addition to standard sighted bias: measurement bias (from seeing the spinorama before listening), no level-matching, and no instantaneous A/B switching (instead mostly comparing speakers over days, weeks and months, relying on long-term auditory memory, which is notoriously unreliable). And this isn’t even considering the fact that this is a single listener, whose perceptions are not as generalisable as those of a collection of listeners, or any of the other methodological controls put in place in a scientifically controlled double-blind study, as Sean mentioned here. The gulf between those studies and the listening tests here really is huge.
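To make the level-matching point concrete: even a fraction of a dB of level difference is known to skew preference towards the louder speaker. A minimal sketch of the idea below; the function names are my own, and this is illustrative only, since in real speaker tests levels are matched with test noise and an SPL meter at the listening position rather than on the source waveform:

```python
import math

def rms(signal):
    """Root-mean-square level of a sampled signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def level_match(signal, reference):
    """Scale `signal` so its RMS level matches `reference`.
    Returns the scaled signal and the gain applied, in dB.
    Illustrative only: real speaker comparisons match acoustic SPL
    at the listening position, not the electrical source level."""
    gain = rms(reference) / rms(signal)
    return [x * gain for x in signal], 20 * math.log10(gain)

# Example: a source playing at half amplitude needs ~6 dB of gain to match
loud = [math.sin(i / 10) for i in range(1000)]
quiet = [0.5 * x for x in loud]
matched, gain_db = level_match(quiet, loud)
print(round(gain_db, 2))  # 6.02 (dB), i.e. 20*log10(2)
```

Without a correction like this (done acoustically, per speaker, at the listening seat), a louder speaker tends to sound "better" for reasons that have nothing to do with its frequency response.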
Please note: this post is in no way an attack on Amir, nor a demand (or even a request) that he change his listening methodology (this would obviously be impractical for one person, especially during a pandemic, and he's doing all of this for free, so I would never demand anything). I don’t think anyone else is taking these positions either, and of course we are all incredibly grateful for the frankly mind-boggling amount of work he’s put into this project.

However, it has been claimed that the subjective impressions are ‘data’, from which conclusions can be drawn about the accuracy and validity of Sean Olive’s speaker preference rating formula. If so, this necessitates the same analysis and scrutiny of the ‘measuring instrument’ and method of data collection as has been exacted on the Klippel NFS data. If this is objected to or ignored, then it's simply inconsistent and unscientific to maintain that the subjective judgements are data rather than informal impressions (which is what they seemed to start out as, and which personally I was fine with).

I am also not saying the impressions have zero utility: they can definitely point in interesting directions for fully controlled listening tests to investigate further. But any claims that conclusions can be drawn about the validity of the preference formula from these impressions are not really tenable, as partially-controlled, sighted, single-listener tests are simply incongruent with the well-controlled, double-blind tests by hundreds of listeners on which the formula is based.