Limitations of blind testing procedures

oivavoi · Apr 18, 2017

Jakob1863 said:
Maybe the ABX protocol does not suit your personal abilities; since there is no need to use ABX, you could switch to another protocol, such as A/B paired comparisons.
Training under the specific test conditions is usually a very good idea, because participating in a controlled listening test is very different from "normal" listening.

Researchers already found roughly 60 years ago that test results differ depending on the protocol (ABX compared to A/B) and attributed the divergence to the different internal mental processes involved.

That's why training and the use of positive controls are so important.

Thanks. Yep, I find A/B tests much easier to do.

But for my personal use I actually don't see much point in doing blind testing. I'm a measurements guy: I buy the gear which objectively conforms most closely to the objective ideal of high fidelity. For me, at the moment, that implies phase- and time-coherent loudspeakers with a relatively flat frequency response and low distortion (which necessitates active crossovers), and a good polar/power response so that the reverberant field in the room becomes tonally correct. With electronics I go for affordable, well-designed, no-nonsense products with low distortion, all using balanced connections. So far, it's just about achieving high fidelity in an objective sense. The only place where subjectivity comes into play for me is concerning speaker directivity, and equalizing the system at the end according to what subjectively sounds good to me.

The only place where I can see myself using blind testing in the future is if I should come across something that sounds "strange" or unnatural, even though it measures well. Then I might listen blind to see what it's really about.
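The structural difference between the two protocols discussed above can be sketched in a few lines: in ABX the listener must match a hidden X to A or B, whereas in this paired A/B variant (one of several; preference-based A/B tests also exist) the listener only judges whether two presentations are the same or different. A minimal Python sketch; the `respond` callbacks are hypothetical stand-ins for a listener, not part of any real test software:

```python
import random


def run_abx(n_trials, respond, rng=random):
    """Run ABX trials and return the number of correct answers.

    In every trial X is secretly set to "A" or "B" at random. `respond`
    (a hypothetical stand-in for the listener) is given what X sounds like
    and must answer "A" or "B".
    """
    correct = 0
    for _ in range(n_trials):
        x = rng.choice(["A", "B"])  # hidden identity of X for this trial
        if respond(x) == x:
            correct += 1
    return correct


def run_ab_same_different(n_trials, respond, rng=random):
    """Run paired A/B trials: each pair is either identical or different,
    and `respond` must answer "same" or "different"."""
    correct = 0
    for _ in range(n_trials):
        pair = (rng.choice(["A", "B"]), rng.choice(["A", "B"]))
        truth = "same" if pair[0] == pair[1] else "different"
        if respond(pair) == truth:
            correct += 1
    return correct


rng = random.Random(2017)
perfect_listener = run_abx(16, lambda x: x, rng)                    # always identifies X
guessing_listener = run_abx(16, lambda x: rng.choice("AB"), rng)   # ignores the sound
```

In both cases the randomized, hidden assignment is what lets the answers be scored against a key the listener never sees.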

Jinjuku · Apr 18, 2017

Jakob1863 said:
Now we are back at the beginning. You've sent a set of cables to a listener and got his correct response.

I've gotten no response because no one will participate.

Jakob1863 said:
The McGurk effect isn't an appropriate example in our context; not all humans experience it, and a newer study drew the conclusion that it depends on training too.

You are free to substitute opinion with, you know, actual data.

SoundAndMotion · Apr 18, 2017

By the way, before you try to send me cables, or deride me as a "believer", I should tell you I don't believe in cable burn-in, but I do believe in good scientific methods.

Jinjuku said:
I've gotten no response because no one will participate.

And I doubt anyone ever will. Why should they? I doubt it is one of their goals to jump through your hoops.

Note: there are flaws in your plan. I’m working from your description in post #180 in this thread. Is that complete? Before describing the flaws, let me know where the most complete description is.

Jinjuku said:
You are free to substitute opinion with, you know, actual data.

My reason/opinion on why it is an inappropriate example: the McGurk effect is an effect of multisensory integration. The brain wants info about different sensory modalities to agree with each other about what is happening around us. When there is a mismatch, the brain uses tricks to try to create a uniform percept, or it regards the mismatch as occurring from different processes. When our visual system (eyes) tells us a sound source “ought to” sound like “ga” (from mouth shape and motion, etc.), but the ears hear “ba”, the brain does its best to combine them and you perceive “da”.

“Blind listening” isn't about the eyes. It’s about knowledge. If you wear a blindfold and I tell you “this is cable A” and ask you which cable it is, it is not a blind test. It does not involve multisensory integration.

You know, actual data on what Jakob mentioned:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958963/
http://journal.frontiersin.org/article/10.3389/fpsyg.2014.00407/full
http://link.springer.com/article/10.3758/s13423-015-0817-4
https://hal.archives-ouvertes.fr/hal-00941306/

Jinjuku · Apr 18, 2017

SoundAndMotion said:
By the way, before you try to send me cables, or deride me as a "believer", I should tell you I don't believe in cable burn-in, but I do believe in good scientific methods.

I'm not going to send cabling to anyone that doesn't want them. Although if I was secure in my beliefs and offered some cash I wouldn't hesitate to make an easy buck.

SoundAndMotion said:
And I doubt anyone ever will. Why should they? I doubt it is one of their goals to jump through your hoops.

I have offered monetary compensation. Never asked anyone to jump through hoops without some carrot.

SoundAndMotion said:
Note: there are flaws in your plan. I’m working from your description in post #180 in this thread. Is that complete? Before describing the flaws, let me know where the most complete description is.

Most methods have flaws, and I'm intellectually honest enough to admit that and welcome constructive critique. Post 180 stands on its own merits. I would 'burn in' the cabling with music.

SoundAndMotion said:
My reason/opinion on why it is an inappropriate example: the McGurk effect is an effect of multisensory integration.

I didn't say it's a perfect example. It's a quick example that does deliver the point that you can't always trust the ear/eye/brain interaction. It is appropriate in the context that people are using multisensory integration to evaluate an audio only application.

While a cursory read of your papers does show that people can be trained (albeit not to 100%) against the effect, closing your eyes (blinding) certainly pulls into focus what is actually heard without all the, as you called it, 'hoops' to jump through.

hvbias · Apr 18, 2017

oivavoi said:
Thanks. Yep, I find A/B tests much easier to do.

But for my personal use I actually don't see much point in doing blind testing. I'm a measurements guy: I buy the gear which objectively conforms most closely to the objective ideal of high fidelity. For me, at the moment, that implies phase- and time-coherent loudspeakers with a relatively flat frequency response and low distortion (which necessitates active crossovers), and a good polar/power response so that the reverberant field in the room becomes tonally correct. With electronics I go for affordable, well-designed, no-nonsense products with low distortion, all using balanced connections. So far, it's just about achieving high fidelity in an objective sense. The only place where subjectivity comes into play for me is concerning speaker directivity, and equalizing the system at the end according to what subjectively sounds good to me.

The only place where I can see myself using blind testing in the future is if I should come across something that sounds "strange" or unnatural, even though it measures well. Then I might listen blind to see what it's really about.

Very well said and I agree with everything in your post.

I will add that I find blind listening tests helpful where objective design goals are tested on a statistically significant group of people, with hopefully some trained listeners present. Something like Harman's training or Philips' Golden Ear training is worth going through rigorously. For instance, one area that personally helped me narrow down what to look for in speakers is Toole's research/listening tests on controlled directivity. Using purely the measurement method, an objectively well-designed speaker with flat on-axis response and ignoring polars would lead (in my opinion/for my criteria) to a suboptimal purchasing decision. That is the best example of an additional objective design criterion that comes to mind; I am sure there are others floating around upstairs.

I see you did mention directivity in your post, but to many speaker designers, flat on-axis response is still considered "objectively good enough".

SoundAndMotion · Apr 18, 2017

Jinjuku said:
I'm not going to send cabling to anyone that doesn't want them.

LOL. :-D
I realize you wouldn’t just send them out… I wanted to nip in the bud anyone telling me to just take you up on your offer… thinking I “hear” burn-in.

Jinjuku said:
I have offered monetary compensation. Never asked anyone to jump through hoops without some carrot.

Oh sorry, I didn’t realize this. How much? (Curious, not wanting to do it.)

Jinjuku said:
Most methods have flaws, and I'm intellectually honest enough to admit that and welcome constructive critique. Post 180 stands on its own merits. I would 'burn in' the cabling with music.

It’s good you know that flawless or perfect are not usually possible. But that belies your confidence when you state (multiple times) that no one has taken you up on your offer and you state (multiple times) that no one has poked (or polked, as you joke) holes in your method.

I notice 3 holes that have varying degrees of difficulty to repair. First, although you just mentioned using music, the rest of your burn-in protocol is unspecified. You also don’t mention how you’ll confirm that cables are “virgin” and weren’t out for a 30-day test with another customer. This is easy to fix. You state that you will communicate about and agree on these issues with the test subject beforehand. Piece of cake.

Second, trust. You are openly hostile to believers, and I wouldn’t doubt that many wouldn’t trust you to burn-in correctly, certify virginity correctly, and even really send 2 types (rather than trick them and say ha-ha afterward). There may be other, easier solutions, but I think you’d have to pair up with someone the believers trust, work together and watch each other.

Third, and this has been mentioned before, how will you analyze? Are you really expecting to send out one burned-in set and one virgin set and have the whole thing ride on one answer? You were asked what you’d do if the answer was right and you said “it is what it is”. Really?!?

You realize that even if you can easily hear the difference between, say, two speakers, and I put them in acoustically transparent enclosures and asked you to identify them, you may well get a few wrong answers (mistake, focus, attention, distraction, etc.). So the whole thing riding on one answer is way too risky for me. If, instead, you offered to send 20 burned-in/virgin sets, that could work. But would you really do that? And ensure the agreed-upon burn-in and virginity protocol? Well, that could work… it would take a l-o-n-g time, but it’d work. Are you willing? If not this, or something similar, you seem to lack sincerity.
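The risk of letting everything ride on one answer can be made concrete with the binomial distribution: if a listener is purely guessing between two alternatives, the number of correct answers in n trials follows Binomial(n, 0.5). A minimal Python sketch (stdlib only); the 0.5 guess rate assumes a two-alternative forced choice, and the 16-of-20 score is an illustrative number, not a result from this thread:

```python
from math import comb


def p_value_guessing(correct, trials, p_guess=0.5):
    """One-sided probability of getting at least `correct` right out of
    `trials` if the listener is purely guessing (binomial upper tail)."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )


single_answer = p_value_guessing(1, 1)   # one correct answer out of one trial
twenty_sets = p_value_guessing(16, 20)   # illustrative: 16 correct out of 20
```

A single correct answer comes out at 0.5, exactly what a coin flip would achieve, so it cannot distinguish hearing from guessing; 16 of 20 would happen by chance only about 0.6% of the time.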

Jinjuku said:
I didn't say it's a perfect example. It's a quick example that does deliver the point that you can't always trust the ear/eye/brain interaction. It is appropriate in the context that people are using multisensory integration to evaluate an audio only application.

No, that is the point. “Sighted” testing is not multisensory integration. It is a cognitive bias. That is why Jakob said it was not the appropriate example and I agree.

Jinjuku said:
While a cursory read of your papers does show that people can be trained (albeit not to 100%) against the effect, closing your eyes (blinding) certainly pulls into focus what is actually heard without all the, as you called it, 'hoops' to jump through.

Not quite, but close. Closing your eyes removes the conflict. It doesn’t necessarily change your focus so that you can pull in what is actually heard.

Jinjuku · Apr 18, 2017

SoundAndMotion said:
LOL. :-D

Oh sorry, I didn’t realize this. How much? (Curious, not wanting to do it.)

My initial offer has always been $100 to a charity of the claimant's choosing.

SoundAndMotion said:
It’s good you know that flawless or perfect are not usually possible. But that belies your confidence when you state (multiple times) that no one has taken you up on your offer and you state (multiple times) that no one has poked (or polked, as you joke) holes in your method.

That no one has taken up the offer to date is simply a hard data point, even after back-and-forth about weaknesses and answering questions (such as offering a webcam on the burn-in apparatus that could be checked in real time during the burn-in process).

SoundAndMotion said:
I notice 3 holes that have varying degrees of difficulty to repair. First, although you just mentioned using music, the rest of your burn-in protocol is unspecified. You also don’t mention how you’ll confirm that cables are “virgin” and weren’t out for a 30-day test with another customer. This is easy to fix. You state that you will communicate about and agree on these issues with the test subject beforehand. Piece of cake.

Correct. Input from the claimant is always welcome, as it has to be, since I am testing claims. If the claim is 10 hours of burn-in using an FR sweep, then so be it. If it's 100 hours using music playback at such-and-such RMS, then so be it.
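Since the burn-in terms come from the claimant, one way to keep them unambiguous is to write them down as a fixed record before anything is shipped, and to let a random draw (rather than the tester) decide which set gets burned in. A sketch in Python; the field names and the `assign_sets` helper are hypothetical illustrations, not something specified in the thread:

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnInProtocol:
    """Hypothetical record of the burn-in terms fixed with the claimant up front."""
    hours: float   # e.g. 10 or 100, as the claimant specifies
    signal: str    # e.g. "frequency sweep" or "music playback"
    level: str     # drive level, in whatever terms the claimant specifies


def assign_sets(labels, rng=random):
    """Pick at random which of two labelled cable sets is burned in;
    the other set stays new."""
    first, second = labels
    burned = rng.choice([first, second])
    new = second if burned == first else first
    return {"burned_in": burned, "new": new}


terms = BurnInProtocol(hours=100, signal="music playback", level="agreed RMS level")
assignment = assign_sets(("Set 1", "Set 2"), random.Random(7))
```

Making the record immutable (`frozen=True`) mirrors the idea that the terms are agreed once and not changed afterwards.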

SoundAndMotion said:
Second, trust. You are openly hostile to believers, and I wouldn’t be surprised if many didn’t trust you to burn in correctly, certify virginity correctly, and even really send 2 types (rather than trick them and say ha-ha afterward). There may be other, easier solutions, but I think you’d have to pair up with someone the believers trust, work together and watch each other.

I don't think you are being fair with regard to 'hostile'. I'm openly critical of people who don't apply critical thinking to what they are actually saying and who will not consider the viewpoint that burned-in cabling is meaningless when ordinary use already amounts to burn-in in the course of listening.

SoundAndMotion said:
Third, and this has been mentioned before, how will you analyze? Are you really expecting to send out one burned-in set and one virgin set and have the whole thing ride on one answer? You were asked what you’d do if the answer was right and you said “it is what it is”. Really?!?

Yes really. It's data I'm after. Remember it could be 1 out of 1 answers or 1 out of 50.
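Whether "1 out of 1" can count as evidence depends on a decision rule fixed in advance. Assuming a two-alternative guess rate of 0.5 and a conventional one-sided 5% criterion (both assumptions, not part of either proposal in this thread), this stdlib Python sketch finds the smallest number of correct answers that would reach significance for a given number of trials:

```python
from math import comb


def min_correct_for_significance(trials, alpha=0.05):
    """Smallest score k with P(score >= k | guessing at 0.5) <= alpha,
    or None if even a perfect score is not significant."""
    for k in range(trials + 1):
        # Upper tail of Binomial(trials, 0.5): chance of k or more correct by luck
        tail = sum(comb(trials, j) for j in range(k, trials + 1)) / 2**trials
        if tail <= alpha:
            return k
    return None


for n in (1, 5, 10, 20):
    print(n, min_correct_for_significance(n))
# 1 None
# 5 5
# 10 9
# 20 15
```

Under these assumptions a single trial can never be significant, and even five trials require a perfect score; with 20 trials, 15 correct answers are enough.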

SoundAndMotion said:
You realize that even if you can easily hear the difference between, say, two speakers, and I put them in acoustically transparent enclosures and asked you to identify them, you may well get a few wrong answers (mistake, focus, attention, distraction, etc.).

You could be entirely correct. Can I pick the speakers and can I make the claim? Do I get control of the volume knob? The material? Then you could suggest your above method. I'm not asking for anything I haven't offered.

SoundAndMotion said:
So the whole thing riding on one answer is way too risky for me. If, instead, you offered to send 20 burned-in/virgin sets, that could work. But would you really do that? And ensure the agreed-upon burn-in and virginity protocol? Well, that could work… it would take a l-o-n-g time, but it’d work. Are you willing? If not this, or something similar, you seem to lack sincerity.

I totally agree with your above assessment and have never been contrary. We are thinking along like lines.

SoundAndMotion said:
Not quite, but close. Closing your eyes removes the conflict. It doesn’t necessarily change your focus so that you can pull in what is actually heard.

It removes a variable.

oivavoi · Apr 18, 2017

hvbias said:
Very well said and I agree with everything in your post.

I will add that I find blind listening tests helpful where objective design goals are tested on a statistically significant group of people, with hopefully some trained listeners present. Something like Harman's training or Philips' Golden Ear training is worth going through rigorously. For instance, one area that personally helped me narrow down what to look for in speakers is Toole's research/listening tests on controlled directivity. Using purely the measurement method, an objectively well-designed speaker with flat on-axis response and ignoring polars would lead (in my opinion/for my criteria) to a suboptimal purchasing decision. That is the best example of an additional objective design criterion that comes to mind; I am sure there are others floating around upstairs.

I see you did mention directivity in your post, but to many speaker designers, flat on-axis response is still considered "objectively good enough".

Interesting response. I would very much have liked to undergo some golden ear training... I have no illusion of having golden ears as of now!

Concerning measurements and polar response: I would say that a loudspeaker that measures well on-axis but badly off-axis doesn't measure well, objectively. Do we need blind tests to reach that conclusion? After all, much of the sound that reaches our ears in a typical room is reverberant sound. If this sound is very different from the direct sound, then the end result is objectively worse than in a case where the reverberant sound has a tonal quality similar to the direct sound. To me, this just seems logical.
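The reasoning about the reverberant field can be illustrated with a crude estimate: the sound arriving after room reflections is roughly an energy sum of what the speaker radiates in many directions, so off-axis levels in dB have to be averaged on a power basis, not arithmetically. A minimal Python sketch, assuming equal weighting of the measured angles (a proper sound-power estimate weights each angle by the share of the sphere it represents) and using made-up SPL values:

```python
import math


def energy_average_db(levels_db):
    """Power-average a list of SPL values in dB, with equal weights."""
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


# Hypothetical SPL at one frequency: on-axis, then progressively further off-axis
on_axis_db = 85.0
off_axis_db = [85.0, 84.0, 82.0, 78.0, 72.0]

# How far the (crudely estimated) reverberant contribution sits below on-axis
gap_db = on_axis_db - energy_average_db(off_axis_db)
```

Repeating this at each frequency gives a curve to compare with the on-axis response; a large gap that varies with frequency corresponds to the case described above, where the reverberant sound is tonally very different from the direct sound.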

amirm · Apr 18, 2017

Jakob1863 said:
Amirm, I beg to differ. If a listener claims to hear a difference under the usual longer switching time, and someone wants to know if there is some evidence for the claim, then evaluating by using shorter switching times would not help.

Help in what regard? If it is to find out whether there really is a difference, then we should follow the research (and my own personal experience), which shows that near-instant switchovers are far more reliable in finding such differences than any long-term listening. I have passed many critical listening tests this way and would have no prayer of doing so with longer-term listening.

If our goal is to teach the person a lesson, then sure, we let them violate the above as much as they want. By doing so, they help our cause of embarrassing them. It doesn't help us figure out if there is any truth to their observations.

oivavoi · Apr 18, 2017

By the way, Amir: What is your take on modern well-designed DACs? Have you been able to tell differences between them?

hvbias · Apr 19, 2017

oivavoi said:
Interesting response. I would very much have liked to undergo some golden ear training... I have no illusion of having golden ears as of now!

Concerning measurements and polar response: I would say that a loudspeaker that measures well on-axis but badly off-axis doesn't measure well, objectively. Do we need blind tests to reach that conclusion? After all, much of the sound that reaches our ears in a typical room is reverberant sound. If this sound is very different from the direct sound, then the end result is objectively worse than in a case where the reverberant sound has a tonal quality similar to the direct sound. To me, this just seems logical.

I have to assume Philips were being facetious when they named it that.

With regard to off-axis measurements, were any manufacturers designing speakers where this was a priority prior to Toole/Harman's blind listening tests? I could swear they started popping up after that, though this could be some sort of bias on my part of only noticing them (or more of them) after reading his research. I've corresponded with one well-known British monitor company that thinks flat on-axis response and low distortion are the most important aspects, and I could read between the lines that they placed little importance on off-axis response.

oivavoi · Apr 19, 2017

hvbias said:
I have to assume Philips were being facetious when they named it that.

With regard to off-axis measurements, were any manufacturers designing speakers where this was a priority prior to Toole/Harman's blind listening tests? I could swear they started popping up after that, though this could be some sort of bias on my part of only noticing them (or more of them) after reading his research. I've corresponded with one well-known British monitor company that thinks flat on-axis response and low distortion are the most important aspects, and I could read between the lines that they placed little importance on off-axis response.

That wouldn't happen to be AVI, would it?

Sounds like them. Those are the monitors I happen to have at the moment (DM10). In the near field, they are the best and most natural-sounding speakers I've ever heard. Honestly. Their claim is that waveguides etc. color the sound in a very slight way, and I think they might have a point. If so, there may be a trade-off between off-axis behavior and distortion.

But in the far field, I've heard speakers which behave better than the DM10s. And there's no doubt that a speaker with the DM10 sound and better off-axis behavior would be an even better speaker! And that's the kind of speaker I'm hoping to find.

Blumlein 88 · Apr 19, 2017

hvbias said:
I have to assume Philips were being facetious when they named it that.

I don't know. I tried out the version that used to be online. Got to silver level pretty easily. Gold took a bit of doing. I think some of the Harman training is still online.

Now I hardly think I am ready to be a tonmeister.

Cosmik · Apr 19, 2017

hvbias said:
With regard to off-axis measurements, were any manufacturers designing speakers where this was a priority prior to Toole/Harman's blind listening tests? I could swear they started popping up after that, though this could be some sort of bias on my part of only noticing them (or more of them) after reading his research. I've corresponded with one well-known British monitor company that thinks flat on-axis response and low distortion are the most important aspects, and I could read between the lines that they placed little importance on off-axis response.

Just found this from the 1970s. KEF certainly seem to have been thinking about the importance of off-axis sound with the 105.

oivavoi · Apr 19, 2017

Very interesting. Thanks. Reading this manual, I'm again amazed how much hi-fi companies got right back in the old days, and yet somehow these things got lost in the 80s and 90s... With this design, KEF got so many things just right, IMO: the focus on even dispersion with frequency (but without any horn/waveguide which may color the sound), the insights into the importance of both the direct sound and the reverberant soundfield, separate enclosures for the different drivers (reducing distortion and resonances), the pyramid shape, which has several advantages, the focus on adequate amplifier power to reduce clipping (since dynamic peaks require much more power than commonly assumed), etc.

I actually saw a Model 105 on the second-hand market in Norway recently for about 1000 euros, and was very tempted to buy it and see if I could activate it using a cheap miniDSP unit. I suspect that the crossover network is quite complex in this one, though.

Jakob1863 · Apr 19, 2017

oivavoi said:
Thanks. Yep, I find A/B tests much easier to do.

But for my personal use I actually don't see much point in doing blind testing. I'm a measurements guy: I buy the gear which objectively conforms most closely to the objective ideal of high fidelity. For me, at the moment, that implies phase- and time-coherent loudspeakers with a relatively flat frequency response and low distortion (which necessitates active crossovers), and a good polar/power response so that the reverberant field in the room becomes tonally correct. With electronics I go for affordable, well-designed, no-nonsense products with low distortion, all using balanced connections. So far, it's just about achieving high fidelity in an objective sense. The only place where subjectivity comes into play for me is concerning speaker directivity, and equalizing the system at the end according to what subjectively sounds good to me.

The only place where I can see myself using blind testing in the future is if I should come across something that sounds "strange" or unnatural, even though it measures well. Then I might listen blind to see what it's really about.

Nothing wrong with that.

Controlled listening tests for personal use can be fruitful in helping to gain further insight into your own perception, and they work as an additional guard against fooling yourself. But it is obviously just as easy to get incorrect results via "DBTs" as it is with "sighted listening".

There is so much to learn about the quality of reproduction chains, and none of us knows right from the beginning which level of quality is achievable with a certain record and which reproduction system will get the most out of it (meeting our personal preferences) under the usual constraints.
Some effects are easier to assess in tests, while others (emotional impact, for example) are more difficult to grasp. We have to learn evaluative listening and to cover the most important points in quite short times, which also means extrapolating from short impressions to long-term effects that might occur.

PS: Not to forget that the conclusions drawn usually rely on estimates of the underlying population distribution parameters. Chances are quite high that your personal preferences differ from the mean, so IMO you have to listen for yourself...

Jakob1863 · Apr 19, 2017

amirm said:
Help in what regard?

Help with regard to finding out whether there is evidence in support of the claim, which was based on listening under conditions including longer switchover times.

If it is to find out whether there really is a difference, then we should follow the research (and my own personal experience), which shows that near-instant switchovers are far more reliable in finding such differences than any long-term listening. I have passed many critical listening tests this way and would have no prayer of doing so with longer-term listening.

As said before, it depends on the question/hypothesis under examination. And as we know, the tighter the control, the lower the practical relevance of any result (there is a reciprocal relationship between the level of control and, so to speak, everyday relevance).

If our goal is to teach the person a lesson, then sure, we let them violate the above as much as they want. By doing so, they help our cause of embarrassing them. It doesn't help us figure out if there is any truth to their observations.

If we want to teach a lesson, then IMO we shouldn't be interested in embarrassing anyone but in helping to get good results. It is my personal experience (and others report similar observations) that listeners will most likely not do very well in their first test(s) under controlled conditions if the EUT depends on a multidimensional perceptual impression.

Although I have read literally hundreds of papers on sensory/auditory memory (and ASA as well), I have yet to find a model approach that covers all the different aspects in a convincing manner.
So, it depends... Some advocate short samples (<= 5 s, hopefully to avoid any influence of information in categorical storage), others promote samples of intermediate length (15-20 s, as in ITU-R BS.1116-x, which is hard to justify under several model approaches that rely mainly on a FIFO process), while still others point out that longer samples are sometimes needed to ensure that listeners are able to access all dimensions.

amirm · Apr 19, 2017

oivavoi said:
By the way, Amir: What is your take on modern well-designed dacs? Have you been able to tell differences between them?

It has been years since I have done any listening tests of such. Putting aside the crappy ones, I don't believe there are audible differences between them based on measurements.

amirm · Apr 19, 2017

Jakob1863 said:
If we want to teach a lesson, then IMO we shouldn't be interested in embarrassing anyone but in helping to get good results. It is my personal experience (and others report similar observations) that listeners will most likely not do very well in their first test(s) under controlled conditions if the EUT depends on a multidimensional perceptual impression.

We cannot convince them to even have a discussion about the proper way of doing the test (i.e. fast switching, level matching, etc.). So what we are left with is asking them if they would do the test on their own terms.

As to the second one, I have passed many difficult tests under stringent conditions. I don't have much patience left these days, but if pushed, I will put in the time to do it. And I can do better using such tools (e.g. ABX tests) than without them, i.e. with ad-hoc listening.
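Level matching, one of the test conditions mentioned above, is usually done by measuring the RMS level of each device's output and applying a gain correction so the two differ by only a small fraction of a dB. A minimal Python sketch, assuming both outputs are available as lists of float samples; the signals here are synthetic sine waves standing in for real captures:

```python
import math


def rms_db(samples):
    """RMS level in dB relative to an amplitude of 1.0."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def gain_to_match_db(reference, candidate):
    """Gain (dB) to apply to `candidate` so its RMS level equals `reference`."""
    return rms_db(reference) - rms_db(candidate)


# Synthetic test signals: a 1 kHz tone at full and half amplitude, 1 s at 48 kHz
fs, freq, n = 48000, 1000, 48000
ref = [1.0 * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
cand = [0.5 * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

gain = gain_to_match_db(ref, cand)  # about +6.02 dB: half amplitude is -6.02 dB
```

Measuring over a whole number of cycles (here exactly 1000) keeps the RMS of a steady tone free of partial-cycle error; with music, both captures should cover the same passage.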

SoundAndMotion · Apr 20, 2017

amirm said:
We cannot convince them to even have a discussion about the proper way of doing the test (i.e. fast switching, level matching, etc.). So what we are left with is asking them if they would do the test on their own terms.

As to the second one, I have passed many difficult tests under stringent conditions. I don't have much patience left these days, but if pushed, I will put in the time to do it. And I can do better using such tools (e.g. ABX tests) than without them, i.e. with ad-hoc listening.

It has been said “To a man with a hammer, everything looks like a nail.”

Amir, you are a man with a hammer. You are good with your hammer, even impressive. But not everything is a nail. Even a thin, cylindrical device 1 inch long and 1/16” wide with a flat head and pointed tip might be a nail or a wood screw. And your hammer is not a great tool for the wood screw.

To know the right tool (measurement method, e.g. listening test), you must first know what is to be measured and why (for what purpose will the measurement be used).

Your hammer is not universally applicable.
