
Limitations of blind testing procedures

Status
Not open for further replies.
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
That's why the protocol is open and public.

But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kinds of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated, meaning that the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?
 

Jinjuku

Major Contributor
Forum Donor
Joined
Feb 28, 2016
Messages
1,279
Likes
1,180
But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kind of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated - meaning that the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?

Agreed. If others can't reliably reproduce the testing independently, if the testing isn't open for review, criticism, feed-back, and revision then that's a problem.

What I'm saying is that I have now developed two bias-controlled tests that have been pretty much unassailable when I have laid out the protocol and apparatus. It's the best way I can think of. That doesn't mean someone else can't think of a better way. I'm happy to learn.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kind of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated - meaning that the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?
Even if they can be replicated following the original procedure, it doesn't mean that they were real:
Benveniste and his team of researchers followed the original study's procedure and produced results similar to those of the first published data. Maddox, however, noted that during the procedure the experimenters were aware of which test tubes originally contained the antibodies and which did not. Benveniste's team then started a second, blinded experimental series... The blinded experimental series showed no water memory effect.

And as I pointed out in another thread, mixing science with aesthetics gives meaningless results. Case in point: does MP3 encoding destroy the 'emotion' of musical sounds? There is evidence that people of a certain age (commonly found in universities) prefer the sound of MP3 over uncompressed! If you ran an experiment that showed that MP3 is 'better' than uncompressed, you would most likely think your methodology was faulty, but in fact it would just be human unpredictability and cultural fickleness rendering your experiment and its interpretation meaningless. Repeat it in ten years' time and you might get the opposite, equally reliable and statistically significant result. The difference would be that this time you would 'believe' your results and would publish them. People might even be able to replicate them until the next change in musical fashion.

The mistake is to believe that science has an answer for everything. Or its close cousin "Well, can you think of anything better?".
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,754
Likes
37,597
But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kind of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated - meaning that the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?

I don't know about 'gone through a huge replication crisis'. I think it is a slowly evolving replication crisis. Along with replication, something psychology and medicine could do to help is to accept only 3 sigma results. Most physical sciences found out early on that the low-hanging fruit is gathered quickly, and that you get much more consistency by requiring at least 3 sigma results. 2 sigma results should only indicate an area of interest deserving of further investigation.
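To put rough numbers on the gap between those two thresholds, here is a minimal sketch. It assumes a normally distributed test statistic and a two-sided test; neither assumption is spelled out in the post, and the function name is just for illustration.

```python
import math

def two_sided_p(sigma: float) -> float:
    """Probability that a standard normal variable lands more than
    `sigma` standard deviations from its mean, in either direction."""
    return math.erfc(sigma / math.sqrt(2))

# Compare the 2 sigma and 3 sigma thresholds mentioned above.
for k in (2, 3):
    p = two_sided_p(k)
    print(f"{k} sigma: p = {p:.4f} (about 1 in {1 / p:.0f})")
```

Under those assumptions, a 2 sigma result turns up by chance roughly 4.6% of the time, while a 3 sigma result does so roughly 0.27% of the time.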

No, psychoacoustics is not different. You can find contradictory results from fairly well planned and executed tests of delayed reflections, for instance. The tests are all slightly different, though similar, which makes interpretation difficult. At the same time, don't let the perfect become the enemy of the good.

Now, the situation with wire, like many other psychoacoustic questions, is one of corroboration via other knowledge. The way signals are carried by wire isn't some esoteric, barely understood phenomenon. The same goes for some aspects of hearing, in that we know a good bit about the physical structure, function and nerve activity of our hearing systems with regard to physical stimulus. We know the cilia in the inner ear aren't made to respond past 15 kHz. Filtering, if you will, that is not very sharp lets them respond, although weakly, to slightly higher frequencies at very high thresholds. So claiming we benefit from super high bandwidth makes no sense. Yet some claim it matters, just like some say wire sounds different. Having exhausted our knowledge of such things, which isn't inconsiderable, you are left with no other way to test the claim. We see no way such things could be audible, but do a nice blind test; if you can prove you hear it, then you do. Overwhelmingly you get null results.

So, as audiophiles, sure, we can rarely put together a truly rigorous blind test. But if there is little to no known reason something would be audible, and some decent attempts also show no results, I think it pretty safe to assume there is nothing to it until someone claiming otherwise can show it. I wouldn't phrase it as an absolute judgement, just a good-enough-for-now judgement. One of those cases where you don't let the perfect be the enemy of the good.
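For a sense of what "proving you can hear it" looks like in a simple ABX-style test, here is a rough sketch. It assumes each trial is an independent forced choice with a 50% chance of a correct guess; the 16-trial session and the scores are made-up illustration values, not anything from this thread.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the chance of scoring at least `correct`
    out of `trials` when every answer is a pure guess (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical sessions of 16 trials each.
print(f"12/16 correct: p = {abx_p_value(12, 16):.3f}")
print(f" 9/16 correct: p = {abx_p_value(9, 16):.3f}")
```

With these numbers, 12 of 16 is unlikely to be guessing (p is about 0.038), while 9 of 16 is entirely consistent with guessing (p is about 0.40). Note that the second case is a null result, not proof that no difference exists.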
 
OP
oivavoi

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
I don't know about 'gone through a huge replication crisis'. I think it is a slowly evolving replication crisis. Along with replication something psychology and medicine could do to help is only accept 3 sigma results. Most physical sciences found out early on the low hanging fruit is gathered quickly and you get much more consistency requiring at least 3 sigma results. 2 sigma results should only indicate an area of interest deserving of further investigation.

No psycho-acoustics is not different. You can find contradictory results of fairly well planned and executed testing in regards to delayed reflections for instance. The tests are all slightly different though similar. Interpretation is difficult in that regard. At the same time don't let the perfect become the enemy of the good.

Now the situation with wire like many other psychoacoustic questions is corroboration via other knowledge. The way signals are carried by wire isn't some esoteric barely understood phenomena. Same for some aspects of hearing in that we know a good bit about the physical structure, function and nerve activity of our hearing systems with regard to physical stimulus. We know the cilia in the inner ear aren't made to respond past 15 khz. Filtering if you will that is not very sharp lets it responds although weakly to slightly higher frequencies at very high thresholds. So claiming we benefit from super high bandwidth makes no sense. Yet some claim it matters just like some say wire sounds different. Exhausting our knowledge of such things which isn't inconsiderable you are left with no other way to test the claim. We see no way such things could be audible, but do a nice blind test, if you can prove you do then you do. Overwhelmingly you get null results.

So as audiophiles sure we rarely can put together a truly rigorous blind test. But if there is little to no known reasons something would be audible and some decent attempts also show no results I think it pretty safe to assume there is nothing to it until someone claiming it can show otherwise. I wouldn't phrase it as an absolute judgement just a good enough for now judgement. One of those cases where you don't let the perfect be the enemy of the good.

Good comment. The interesting and challenging question, I think, lies in the areas where there are known objective differences in sound or performance which nevertheless often get negative results in blind tests, and in tests of preference. For example lossy vs lossless, high lossy vs lower lossy, good amps vs bad amps, or many reflections vs few reflections, and so on. How useful are blind tests here? Should we be guided by statistical means in preference testing, and/or buy gear which doesn't measure well but which often can't be distinguished from gear that measures well in ABX tests? That's the thing I've been wondering about.
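One way to see why a real but subtle difference can still produce negative results is to look at the statistical power of a short test. The sketch below assumes independent trials, a 0.05 significance level, a 16-trial session, and a hypothetical listener who genuinely gets 70% of trials right; all of those numbers are illustrative assumptions.

```python
from math import comb

def pass_threshold(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose probability under pure guessing is at most alpha."""
    for score in range(trials + 1):
        tail = sum(comb(trials, k) for k in range(score, trials + 1)) / 2 ** trials
        if tail <= alpha:
            return score
    return trials + 1  # no achievable score passes at this alpha

def chance_of_passing(trials: int, hit_rate: float, alpha: float = 0.05) -> float:
    """Probability that a listener who answers each trial correctly with
    probability `hit_rate` reaches the passing score."""
    need = pass_threshold(trials, alpha)
    return sum(
        comb(trials, k) * hit_rate ** k * (1 - hit_rate) ** (trials - k)
        for k in range(need, trials + 1)
    )

# Hypothetical listener who really does hear the difference 70% of the time.
print(pass_threshold(16))
print(f"{chance_of_passing(16, 0.7):.2f}")
```

Under these assumptions the listener needs 12 of 16 to pass and manages that a bit less than half the time, so a single failed session says little about whether the difference is audible to them.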

Btw: do you have any links to studies on reflections with opposing conclusions?
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,754
Likes
37,597
Good comment. The interesting and challenging question, I think, is in the areas where there are known objective differences in sound or performance, but which nevertheless often get negative results in blind tests. And tests about preferences. For example lossy vs lossless, high lossy vs lower lossy, good amps vs bad amps, etc. Or much reflections vs few reflections, and so on. How useful are blind tests here? Should we be guided by statistical means in preference testing, and/or buy gear which doesn't measure well but which often can't be distinguished from gear that measures well in abx tests? That's the thing I've been wondering about.

Btw: do you have any links to studies on reflections with opposing conclusions?

I'll see if I can find some of those reflection tests though I don't have them off hand. I find such things most easily doing Google Scholar searches on the topic. Those I remember differed in using speakers for the source in rooms or in semi-anechoic chambers or using simulated reflections over headphones. The main actual contradictions came from differing reflections between right and left sound sources. I know there are papers by Toole and Olive on it, but there are others as well naturally.

Preferences are a tricky thing. The industries with lots of experience are the food and perfume industries. They make a fine distinction that audiophiles don't seem to grok so well, namely between testing for difference and testing for preference. I have had differences of opinion when audiophiles want to be tested for preference when they haven't shown they can even perceive a difference. When preference testing is most effective it is usually reasonably focused. The preference question isn't which is better; it is which is saltier, which is sweeter, that sort of thing. Audiophiles claim cables sound better, but can't agree on how. You can test for more or less bass. Once you have established at what levels it can be discerned as different, you could then test for preferred amounts: this much vs that much, which amount of bass is preferred.
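The "difference first, preference second" order can be sketched as a simple gate. This is only an illustration: it assumes a triangle-style difference test (pick the odd one out of three, so a 1/3 chance of guessing right), which is a format used in sensory testing but not one named in this post, and the function names, trial counts and 0.05 cut-off are all hypothetical.

```python
from math import comb

def tail_probability(successes: int, trials: int, chance: float) -> float:
    """Probability of at least `successes` hits in `trials` independent
    attempts when each attempt succeeds with probability `chance`."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

def evaluate_listener(diff_correct: int, diff_trials: int,
                      prefers_a: int, pref_trials: int,
                      chance: float = 1 / 3, alpha: float = 0.05) -> str:
    """Only look at preference once a difference has been demonstrated."""
    # Stage 1: can the listener tell the samples apart at all?
    if tail_probability(diff_correct, diff_trials, chance) > alpha:
        return "no demonstrated difference; preference not evaluated"
    # Stage 2: two-sided test of the preference split against 50/50.
    larger = max(prefers_a, pref_trials - prefers_a)
    p = min(1.0, 2 * tail_probability(larger, pref_trials, 0.5))
    if p > alpha:
        return "difference heard, no clear preference"
    return "prefers A" if prefers_a > pref_trials - prefers_a else "prefers B"

# A unanimous preference is ignored if the difference test was at chance level.
print(evaluate_listener(5, 12, 12, 12))
print(evaluate_listener(10, 12, 11, 12))
```

The first call shows the point made above: a strong stated preference counts for nothing if the listener could not first show they hear a difference.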

I understand the disdain or mistrust of preference testing, as the MP3 vs lossless situation is an excellent example of how it can go awry. Nevertheless, one usually sees some narrow range on discernible matters that the great majority will prefer. That is different from claiming it is superior fidelity as well. We all know a flat in room response is almost always (except for Ray :)) not preferred. A slightly downward sloping response is. I must say when I do my own recordings with only two mics at a bit of distance, flat response isn't so bad. On close miked material it surely is too bright. So is a preference for the slightly downsloping response a result of most music being close miked? Some revered recordings like early Mercury Living Presence made use of the Decca tree. These were large diaphragm omnis that became moderately directional at elevated frequencies. They also became uptilted in response at higher frequencies. Part of this was used to get better 'reach' into the rear of the orchestra. Yet digital remasterings of those can sound bright on some good systems. Well, they are a bit bright. So for commercial success and domestic no hassle bliss, preference testing with most common music and most common listener preferences can be a big plus. It is, however, not necessarily about best fidelity.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
We all know a flat in room response is almost always (except for Ray :)) not preferred. A slightly downward sloping response is.
Do we know that? We only know it based on conventional systems, perhaps. Maybe people can only bear to listen to them with the treble shaved off.

There's always the possibility that there's a "circle of confusion" going on, but also that systems with incorrect phase, timing and too-shallow crossovers (i.e. standard audiophile systems) are grating on people's ears.
 

tomelex

Addicted to Fun and Learning
Forum Donor
Joined
Feb 29, 2016
Messages
990
Likes
572
Location
So called Midwest, USA
Dear o Dear, let's see, a sound wave has (and we convert that into an electrical wave)

amplitude
frequency
phase

think we can measure that by now Frank


And, let's agree that we should not expect plain old two channel stereo playback, an imperfect process of recording all the wavefronts created out in the ether by a bunch of monkeys banging on instruments, to all of a sudden provide a realistic rendition of all those wavefronts those monkeys banged out. It's really that simple, really.

 
Last edited:

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Ahhh, but what if I asked you to guarantee to me, with penalties, that the amplitude, frequency, phase were 100% faithful to the original, at all times?

A "perfect" rendition of monkey banging, no - a convincing rendition, yes ... wife runs into the room, "I heard a horde of monkeys going nuts, from outside, and I can still hear them carrying on, inside the room - where are they??!!"
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,754
Likes
37,597
Do we know that? We only know it based on conventional systems, perhaps. Maybe people can only bear to listen to them with the treble shaved off.

There's always the possibility that there's a "circle of confusion" going on, but also that systems with incorrect phase, timing and too-shallow crossovers (i.e. standard audiophile systems) are grating on people's ears.

It is not crossovers. I have heard and had the same preference as have others on the same music on speakers with no crossover, with shallow crossovers and with steep crossovers.

The phase I am not too keen on. Normal tests show people just aren't too sensitive to phase in the upper band. I have heard speakers with first order xovers and stressed tweeters sound ....well....stressed. You cannot roll off enough to stop that unless it loses enough treble to satisfy. A gentler roll off is preferred, though the character of the stressed tweeter is still heard, reduced of course compared with a flat response.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,754
Likes
37,597
If a short wire suffers from all these imperfections, then what must be going on in capacitors, resistors, transistors and so on?! The theory of operation of amplifiers would not work... DACs, preamps, buffers, power supplies, mixing desks, all trashed by their components' intrinsic impurities that philistine corporations have no clue about. Circuit boards laid out mainly for the convenience of fitting the components in a box would sound like nests of electrical vipers. Test equipment itself would be riddled with unknown impurities, meaning that mysterious distortions were compounded at every stage of the components' manufacture.

The chances of getting a system that worked tolerably would be astronomically low.

And yet somehow by just replacing a few wires and applying some BlueTack, you manage to make kitchen radios sound like musicians playing in your living room.

Now, now, it is not a kitchen radio. It is a Philips HTIB.
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,321
Location
Albany Western Australia
Exhausting our knowledge of such things which isn't inconsiderable you are left with no other way to test the claim. We see no way such things could be audible, but do a nice blind test, if you can prove you do then you do. Overwhelmingly you get null results.

So as audiophiles sure we rarely can put together a truly rigorous blind test. But if there is little to no known reasons something would be audible and some decent attempts also show no results I think it pretty safe to assume there is nothing to it until someone claiming it can show otherwise. I wouldn't phrase it as an absolute judgement just a good enough for now judgement. One of those cases where you don't let the perfect be the enemy of the good.

this
 

Thomas savage

Grand Contributor
The Watchman
Forum Donor
Joined
Feb 24, 2016
Messages
10,260
Likes
16,305
Location
uk, taunton

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,250
Likes
17,187
Location
Riverview FL
The simple truth is that as soon as any sound is recorded through a mic, most of that sound is just gone.

Where did it go?
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,250
Likes
17,187
Location
Riverview FL
I'd think a good microphone does a good job at capturing a good sound.

I looked for microphone deficiencies, and didn't find much information pertinent to your claim.

Do you have something you can point to, or is this opinion?
 

Dynamix

Addicted to Fun and Learning
Joined
Mar 29, 2016
Messages
593
Likes
214
Location
Nörway
Do you have something you can point to, or is this opinion?

25 years of playing and recording music.

But really, just think about it: The sound of a close miked 22" bass drum is supposed to be captured by a (at best) 1.5" microphone membrane. Or an entire jazz band, drums, sax, bass, trumpet, piano, etc. Through a couple of mics? And that's supposed to capture the ambiance of the room and the full dynamics and every detail of what's being played? Not to mention an entire orchestra playing a classical piece. At 110+ dB?

Besides our speakers, the recording stage itself is the tightest bottleneck we face. My point is that we are kidding ourselves if we think that we can replicate a musical event on our systems, because the recording is so ridiculously compromised to begin with.

You can't bargain with the laws of physics.
 
Last edited: