Limitations of blind testing procedures

oivavoi · Jan 18, 2017

Jinjuku said:
That's why the protocol is open and public.

But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kinds of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated, so the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?
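The point about findings that "probably weren't real" can be made concrete with the standard positive-predictive-value arithmetic: the share of significant results that reflect a real effect depends on how many tested hypotheses were true to begin with and on the tests' power, not only on the significance threshold. A minimal Python sketch; the prior, power, and alpha values in the example are hypothetical inputs, not figures from any replication study.

```python
def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Fraction of statistically significant results that reflect a real effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: probability that a real effect reaches significance
    alpha: probability that a null effect reaches significance (false positive)
    """
    for name, value in (("prior", prior), ("power", power), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    significant = true_positives + false_positives
    if significant == 0.0:
        raise ValueError("no result can reach significance with these inputs")
    return true_positives / significant


# Hypothetical field: 1 in 10 tested effects is real, studies have 35% power,
# and results are declared significant at alpha = 0.05.
print(positive_predictive_value(prior=0.10, power=0.35, alpha=0.05))
```

With those made-up inputs, fewer than half of the "significant" findings would be real, which is one way a whole body of work can fail to replicate without anyone cheating.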

Jinjuku · Jan 18, 2017

oivavoi said:
But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kinds of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated, so the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?

Agreed. If others can't reliably reproduce the testing independently, or if the testing isn't open for review, criticism, feedback, and revision, then that's a problem.

What I'm saying is that I have now developed two bias-controlled tests that have been pretty much unassailable when I have laid out the protocol and apparatus. It's the best way I can think of. That doesn't mean someone else can't think of a better way. I'm happy to learn.
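The thread doesn't spell out how these bias-controlled tests are scored. Assuming an ABX-style design, where each trial is a forced choice that a guessing listener gets right half the time, a result can be checked with an exact binomial tail probability. A minimal Python sketch; `abx_p_value` is a hypothetical helper, not part of any tool mentioned here.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value for an ABX score.

    Returns the probability of getting at least `correct` answers out of
    `trials` by pure guessing (chance of a correct answer = 0.5 per trial).
    """
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    ways_at_least = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways_at_least / 2**trials


# Example: 12 correct out of 16 trials.
print(abx_p_value(12, 16))
```

A score of 12 out of 16 comes out at about 0.038, i.e. a pure guesser would reach it roughly once in 26 attempts.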

Cosmik · Jan 18, 2017

oivavoi said:
But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kinds of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated, so the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?

Even if they can be replicated following the original procedure, it doesn't mean that they were real:

Benveniste and his team of researchers followed the original study's procedure and produced results similar to those of the first published data. Maddox, however, noted that during the procedure the experimenters were aware of which test tubes originally contained the antibodies and which did not. Benveniste's team then started a second, blinded experimental series... The blinded experimental series showed no water memory effect.

And as I pointed out in another thread, mixing science with aesthetics gives meaningless results. Case in point: does MP3 encoding destroy the 'emotion' of musical sounds? There is evidence that people of a certain age (commonly found in universities) prefer the sound of MP3 over uncompressed! If you ran an experiment that showed that MP3 is 'better' than uncompressed, you would most likely think your methodology was faulty, but in fact it would just be human unpredictability and cultural fickleness rendering your experiment and its interpretation meaningless. Repeat it in ten years' time and you might get the opposite, equally reliable and statistically significant result. The difference would be that this time you would 'believe' your results and would publish them. People might even be able to replicate them until the next change in musical fashion.

The mistake is to believe that science has an answer for everything. Or its close cousin: "Well, can you think of anything better?"

Blumlein 88 · Jan 18, 2017

oivavoi said:
But, just to avoid any misunderstanding here: I'm not saying that your test is not valid in itself. The problem is rather with relying on these kinds of tests for making absolute judgments about sound and audio equipment. Experimental psychological science (and psychoacoustics is a kind of psychology) has recently gone through a huge replication crisis. This means that a lot of the experiments that were done couldn't be replicated, so the findings probably weren't real. Can we be sure that the situation is better in psychoacoustics?

I don't know about 'gone through a huge replication crisis'. I think it is a slowly evolving replication crisis. Along with replication, one thing psychology and medicine could do to help is to accept only 3-sigma results. Most physical sciences found out early on that the low-hanging fruit is gathered quickly, and that you get much more consistency by requiring at least 3-sigma results. 2-sigma results should only indicate an area of interest deserving of further investigation.
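To see what a 2-sigma versus 3-sigma bar means in practice for a listening test, the sigma levels can be converted to tail probabilities and then to a minimum passing score. A minimal Python sketch, assuming one-sided thresholds (an ABX listener can only beat chance in one direction) and trials scored right or wrong with a 0.5 guessing rate; the helper names are hypothetical.

```python
from math import comb, erfc, sqrt
from typing import Optional


def one_sided_p_for_sigma(z: float) -> float:
    """Upper-tail probability of a standard normal variable beyond z sigma."""
    return 0.5 * erfc(z / sqrt(2))


def min_correct(trials: int, alpha: float) -> Optional[int]:
    """Smallest score whose guessing p-value is <= alpha, or None if even a
    perfect score is not significant at that level."""
    total = 2**trials
    ways_at_least = 0
    for k in range(trials, -1, -1):
        ways_at_least += comb(trials, k)  # now counts outcomes with score >= k
        if ways_at_least / total > alpha:
            # k itself fails the threshold, so the answer is k + 1 (if it exists).
            return k + 1 if k < trials else None
    return 0


for sigma in (2.0, 3.0):
    alpha = one_sided_p_for_sigma(sigma)
    print(f"{sigma:.0f} sigma: p <= {alpha:.5f}, "
          f"need {min_correct(16, alpha)}/16 or {min_correct(32, alpha)}/32")
```

For a 16-trial run this works out to 13 correct at 2 sigma and 15 correct at 3 sigma, which shows how much stricter the higher bar is for short listening sessions.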

No, psychoacoustics is not different. You can find contradictory results from fairly well planned and executed tests regarding delayed reflections, for instance. The tests are all similar, though slightly different, which makes interpretation difficult. At the same time, don't let the perfect become the enemy of the good.

Now, the situation with wire, as with many other psychoacoustic questions, is that there is corroboration via other knowledge. The way signals are carried by wire isn't some esoteric, barely understood phenomenon. The same goes for some aspects of hearing: we know a good bit about the physical structure, function, and nerve activity of our hearing systems in response to physical stimuli. We know the cilia in the inner ear aren't made to respond past 15 kHz. Their filtering, if you will, is not very sharp, which lets them respond, although weakly, to slightly higher frequencies at very high thresholds. So claiming we benefit from super-high bandwidth makes no sense. Yet some claim it matters, just as some say wire sounds different. Having exhausted our knowledge of such things, which isn't inconsiderable, you are left with no other way to test the claim. We see no way such things could be audible, but do a nice blind test, and if you can prove you hear it, then you do. Overwhelmingly, you get null results.

So as audiophiles, sure, we can rarely put together a truly rigorous blind test. But if there are few or no known reasons something would be audible, and some decent attempts also show no results, I think it is pretty safe to assume there is nothing to it until someone claiming it can show otherwise. I wouldn't phrase it as an absolute judgement, just a good-enough-for-now judgement. One of those cases where you don't let the perfect be the enemy of the good.
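The "good enough for now" caveat matters because a short blind test can miss a real but small effect. One way to quantify that is the chance that a listener who genuinely hears a difference on some fraction of trials actually clears the pass mark. A minimal Python sketch; the 70% hit rate and the 12-of-16 pass mark are illustrative assumptions, not results from any test discussed here.

```python
from math import comb


def pass_probability(trials: int, needed: int, hit_rate: float) -> float:
    """Chance that a listener with true per-trial accuracy `hit_rate`
    scores at least `needed` correct answers out of `trials`."""
    if not 0 <= needed <= trials or not 0.0 <= hit_rate <= 1.0:
        raise ValueError("invalid arguments")
    return sum(
        comb(trials, k) * hit_rate**k * (1.0 - hit_rate) ** (trials - k)
        for k in range(needed, trials + 1)
    )


# Hypothetical listener who really hears the difference 70% of the time,
# judged against a 12-of-16 pass mark.
print(pass_probability(16, 12, 0.7))
```

This comes out at roughly 0.45, so such a listener would fail that test more often than pass it, which is why a single short null result is weak evidence on its own.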

oivavoi · Jan 18, 2017

Blumlein 88 said:
I don't know about 'gone through a huge replication crisis'. I think it is a slowly evolving replication crisis. Along with replication, one thing psychology and medicine could do to help is to accept only 3-sigma results. Most physical sciences found out early on that the low-hanging fruit is gathered quickly, and that you get much more consistency by requiring at least 3-sigma results. 2-sigma results should only indicate an area of interest deserving of further investigation.

No, psychoacoustics is not different. You can find contradictory results from fairly well planned and executed tests regarding delayed reflections, for instance. The tests are all similar, though slightly different, which makes interpretation difficult. At the same time, don't let the perfect become the enemy of the good.

Now, the situation with wire, as with many other psychoacoustic questions, is that there is corroboration via other knowledge. The way signals are carried by wire isn't some esoteric, barely understood phenomenon. The same goes for some aspects of hearing: we know a good bit about the physical structure, function, and nerve activity of our hearing systems in response to physical stimuli. We know the cilia in the inner ear aren't made to respond past 15 kHz. Their filtering, if you will, is not very sharp, which lets them respond, although weakly, to slightly higher frequencies at very high thresholds. So claiming we benefit from super-high bandwidth makes no sense. Yet some claim it matters, just as some say wire sounds different. Having exhausted our knowledge of such things, which isn't inconsiderable, you are left with no other way to test the claim. We see no way such things could be audible, but do a nice blind test, and if you can prove you hear it, then you do. Overwhelmingly, you get null results.

So as audiophiles, sure, we can rarely put together a truly rigorous blind test. But if there are few or no known reasons something would be audible, and some decent attempts also show no results, I think it is pretty safe to assume there is nothing to it until someone claiming it can show otherwise. I wouldn't phrase it as an absolute judgement, just a good-enough-for-now judgement. One of those cases where you don't let the perfect be the enemy of the good.

Good comment. The interesting and challenging question, I think, concerns the areas where there are known objective differences in sound or performance that nevertheless often get negative results in blind tests. And tests about preferences. For example lossy vs lossless, higher-bitrate lossy vs lower-bitrate lossy, good amps vs bad amps, etc. Or many reflections vs few reflections, and so on. How useful are blind tests here? Should we be guided by statistical means in preference testing, and/or buy gear which doesn't measure well but which often can't be distinguished from gear that measures well in ABX tests? That's the thing I've been wondering about.

Btw: do you have any links to studies on reflections with opposing conclusions?

Blumlein 88 · Jan 18, 2017

oivavoi said:
Good comment. The interesting and challenging question, I think, concerns the areas where there are known objective differences in sound or performance that nevertheless often get negative results in blind tests. And tests about preferences. For example lossy vs lossless, higher-bitrate lossy vs lower-bitrate lossy, good amps vs bad amps, etc. Or many reflections vs few reflections, and so on. How useful are blind tests here? Should we be guided by statistical means in preference testing, and/or buy gear which doesn't measure well but which often can't be distinguished from gear that measures well in ABX tests? That's the thing I've been wondering about.

Btw: do you have any links to studies on reflections with opposing conclusions?

I'll see if I can find some of those reflection tests though I don't have them off hand. I find such things most easily doing Google Scholar searches on the topic. Those I remember differed in using speakers for the source in rooms or in semi-anechoic chambers or using simulated reflections over headphones. The main actual contradictions came from differing reflections between right and left sound sources. I know there are papers by Toole and Olive on it, but there are others as well naturally.

Preferences are a tricky thing. The industries with the most experience are the food and perfume industries. They make a fine distinction that audiophiles don't seem to grok so well: the distinction between testing for difference and testing for preference. I have had differences of opinion with audiophiles who want to be tested for preference when they haven't shown they can even perceive a difference. Preference testing is most effective when it is reasonably focused. The preference question isn't "which is better?" but "which is saltier?" or "which is sweeter?", that sort of thing. Audiophiles claim cables sound better, but can't agree on how. You can test for more or less bass. Once you have established at what levels it can be discerned as different, you can then test for preferred amounts: this much vs that much, which amount of bass is preferred?
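The difference-before-preference rule can be written down as a simple gate: look at the preference tally only when the listener has first beaten chance on a difference test, and keep the preference question focused on one attribute (e.g. "which has more bass?"). A minimal Python sketch, assuming ABX-style difference trials and a forced A/B preference count, both analysed with exact binomial tests at a hypothetical alpha of 0.05; all function names are illustrative.

```python
from math import comb


def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5), i.e. pure guessing."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n


def two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value against a 50/50 preference split."""
    extreme = max(k, n - k)
    return min(1.0, 2 * upper_tail(extreme, n))


def difference_then_preference(abx_correct: int, abx_trials: int,
                               prefer_a: int, pref_trials: int,
                               alpha: float = 0.05) -> str:
    """Interpret a preference tally only after a difference test has passed."""
    if upper_tail(abx_correct, abx_trials) > alpha:
        return "no demonstrated difference"
    if two_sided_p(prefer_a, pref_trials) > alpha:
        return "difference heard, no clear preference"
    return "A preferred" if 2 * prefer_a > pref_trials else "B preferred"


# 14/16 on the difference test, then A chosen in 17 of 20 preference trials.
print(difference_then_preference(14, 16, 17, 20))
```

The order of the checks is the point of the design: a lopsided preference count from someone who scored 9 of 16 on the difference test is reported as "no demonstrated difference" rather than as a preference.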

I understand the disdain for, or mistrust of, preference testing, as the MP3 vs lossless situation is an excellent example of how it can go awry. Nevertheless, one usually sees some narrow range on discernible matters that the great majority will prefer. That is also different from claiming superior fidelity. We all know a flat in-room response is almost always (except for Ray) not preferred. A slightly downward-sloping response is. I must say that when I do my own recordings with only two mics at a bit of distance, flat response isn't so bad. On close-miked material it surely is too bright. So is a preference for the slightly downsloping response a result of most music being close-miked? Some revered recordings like early Mercury Living Presence made use of the Decca tree. These were large-diaphragm omnis that became moderately directional at elevated frequencies. They also became uptilted in response at higher frequencies. Part of this was used to get better 'reach' into the rear of the orchestra. Yet digital remasters of those can sound bright on some good systems. Well, they are a bit bright. So for commercial success and domestic no-hassle bliss, preference testing with the most common music and the most common listener preferences can be a big plus. It is not, however, necessarily about best fidelity.
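A "slightly downward sloping" in-room target is usually drawn as a straight line on a log-frequency axis, i.e. a constant number of dB per octave. A minimal Python sketch of such a tilt; the -1 dB/octave slope and the 1 kHz reference point are illustrative assumptions, not a published target.

```python
from math import log2


def tilted_target_db(freq_hz: float,
                     slope_db_per_octave: float = -1.0,
                     ref_hz: float = 1000.0) -> float:
    """Target level in dB relative to ref_hz for a constant tilt per octave.

    A negative slope gives a downward-sloping (darker) target;
    a slope of 0.0 gives a flat target.
    """
    if freq_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return slope_db_per_octave * log2(freq_hz / ref_hz)


for f in (125, 500, 4000, 16000):
    print(f"{f:>6} Hz: {tilted_target_db(f):+.1f} dB")
```

With this assumed slope, 16 kHz sits 4 dB below 1 kHz and 125 Hz sits 3 dB above it; changing `slope_db_per_octave` is all it takes to compare a flat and a tilted target.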

Cosmik · Jan 19, 2017

Blumlein 88 said:
We all know a flat in-room response is almost always (except for Ray) not preferred. A slightly downward-sloping response is.

Do we know that? We only know it based on conventional systems, perhaps. Maybe people can only bear to listen to them with the treble shaved off.

There's always the possibility that there's a "circle of confusion" going on, but also that systems with incorrect phase, timing and too-shallow crossovers (i.e. standard audiophile systems) are grating on people's ears.

tomelex · Jan 19, 2017

Dear oh dear, let's see, a sound wave has (and we convert that into an electrical wave):

amplitude
frequency
phase

I think we can measure that by now, Frank.

And let's agree that we should not expect plain old two-channel stereo playback, an imperfect process of recording all the wavefronts created out in the ether by a bunch of monkeys banging on instruments, to suddenly provide a realistic rendition of all those wavefronts those monkeys banged out. It's really that simple.

fas42 · Jan 19, 2017

Ahhh, but what if I asked you to guarantee to me, with penalties, that the amplitude, frequency, and phase were 100% faithful to the original, at all times?

A "perfect" rendition of monkey banging, no - a convincing rendition, yes ... wife runs into the room, "I heard a horde of monkeys going nuts, from outside, and I can still hear them carrying on, inside the room - where are they??!!"

Blumlein 88 · Jan 19, 2017

Cosmik said:
Do we know that? We only know it based on conventional systems, perhaps. Maybe people can only bear to listen to them with the treble shaved off.

There's always the possibility that there's a "circle of confusion" going on, but also that systems with incorrect phase, timing and too-shallow crossovers (i.e. standard audiophile systems) are grating on people's ears.

It is not crossovers. I have heard speakers with no crossover, with shallow crossovers, and with steep crossovers, and on the same music I had the same preference with all of them, as have others.

I am not too keen on the phase argument. Normal tests show people just aren't very sensitive to phase in the upper band. I have heard speakers with first-order crossovers and stressed tweeters sound... well... stressed. You cannot roll off enough to stop that without losing too much treble. A gentler roll-off is still preferred, though the stressed-tweeter character can still be heard, reduced of course compared with a flat response.

watchnerd · Jan 19, 2017

fas42 said:
Ahhh, but what if I asked you to guarantee to me, with penalties, that the amplitude, frequency, and phase were 100% faithful to the original, at all times?

No, never.

Because microphones.

fas42 · Jan 19, 2017

watchnerd said:
No, never.

Because microphones.

It's let's-be-kind-to-Tom week: I'll accept 100% faithful to the media data ...

Blumlein 88 · Jan 19, 2017

Cosmik said:
If a short wire suffers from all these imperfections, then what must be going on in capacitors, resistors, transistors and so on?! The theory of operation of amplifiers would not work... DACs, preamps, buffers, power supplies, mixing desks, all trashed by their components' intrinsic impurities that philistine corporations have no clue about. Circuit boards laid out mainly for the convenience of fitting the components in a box would sound like nests of electrical vipers. Test equipment itself would be riddled with unknown impurities, meaning that mysterious distortions were compounded at every stage of the components' manufacture.

The chances of getting a system that worked tolerably would be astronomically low.

And yet somehow, by just replacing a few wires and applying some Blu-Tack, you manage to make kitchen radios sound like musicians playing in your living room.

Now, now, it is not a kitchen radio. It is a Philips HTIB.

March Audio · Jan 19, 2017

Blumlein 88 said:
Having exhausted our knowledge of such things, which isn't inconsiderable, you are left with no other way to test the claim. We see no way such things could be audible, but do a nice blind test, and if you can prove you hear it, then you do. Overwhelmingly, you get null results.

So as audiophiles, sure, we can rarely put together a truly rigorous blind test. But if there are few or no known reasons something would be audible, and some decent attempts also show no results, I think it is pretty safe to assume there is nothing to it until someone claiming it can show otherwise. I wouldn't phrase it as an absolute judgement, just a good-enough-for-now judgement. One of those cases where you don't let the perfect be the enemy of the good.

this

Thomas savage · Jan 19, 2017

Moderation

A few posts have been moved out of this thread and over to http://audiosciencereview.com/forum...d-known-understanding.1136/page-22#post-32175

I'm not keen to have threads derailed by that type of discussion (back and forth with Frank about Blu-Tack, mystical distortions, and their removal through fantastical audio witchcraft); that's why the thread in FC was started.

Cheers

Dynamix · Jan 26, 2017

watchnerd said:
No, never.

Because microphones.

This is exactly the point that so many audiophiles seem to be unable to grasp. The simple truth is that as soon as any sound is recorded through a mic, most of that sound is just gone.

RayDunzl · Jan 26, 2017

Dynamix said:
The simple truth is that as soon as any sound is recorded through a mic, most of that sound is just gone.

Where did it go?

Dynamix · Jan 26, 2017

RayDunzl said:
Where did it go?

I think it followed Alice down the rabbit hole?

RayDunzl · Jan 26, 2017

I'd think a good microphone does a good job at capturing a good sound.

I looked for microphone deficiencies, and didn't find much information pertinent to your claim.

Do you have something you can point to, or is this opinion?

Dynamix · Jan 26, 2017

RayDunzl said:
Do you have something you can point to, or is this opinion?

25 years of playing and recording music.

But really, just think about it: the sound of a close-miked 22" bass drum is supposed to be captured by a (at best) 1.5" microphone membrane. Or an entire jazz band (drums, sax, bass, trumpet, piano, etc.) through a couple of mics? And that's supposed to capture the ambience of the room and the full dynamics and every detail of what's being played? Not to mention an entire orchestra playing a classical piece. At 110+ dB?

Besides our speakers, the recording stage itself is the tightest bottleneck we face. My point is that we are kidding ourselves if we think we can replicate a musical event on our systems, because the recording is so ridiculously compromised to begin with.

You can't bargain with the laws of physics.
