Actual Double Blind Studies

MAB · Jan 16, 2024

cavedriver said:
Yeah, I could build it, but could I get my local audio club to help sponsor me to set it up in a room at CAF? Maybe, maybe not. I think it would be a cool "attraction" at an event like that. There's always those rooms where you wonder what they are selling and why they even have the room in the first place. A room with a sign out front reading, "TEST YOUR AUDIOPHILE CHOPS HERE!" would be fun, and I wouldn't have to use currently hot new speaker designs, just things that are well known or have interesting behaviors. Could give anyone that gets a certain score a prize or something. I think a lot of people would be interested in the challenge. We think we know what we know and some are curious and brave enough to test it.

This would be a good club activity. I actually wish it was a larger part of testing, valid comparisons to look for differences, rather than the confusing 'speaker roundups' that often occur.
In the 'Blind Listening Test 2' post above, they did mention the need for larger sample size. Perhaps wives (especially extreme far-field), and pets (mostly near-field)... Just kidding.

Conventions (for me) are so much overload, I'm not sure I could calm myself down and listen critically in a study.

But if you have an active club, it seems for a few bucks and a few evenings of participation in a study you could get a result. I think you need to have a good hypothesis to test combined with a well-designed experiment to test the hypothesis, and decent stats skills to be able to understand the distribution of answers. Hard to test: is a speaker good. Easier to test: is speaker A audibly different than speaker B?
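A minimal sketch of the stats step for the "is speaker A audibly different than speaker B?" question. It assumes an ABX-style test where each trial is an independent right/wrong answer, and where a listener who hears no difference guesses correctly half the time. The function name and the 12-of-16 example are illustrative only and are not taken from any study discussed in this thread.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value.

    Probability of getting at least `correct` answers right out of
    `trials` if the listener is purely guessing (p = 0.5 per trial).
    """
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical club result: one listener, 16 trials, 12 correct.
print(f"{abx_p_value(12, 16):.4f}")  # 0.0384
```

A small p-value only says that guessing is an unlikely explanation for that score. With many listeners or many sessions, some low p-values will turn up by chance, so the number of trials and the pass criterion are best fixed before the test starts.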

Justdafactsmaam · Jan 16, 2024

MAB said:
On page 78 is a quite famous test between some very well known high-end amplifiers published in Stereo Review:

https://americanradiohistory.com/Archive-HiFI-Stereo/80s/HiFi-Stereo-Review-1987-01.pdf
I worked in a store that sold Levinson amps at the time, to be honest I wasn't surprised.

But this test is a poster child for those who throw shade at DBTs. It was soooo poorly designed and executed. I’d love to know how an underpowered OTL failed to be distinguished in a test driving an inefficient, difficult load? That amp should have been operating at over 100% THD with any loud passages. How was that not heard? When I read this report back when it was first published, it actually led me to think that maybe there really are inherent problems in DBTs. And the truth is there can be very big problems with poorly designed and executed DBTs. A database of well-done DBTs would be nice to have.

MAB · Jan 17, 2024

Justdafactsmaam said:
But this test is a poster child for those who throw shade at DBTs. It was soooo poorly designed and executed. I’d love to know how an underpowered OTL failed to be distinguished in a test driving an inefficient, difficult load? That amp should have been operating at over 100% THD with any loud passages. How was that not heard? When I read this report back when it was first published, it actually led me to think that maybe there really are inherent problems in DBTs. And the truth is there can be very big problems with poorly designed and executed DBTs. A database of well-done DBTs would be nice to have.

You have your facts wrong.
1) The Futterman OTLs are 80 watt monoblocks, quite extraordinary. The Levinson ML-11 is 50 watts per channel. The Pioneer is 45 watts per channel.
2) The MG-IIIa speakers are not difficult to drive, quite the opposite.
3) There is a list in an AES survey of valid studies, as well as studies that are flawed. That list is mentioned and discussed earlier in this thread:

Actual Double Blind Studies

I have always been curious about 'training' for these tests. I saw a visual test that used an aerial pic of a full Rose Bowl, and instantaneous ABX testing was used to compare two pics and seeing if people could notice the difference between them. The subjects couldn't do it. Then, the tester...

www.audiosciencereview.com

Justdafactsmaam · Jan 17, 2024

MAB said:
You have your facts wrong.
1) The Futterman OTLs are 80 watt monoblocks, quite extraordinary. The Levinson ML-11 is 50 watts per channel. The Pioneer is 45 watts per channel.
2) The MG-IIIa speakers are not difficult to drive, quite the opposite.
3) There is a list in an AES survey of valid studies, as well as studies that are flawed. That list is mentioned and discussed earlier in this thread:

Actual Double Blind Studies

I have always been curious about 'training' for these tests. I saw a visual test that used an aerial pic of a full Rose Bowl, and instantaneous ABX testing was used to compare two pics and seeing if people could notice the difference between them. The subjects couldn't do it. Then, the tester...

www.audiosciencereview.com

Those Maggies were rated at 83-85 dB sensitivity. Most speaker manufacturers exaggerate this spec. The Futterman’s rated power was 150 watts into 16 ohms, 65 watts into 8 ohms, and no spec offered for 4 ohms. The Maggies’ impedance is 3-4 ohms from top to bottom. And these ratings of 65 watts into 8 ohms conveniently make no mention of actual measured distortion.

Best case scenario, the amps are going into serious distortion at 103 dB. I’d bet these amps were audibly distorting big time around the mid 90s. That’s going to be quite audible with any dynamic source material at real world listening levels.

No way this amp goes undetected in a well designed test with those speakers.
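A rough sketch of the arithmetic behind the 103 dB figure, under stated assumptions: the sensitivity rating is taken as dB SPL at 1 W and 1 m, a single speaker is considered, and the level is evaluated at 1 m with no room gain. The function name is hypothetical, and the sketch ignores the 4 ohm question raised above, since no power rating into that load is given.

```python
from math import log10


def max_spl_at_1m(sensitivity_db: float, power_w: float) -> float:
    """Estimated SPL at 1 m: rated sensitivity plus power gain in dB.

    Assumes sensitivity is specified at 1 W / 1 m and that the amplifier
    delivers `power_w` cleanly, which is the optimistic case.
    """
    return sensitivity_db + 10 * log10(power_w)


# Figures quoted in the post: 83-85 dB sensitivity, 65 W rated power.
print(round(max_spl_at_1m(85, 65), 1))  # 103.1
print(round(max_spl_at_1m(83, 65), 1))  # 101.1
```

Level at the listening seat will differ from the 1 m figure depending on distance, the room, and how the speaker radiates, so this is an upper-end estimate rather than a measured value.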

MAB · Jan 17, 2024

Justdafactsmaam said:
Those Maggies were rated at 83-85 dB sensitivity. Most speaker manufacturers exaggerate this spec. The Futterman’s rated power was 150 watts into 16 ohms, 65 watts into 8 ohms, and no spec offered for 4 ohms. The Maggies’ impedance is 3-4 ohms from top to bottom. And these ratings of 65 watts into 8 ohms conveniently make no mention of actual measured distortion.

Best case scenario, the amps are going into serious distortion at 103 dB. I’d bet these amps were audibly distorting big time around the mid 90s. That’s going to be quite audible with any dynamic source material at real world listening levels.

No way this amp goes undetected in a well designed test with those speakers.

Ok, 103 dB on MG-IIIa. That's an entirely different test: the amps driven to hard clipping and the speaker to distortion. And a test of the listener's ability to deal with hearing damage over the course of dozens of trials. The point is the MG-IIIa are purely resistive, which is easy to drive. And these tests aren't done at ear-splitting levels where the speaker itself is distorting.

Justdafactsmaam · Jan 17, 2024

MAB said:
Ok, 103 dB on MG-IIIa. That's an entirely different test: the amps driven to hard clipping and the speaker to distortion. And a test of the listener's ability to deal with hearing damage over the course of dozens of trials. The point is the MG-IIIa are purely resistive, which is easy to drive. And these tests aren't done at ear-splitting levels where the speaker itself is distorting.

103 dB is quite commonly heard in loud passages played by orchestras in concert halls. Most live jazz has plenty of content in excess of 103 dB. It’s ear-damaging if it’s sustained. If one is looking for lifelike sound reproduction, 103 dB is a very low bar. And that’s the best case scenario for the Futterman OTL 1 and the Magnepan MG IIIas. Audible distortion creeping in around 95 dB should be detectable in any meaningful test of an amplifier’s sound quality. Any test that misses it is a broken test.

MAB · Jan 17, 2024

Justdafactsmaam said:
103 dB is quite commonly heard in loud passages played by orchestras in concert halls. Most live jazz has plenty of content in excess of 103 dB. It’s ear-damaging if it’s sustained. If one is looking for lifelike sound reproduction, 103 dB is a very low bar. And that’s the best case scenario for the Futterman OTL 1 and the Magnepan MG IIIas. Audible distortion creeping in around 95 dB should be detectable in any meaningful test of an amplifier’s sound quality. Any test that misses it is a broken test.

I guess I am not sure what point you are trying to make.
Seems you want to do a test in a totally different range of SPL, moving the goal-post to a max SPL test on inefficient speakers. But that wasn't what the study was. You seem to suggest they did ear-bleeding volumes in these trials. Do you know something not published? Do you think you could get 25 listeners in a room for 45 mins to 2.5 hours per session and play music at 90 or 100+ dB? The paper would need to be titled "All amps produce the same amount of hearing loss." Now I understand your 100% HD comment earlier. If you want to go there, Carver and Bryston had great amps to drive a Magnepan (I sold them!) Perhaps not as loud as you suggest, depending on bass content.

Justdafactsmaam · Jan 17, 2024

MAB said:
I guess I am not sure what point you are trying to make.
Seems you want to do a test in a totally different range of SPL, moving the goal-post to a max SPL test on inefficient speakers. But that wasn't what the study was. You seem to suggest they did ear-bleeding volumes in these trials. Do you know something not published? Do you think you could get 25 listeners in a room for 45 mins to 2.5 hours per session and play music at 90 or 100+ dB? The paper would need to be titled "All amps produce the same amount of hearing loss." Now I understand your 100% HD comment earlier. If you want to go there, Carver and Bryston had great amps to drive a Magnepan (I sold them!) Perhaps not as loud as you suggest, depending on bass content.

The point is that set of DBTs failed to identify differences in amps that are both audible and meaningful. Which makes it both a failure as a DBT and an easy target for criticism of the use of DBTs. Not very good for promoting the value or reliability of well designed and well executed DBTs. You don’t get behind Piltdown man to argue in favor of evolution. You call it out for what it was and focus on the mountains of good legitimate evidence. It would be nice to do the same for audio.

ahofer · Jan 17, 2024

Justdafactsmaam said:
Any test that misses it is a broken test.

Begging the question. They couldn’t hear a difference so the test is broken?

The Piltdown analogy is ridiculous. The test certainly wasn’t a fraud in support of an imaginary (racist) idea.

The test conditions satisfied quite a number of people with different preconceptions. They used the equipment as they might use it at home, and they couldn’t tell the difference. That is meaningful for people who might buy the equipment and use it at home.

I listen to classical music all the time at home, and when I’ve had my decibel meter on I’ve observed peaks in the 80s, sometimes 90s. Never 103. I haven’t felt the need for more. I go to live concerts 1-2X per week on average.

Justdafactsmaam · Jan 17, 2024

ahofer said:
Begging the question. They couldn’t hear a difference so the test is broken?

The Piltdown analogy is ridiculous. The test certainly wasn’t a fraud in support of an imaginary (racist) idea.

The test conditions satisfied quite a number of people with different preconceptions. They used the equipment as they might use it at home, and they couldn’t tell the difference. That is meaningful for people who might buy the equipment and use it at home.

I listen to classical music all the time at home, and when I’ve had my decibel meter on I’ve observed peaks in the 80s, sometimes 90s. Never 103. I haven’t felt the need for more. I go to live concerts 1-2X per week on average.

Yes, when differences aren’t heard *where they should be heard* it’s a pretty good indicator that the test was flawed. Likewise, when a test gives a positive result when it was pretty much impossible on paper, it was probably a faulty test.

That the test conditions satisfied a number of participants with no experience in designing scientifically valid tests does not really mean much.

Your listening to classical music at home has no bearing on real world SPLs in an actual concert hall.

“In a performance, the trumpet ranges between 80 and 110 decibels.

The trombone, however, peaks at around 115 decibels. Surprisingly, the clarinet is much the same, peaking at about 114 decibels.”

The Piltdown analogy is on point. You don’t argue in favor of scientifically valid evidence by citing examples that are the opposite of scientifically valid.

ahofer · Jan 17, 2024

Justdafactsmaam said:
Yes, when differences aren’t heard *where they should be heard* it’s a pretty good indicator that the test was flawed. Likewise, when a test gives a positive result when it was pretty much impossible on paper, it was probably a faulty test.

That the test conditions satisfied a number of participants with no experience in designing scientifically valid tests does not really mean much.

Your listening to classical music at home has no bearing on real world SPLs in an actual concert hall.

“In a performance, the trumpet ranges between 80 and 110 decibels.

The trombone, however, peaks at around 115 decibels. Surprisingly, the clarinet is much the same, peaking at about 114 decibels.”

The Piltdown analogy is on point. You don’t argue in favor of scientifically valid evidence by citing examples that are the opposite of scientifically valid.

We are evaluating audio equipment for home use. I cite my concert experience to show that I'm familiar with concert volume (which tends to be lower in the seats than on the stage) and how I use audio equipment at home. Reproducing an orchestra at stage levels in one's living room is not generally a listener's goal. On the other hand, I often hear my wife play in small ensembles in my living room, and it is quite loud.

And no, you are still engaged in circular reasoning and special pleading ('it should be heard' is speculation), but I see there is no point, so off to ignore with you (mostly because it is tiresome for others to witness arguments like this).

MAB · Jan 17, 2024

Justdafactsmaam said:
Yes, when differences aren’t heard *where they should be heard* it’s a pretty good indicator that the test was flawed. Likewise, when a test gives a positive result when it was pretty much impossible on paper, it was probably a faulty test.

That the test conditions satisfied a number of participants with no experience in designing scientifically valid tests does not really mean much.

Your listening to classical music at home has no bearing on real world SPLs in an actual concert hall.

“In a performance, the trumpet ranges between 80 and 110 decibels.

The trombone, however, peaks at around 115 decibels. Surprisingly, the clarinet is much the same, peaking at about 114 decibels.”

The Piltdown analogy is on point. You don’t argue in favor of scientifically valid evidence by citing examples that are the opposite of scientifically valid.

You still haven't explained the three facts you got wrong. In fact, your argument is quite the opposite on the Futterman.
Quite the bombastic aside though. Probably why you feel you need 100+ dB to tell the difference between your yelling and your screaming!
I will leave you to your 100dB ranting.

Jolly Joker · Jan 17, 2024

Newbie question here. Can't you just use DSP and measurement microphones at the listening position to get useful data on people's preferences and ability to distinguish such-and-such differences in a way that can be reproduced in different locations, without caring about speaker swapping and such?

Roland · Jan 17, 2024

Am I correct in my understanding that the scientific evidence tells us unequivocally that there will be no audible difference between a hifi system that uses a Yamaha A-S3200 for amplification and one that uses a Yamaha A-S701 (using phono inputs)?

ahofer · Jan 17, 2024

Roland said:
Am I correct in my understanding that the scientific evidence tells us unequivocally that there will be no audible difference between a hifi system that uses a Yamaha A-S3200 for amplification and one that uses a Yamaha A-S701 (using phono inputs)?

Scientific testing evidence speaks to the relatively few specific amplifiers that have been used in blind tests. Measurements suggest that most amplification has inaudible differences, and the ones you can detect relate to corner cases: unusual loads that alter FR. When you combine what we know about thresholds of audibility and what we can measure in amplifier output and speaker loads, it is very reasonable to infer that amplifiers that measure within the same tolerances (the vast majority) will sound the same when compared unsighted.

However, RIAA equalization in phono stages can vary widely from one unit to the next. So I would bet the amps sound the same after the phono stage. If we could see phono stage measurements, then I might bet on that outcome.

cavedriver · Jan 17, 2024

Jolly Joker said:
Newbie question here. Can't you just use DSP and measurement microphones at the listening position to get useful data on people's preferences and ability to distinguish such-and-such differences in a way that can be reproduced in different locations, without caring about speaker swapping and such?

You'll see there's a lot of discussion of directivity in this thread. DSP can be used to modulate frequency response and delay times, but not the directivity of the speaker. Speakers with very wide directivity will have very different first reflections in a room from speakers with very narrow directivity, and that difference can't be modified by changing the speaker's output using DSP (with the exception of speakers like the new Genelecs that have been mentioned).

MAB · Jan 17, 2024

Roland said:
Am I correct in my understanding that the scientific evidence tells us unequivocally that there will be no audible difference between a hifi system that uses a Yamaha A-S3200 for amplification and one that uses a Yamaha A-S701 (using phono inputs)?

It does.

Actual Double Blind Studies

I have always been curious about 'training' for these tests. I saw a visual test that used an aerial pic of a full Rose Bowl, and instantaneous ABX testing was used to compare two pics and seeing if people could notice the difference between them. The subjects couldn't do it. Then, the tester...

www.audiosciencereview.com

AES E-Library » The Great Debate: Is Anyone Winning?

In 1980 Dan Shanefield and High Fidelity magazine startled the American audio community with a double blind amplifier comparison test in which listeners failed to identify power amplifiers by sound alone. Battle lines were quickly drawn and the controversy over whether amplifiers sounded...

www.aes.org

The examples where differences were reportedly heard are actually from flawed tests (comparing a 10W vs. 400W amp, audible differences in a misbiased tube amp, using an oscillating tube amp in a study, HiFi and Record Review misapplying statistics, and Stereophile both misinterpreting statistics and testing systems with mismatched frequency response).

I can hear some poorly implemented phono preamps by the noise floor. I note some RIAA implementations have significant frequency response deviations that are audible. So do many elements of turntable pickup.
I differentiate some amps by the noise floor with high efficiency speakers, even if sometimes I need to get unrealistically close to the driver to hear.
I can hear frequency response variations when the amplifier output impedance is high and the speaker is low impedance. I don't own amps with high output impedance, and I don't own Scintillas or Infinity Kappas either!
I can tell some amplifiers apart by the sound they make when clipping, but only on certain tracks. It often helps me to listen for this off-axis. Same for distortion. I don't tend to run amps anywhere near clipping.

I think most people can identify these above artifacts under reasonably controlled tests, if they care.
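To illustrate the output impedance point above, here is a simplified sketch that treats the amplifier and speaker as a plain voltage divider. It assumes both impedances can be handled as resistive magnitudes, and the 2 ohm and 0.05 ohm output impedances and the 3 to 20 ohm load range are hypothetical values chosen for illustration, not measurements of any product named in this thread.

```python
from math import log10


def level_db(z_out: float, z_load: float) -> float:
    """Level at the speaker terminals relative to the amp's open-circuit output."""
    return 20 * log10(z_load / (z_load + z_out))


def fr_swing_db(z_out: float, z_load_min: float, z_load_max: float) -> float:
    """Spread in level between the speaker's impedance maximum and minimum."""
    return level_db(z_out, z_load_max) - level_db(z_out, z_load_min)


# Hypothetical high-output-impedance amp into a load that swings 3-20 ohms.
print(round(fr_swing_db(2.0, 3.0, 20.0), 2))   # 3.61
# Hypothetical low-output-impedance amp into the same load.
print(round(fr_swing_db(0.05, 3.0, 20.0), 2))  # 0.12
# A flat (purely resistive) load shows no swing, whatever the output impedance.
print(fr_swing_db(2.0, 4.0, 4.0))              # 0.0
```

Under these assumptions the frequency response change only appears when a high output impedance meets a load whose impedance varies with frequency, which matches the corner-case framing in the post.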

That being said, I tend to have gear that is reasonable and has none of the above artifacts. I've seldom enjoyed the odd products with corner-case specs (like Scintillas), and find modifying my FR with a high impedance tube amp to be silly. I do have two speakers that require low noise amps due to efficiency though. I don't worry about my phono, it has sounded like a nice turntable for many decades.

edit: typo

Jolly Joker · Jan 17, 2024

cavedriver said:
You'll see there's a lot of discussion of directivity in this thread.

Can you give me a pointer? The word "directivity" itself isn't mentioned that I can find

ahofer · Jan 18, 2024

Jolly Joker said:
Can you give me a pointer? The word "directivity" itself isn't mentioned that I can find

It refers to the sound pressure at various angles to the speaker, and how that characteristic progresses through the frequency range. It’s pretty widely understood that gradual or smooth changes in directivity are preferred to abrupt changes, which cause incongruity between direct and reflected sound. You should probably start here:

Thread 'Understanding Speaker Measurements (Video)'
https://audiosciencereview.com/foru...derstanding-speaker-measurements-video.44101/

tp1 · Jan 18, 2024

I have an issue with blind listening tests, although I'm not sure how to solve it. The issue is similar to the reason why Pepsi won the blind test challenge with Coca-Cola. Pepsi tastes a touch sweeter than Coke, and in a random blind test many people preferred the sweeter taste. However, in the long term, not everyone can sustain a preference for the sweeter taste.

In blind listening tests that I have been involved with, among a group of amateur enthusiasts like myself, most people on the relevant days preferred a smoother sound (someone called it sweeter, hence the analogy). To my mind that sound wasn't always the most accurate nor the most revealing, but pleasant nevertheless. I guess the issue here is that not everyone uses the same criteria to judge sound.

Next there is the issue of technique, which can cause real problems whether it is intended or not. That is the case when two sound sources are synchronised and the operator of the test switches between the two sources playing the same passage of music. My problem with this technique is that the listener does not compare the same music each time the switch is made, making the test somewhat ineffective. In fact, the smooth flow of the rhythm of the music as the switch is made can distract from the real purpose of the test.
