
Can You Trust Your Ears? By Tom Nousaine


j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,788
Location
My kitchen or my listening room.
And to make sure everyone is following the argument, here is a sample from HA Forum link above:

[attachment 9423: sample of listening test results from the HA Forum thread]


These are "MOS" (mean opinion score) tests, where users rate fidelity on a scale of 1 to 5. There is no ABX test that says whether a difference exists at all.

This is the style of testing that is used in development of lossy codecs. Not ABX as I have mentioned repeatedly.

That appears to be MUSHRA test methodology.

Sensitive tests use DIFFERENCE testing, not MOS testing, for non-transparent systems.

In MUSHRA, you'll find most of the systems that aren't at the very top would get completely blasted to bits by ABC/HR, and provide 100% ABX scores.

Not a fan of MUSHRA - Nope.

It is also possible that the test you pointed out reports the results of many separate tests, each one using the CCIR impairment scale, which is a difference scale, not an MOS scale.

Hard to tell without more information.

But ABX is what you use if you want to determine if there is ANY audible difference at all. I suspect a typo up in the quote.
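For readers following the ABX point: an ABX run is scored against chance with a one-sided binomial test, since a guessing listener gets each trial right with probability 0.5. A minimal sketch in Python (standard library only; the 12-of-16 trial count is invented for illustration):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials`
    ABX trials by guessing alone (one-sided binomial test, p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical run: 12 correct out of 16 trials.
print(f"p = {abx_p_value(12, 16):.4f}")  # → p = 0.0384
```

A p-value this low is usually taken as evidence that the listener heard *some* difference; note that ABX says nothing about how large or how objectionable the difference is.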
 

j_j

Arny said development of codecs uses ABX. I said it did not.

Internally, actually, we used ABX as well as ABC/HR (no MUSHRA), in other words, 'detection' or "distance" testing but no MOS testing in developing codecs. It depends on what you're trying to accomplish, and sometimes "is this detectable" is what you need to know inside the guts of a codec. It's kind of complicated.
 
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,639
Likes
240,750
Location
Seattle Area
Internally, actually, we used ABX as well as ABC/HR (no MUSHRA), in other words, 'detection' or "distance" testing but no MOS testing in developing codecs. It depends on what you're trying to accomplish, and sometimes "is this detectable" is what you need to know inside the guts of a codec. It's kind of complicated.
When you say you did, do you mean AT&T? Inside Microsoft, and in countless external codec shootouts I participated in, it was never ABX. And MOS scoring was quite common in many tests.

I also don't remember any AES papers of lossy codecs using ABX testing. Admittedly it has been a while. I just went back and did a search and first few are all non-ABX tests: http://www.aes.org/tmpFiles/elib/20171026/8367.pdf



http://www.aes.org/e-lib/browse.cfm?elib=11262



http://www.aes.org/e-lib/browse.cfm?elib=5396




http://www.aes.org/e-lib/browse.cfm?elib=7127





BS1116 is what I mentioned as the key specification for such testing, and it doesn't have anything to do with ABX.

I will stop here. The message is quite clear that ABX is not a common test in development of lossy audio codecs. While this doesn't rule out people using it, it just isn't the method of choice as Arny said.
 

DonH56

Master Contributor
Technical Expert
Forum Donor
Joined
Mar 15, 2016
Messages
7,890
Likes
16,692
Location
Monument, CO
For those (like me) who have forgotten how MUSHRA works: https://en.wikipedia.org/wiki/MUSHRA

I was never professionally involved with audio codec testing, but my memory as a "follower" is that MUSHRA was fairly extensively used, at least publicly and in marketing, to differentiate among various lossy codecs, e.g. how few bits they could get away with and still be acceptable. That is how we got into this mess of highly lossy music, which persists even though storage and network bandwidth have vastly improved since then.

My WAGs - Don
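For context on how MOS-style scores like those in the linked results are summarized: each system's ratings are averaged across listeners, usually with a confidence interval attached. A rough sketch (Python standard library; the ten ratings are invented, and the normal-approximation interval is a simplification of what real test reports use):

```python
from math import sqrt
from statistics import mean, stdev

def mos_summary(ratings):
    """Mean opinion score plus a normal-approximation 95% confidence
    interval; `ratings` are 1-5 scores from independent listeners."""
    m = mean(ratings)
    half = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return m, (m - half, m + half)

# Hypothetical ratings for one codec from ten listeners.
m, ci = mos_summary([4, 5, 4, 3, 4, 5, 4, 4, 3, 4])
print(f"MOS = {m:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Overlapping intervals between two codecs are one reason MOS-style plots can look inconclusive even when a difference test like ABX would separate the systems easily.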
 

j_j

When you say you did, do you mean AT&T? Inside Microsoft, and in countless external codec shootouts I participated in, it was never ABX. And MOS scoring was quite common in many tests.

I also don't remember any AES papers of lossy codecs using ABX testing. Admittedly it has been a while. I just went back and did a search and first few are all non-ABX tests: http://www.aes.org/tmpFiles/elib/20171026/8367.pdf


BS1116 is what I mentioned as the key specification for such testing, and it doesn't have anything to do with ABX.

I will stop here. The message is quite clear that ABX is not a common test in development of lossy audio codecs. While this doesn't rule out people using it, it just isn't the method of choice as Arny said.

You forget. I didn't work on any codecs at MS. So yes, AT&T. One does not usually write papers about research testing, unless it's necessary to document an outcome.

BS1116 and .1 are DIFFERENCE tests, not MOS tests. This is an important distinction, really.

MUSHRA is a modified MOS test.
 


Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Indeed, some of which attempt to force a listener to summarize a multidimensional perception in one value, which I strongly object to.

Some people even wonder why, in preference tests involving multidimensional evaluation, even transitivity isn't necessarily a given, so I share the objection.
 

Ron Party

Senior Member
CPH (Chief Prog Head)
Joined
Feb 24, 2016
Messages
415
Likes
573
Location
Oakland
Apocalypse in 9/8 :)

At this point, the drums enter, with the rhythm section striking out a pattern using the unusual metre of 9 beats to the bar (expressed as 3+2+4).[7] The lyrics employ stereotypical apocalyptic imagery, alternating with an organ solo from Banks (played in 4/4 and 7/8 time signatures against the 9/8 rhythm section).

My favorite song of all time, bar none.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Just had a quick glance at the BS.1116-3 document - right up my alley! Magic word there - "impairment" ... the goal, Grade 5.0 - Imperceptible ... 99.9999..% of playback systems fail, badly, in this regard - use of the 'right' test signal, say a musical piece which is a particularly complex mix of sound elements, with high dynamic range, played at sufficiently high volume, will trigger a perceptible impairment of the potentially realisable quality ... every time.

As long as a system can be trivially tripped up by this type of "testing", it doesn't deserve to be considered high fidelity ...
 

j_j

Just had a quick glance at the BS.1116-3 document - right up my alley! Magic word there - "impairment" ... the goal, Grade 5.0 - Imperceptible ... 99.9999..% of playback systems fail, badly, in this regard - use of the 'right' test signal, say a musical piece which is a particularly complex mix of sound elements, with high dynamic range, played at sufficiently high volume, will trigger a perceptible impairment of the potentially realisable quality ... every time.

As long as a system can be trivially tripped up by this type of "testing", it doesn't deserve to be considered high fidelity ...

I think you need to read BS1116. It's obvious from your definition of "fail" that you don't understand what the test is, how it works, or what it reports back.

What is your hidden reference for this test you refer to?
 

fas42

I think you need to read BS1116. It's obvious from your definition of "fail" that you don't understand what the test is, how it works, or what it reports back.

What is your hidden reference for this test you refer to?
Fair enough to fully read BS1116 - the intent is clear, to me, that distortion is deliberately, knowingly included, at certain levels - perhaps from data compression - until the subjects are able to detect impairment - I just apply this in the greater context, when listening to systems playing back "uncontaminated" material.

My reference for this is having heard some clip, or track, being played back with minimal impairment at some point - I know how well it was recorded. Example: a rock track where the drummer is constantly striking the cymbals through the piece; Grade 0 - "Is the drummer hitting the cymbals, I can't hear it?" through "Kitchen pots and pans being whacked" to Grade 5 - the delicate, clear shimmer of the instrument is reproduced, fully intact.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Fair enough to fully read BS1116 - the intent is clear, to me, that distortion is deliberately, knowingly included, at certain levels - perhaps from data compression - until the subjects are able to detect impairment - I just apply this in the greater context, when listening to systems playing back "uncontaminated" material.

My reference for this is having heard some clip, or track, being played back with minimal impairment at some point - I know how well it was recorded. Example: a rock track where the drummer is constantly striking the cymbals through the piece; Grade 0 - "Is the drummer hitting the cymbals, I can't hear it?" through "Kitchen pots and pans being whacked" to Grade 5 - the delicate, clear shimmer of the instrument is reproduced, fully intact.
Couldn't you just... measure your system? I can tell from this thread that listening tests are regarded by many as the Swiss Army knife of measurements. People become listening test enthusiasts, because they think they can resolve "multidimensional" mysteries such as why people seem to like the sound of valve amplifiers and vinyl over the perfection of digital and solid state. Or they can prove "their opponents" ('subjective' audiophiles) wrong - a very strong motivation for the scientific listening test 'community'. But of course, it may be nothing to do with the sound in the first place. In effect, the listening test enthusiasts have already fallen for the subjective audiophiles' schtick by even dignifying their claims with a test.

Lossy compression is one area where listening tests are needed to check it works - which is why it is so often raised as an example of the usefulness of listening tests - but even so, the test can only fine tune a system that was designed on paper based on pretty straightforward logic (not saying I'd have thought of it, though!). A workable lossy compression system could be designed without any listening tests at all.

At the end of the day, it is odd that lossy compression seems to be such an obsession for high performance audio professionals. It is as though they are stuck in the year 2000.
 

Jakob1863

Couldn't you just... measure your system? I can tell from this thread that listening tests are regarded by many as the Swiss Army knife of measurements.

Isn't it obvious? How - without listening tests - could somebody decide which sort of measurement is important, and which degree (or kind) of deviation from a predefined "perfect" measurement result could nevertheless be acceptable?

The facts/premises should stand firm:
1.) a stereo reproduction system (for example a two-channel system) represents a great deviation from the original soundfield when reproducing it, but our ear/brain tag team is able to compensate for a lot of these deviations
2.) listeners to these reproduction systems are complex individuals that might respond in quite different ways to these deviations
3.) listening to music is a multidimensional experience
4.) different deviations/errors can lead to the same perceived impression

1.) - 4.) together are imo the reason why the aforementioned transitivity isn't guaranteed in listening tests.

So, if you accept the necessity in the case of "lossy codecs", why don't you accept the necessity in general, as reproduction usually is just a "lossy version" of an original?
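The transitivity point can be made concrete. If a listener weighs several perceptual dimensions and prefers whichever system wins on more of them, pairwise preferences can form a cycle (A over B, B over C, C over A). A toy sketch with invented scores on three invented dimensions:

```python
# Three hypothetical systems scored on three perceptual dimensions
# (say, tonal balance, spatial rendering, dynamics - all invented).
systems = {
    "A": (3, 1, 2),
    "B": (2, 3, 1),
    "C": (1, 2, 3),
}

def prefers(x, y):
    """x is preferred to y if x wins on a majority of dimensions."""
    wins = sum(a > b for a, b in zip(systems[x], systems[y]))
    return wins > len(systems[x]) / 2

# A beats B, B beats C, and yet C beats A - a preference cycle.
print(prefers("A", "B"), prefers("B", "C"), prefers("C", "A"))  # → True True True
```

This is the same structure as Condorcet's voting paradox: each pairwise comparison is perfectly rational, but no consistent overall ranking exists.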
 

Cosmik

Isn't it obvious? How - without listening tests - could somebody decide which sort of measurement is important, and which degree (or kind) of deviation from a predefined "perfect" measurement result could nevertheless be acceptable?

The facts/premises should stand firm:
1.) a stereo reproduction system (for example a two-channel system) represents a great deviation from the original soundfield when reproducing it, but our ear/brain tag team is able to compensate for a lot of these deviations
2.) listeners to these reproduction systems are complex individuals that might respond in quite different ways to these deviations
3.) listening to music is a multidimensional experience
4.) different deviations/errors can lead to the same perceived impression

1.) - 4.) together are imo the reason why the aforementioned transitivity isn't guaranteed in listening tests.

So, if you accept the necessity in the case of "lossy codecs", why don't you accept the necessity in general, as reproduction usually is just a "lossy version" of an original?
What you are advocating is the system designed by listening test. This is very similar to the idea of training artificial neural networks.

The ANN can be regarded as a black box that will transform inputs to outputs, and can be 'trained' to implement any arbitrary multidimensional function. Really, the sky's the limit: you could create your dream system that will transform your recordings in exactly the way you like them best. All you have to do is to train the network to give you the sound you like.... But at that moment, you realise that there is no such thing as "the sound you like", because it varies. And the job of training the network from scratch would be immense. Instead, you would set the network up to at least start from 'linear' and then fine tune it. What are the chances, do you think, that on average you would have a strong preference for anything but 'linear' at the end of the day? At the same time it is easy to see how a bias in the training examples could completely screw the thing up.

Design by listening test is, effectively, a version of training a neural network, but extremely slowly and based on very sparse 'training data'. People who work in neural nets usually start out very enthusiastic but then eventually come to understand that the network is reflecting their choice of training & testing data rather than adding new insights. In order to get an effective network they need to be able to provide an even distribution of training & testing data, and in order to ensure this, they need to understand the essence of the data. If they do this, they may as well engineer the system using conventional methods!

In audio, the current state of the art (as it always has been) is the linear system. Two channels is accepted by convention, or deliberate design, as sufficient. Is there a better method, that listening tests could reveal? I don't think so. As such, listening tests are probably just confirmation of what we already know - and much less sensitive than measuring instruments - and possibly less sensitive than a relaxed person listening for enjoyment (controversial! but we cannot know the answer).
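The sparse-training-data point can be illustrated numerically: a model fitted exactly to a handful of judgments can behave wildly between and beyond them. A toy sketch (plain Python; the "preference" points and settings are invented, and exact polynomial interpolation stands in for an over-fitted network):

```python
def interpolating_poly(points):
    """Return the polynomial passing exactly through `points`,
    built with Lagrange's interpolation formula."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

# Four made-up "preference" judgments (setting, rating in 0..1):
p = interpolating_poly([(0, 0.0), (1, 1.0), (2, 0.0), (3, 1.0)])

# Fits the data perfectly, but extrapolates far outside the 0..1
# rating range as soon as we leave the training points.
print(p(1.5), p(5.0))
```

The fitted curve passes exactly through all four judgments, yet at setting 5 it predicts a "rating" of 25 - exactly the failure mode of a system tuned only to its sparse training examples.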
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,939
Location
Oslo, Norway
I do believe listening tests could tell us whether three channels would be superior to two for subjective image perception, for example. Intuitively, three channels have always seemed to me like a more logical way of reproducing music than two. Floyd Toole mentions a couple of experiments about this in his book (I only read the 2nd edition), but it seemed to me like the results were inconclusive at the time.
 

Fitzcaraldo215

Major Contributor
Joined
Mar 4, 2016
Messages
1,440
Likes
634
In audio, the current state of the art (as it always has been) is the linear system. Two channels is accepted by convention, or deliberate design, as sufficient. Is there a better method, that listening tests could reveal? I don't think so. As such, listening tests are probably just confirmation of what we already know - and much less sensitive than measuring instruments - and possibly less sensitive than a relaxed person listening for enjoyment (controversial! but we cannot know the answer).

Yes, 2-channel stereo was adopted as a conventional standard in the 1950s, and it remains the widely accepted paradigm. I know of no formal, published scientific listening tests showing that multichannel music reproduction from discretely recorded multichannel sources is preferred over stereo by test subjects. As you say, it would just be confirmation of what many already know to be obvious. Therefore, such scientific tests are not published. No one has an interest in rigorously attempting to prove the obvious. Such tests would be superfluous, laughably so.

At the advent of color TV, I also do not think that there were scientific studies to "prove" conclusively that people statistically preferred color to black/white. Not saying multichannel audio sound is quite as obvious as that, but I think you catch my drift. Maybe such formal listener preference studies of stereo vs. mono were published, but I doubt it for the same reasons.

Where rigorous scientific listener preference studies of sound reproduction are useful is in identifying many smaller, less obvious differences that were not previously "known" or which were controversial. And, indeed, many of the issues we now routinely deem as "known" might not be known were it not for such earlier studies on human test subjects.

We often forget or are oblivious to the painstaking research and testing that underlies what we now deem as self evident. If you take the long view, some things in audio were not so obvious at one time, and many were not routinely accepted until adequate scientific testing on human subjects was done.

And, there are many mysteries in the realm of psychoacoustics that measurement instruments cannot measure, but which can have significant impact, including on audio system design. There still are many mysteries to be solved, I believe. And, there have even been some accepted conventions proven by newer research to be inferior in terms of listener preference.

As Toole says, two ears and a brain are not the same as an omni mike. Again, I highly recommend his latest book, Sound Reproduction - 3rd Edition for many useful insights into the research on acoustics and psychoacoustics, his own and others.
 

j_j

Fair enough to fully read BS1116 - the intent is clear, to me, that distortion is deliberately, knowingly included, at certain levels - perhaps from data compression - until the subjects are able to detect impairment - I just apply this in the greater context, when listening to systems playing back "uncontaminated" material.

No, that is not what BS1116 says. It is an ABC/HR system with two goals: to discover which of B and C is the "hidden reference" (A is the known reference), and then to describe how different the impaired one is.

Unless you are able to switch systems clicklessly while sitting in the same seat, with prompt switching, etc., you aren't running anything like BS1116.
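The trial logic described above can be sketched in Python (a simplified illustration only, not the full BS.1116 protocol - no training phase, no post-screening, no statistics; the names and the toy "listener" are invented):

```python
import random

def abchr_trial(reference, coded, get_grades):
    """One ABC/HR trial. A is the known reference; the hidden
    reference and the impaired (coded) signal are randomly assigned
    to B and C. `get_grades(slots)` presents the stimuli and returns
    the listener's grades for B and C on the 5-point impairment
    scale (5.0 imperceptible ... 1.0 very annoying)."""
    hidden_ref_is_b = random.choice([True, False])
    slots = {
        "A": reference,
        "B": reference if hidden_ref_is_b else coded,
        "C": coded if hidden_ref_is_b else reference,
    }
    grade_b, grade_c = get_grades(slots)
    impaired = "C" if hidden_ref_is_b else "B"  # slot holding the coded signal
    return impaired, (grade_c if hidden_ref_is_b else grade_b)

# Toy usage: strings stand in for audio, and the "listener" grades
# the coded signal 4.0 and the (hidden) reference 5.0.
print(abchr_trial("ref", "coded",
                  lambda s: (4.0 if s["B"] == "coded" else 5.0,
                             4.0 if s["C"] == "coded" else 5.0)))
```

The random assignment of the hidden reference is what makes the test double-blind: a listener who grades the hidden reference below 5.0 reveals that they cannot reliably tell it from the impaired signal.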
 

j_j

Couldn't you just... measure your system?

Measure for what? Frequency response of direct signal? Frequency response of diffuse signal? Degree of diffusion of diffuse signal? Direct to diffuse ratio? That as a function of frequency? Distortion? Room interaction (many measurements there), etc.

Remember, two channel playback is a very, very ROUGH approximation of an original soundfield. Steinberg and Snow proved in 1933 (YES, 1933) that ***THREE*** channels were absolutely necessary for proper rendering of the front soundstage, and that 2 were not sufficient.

So we have an illusion made by a flawed system.

Which illusion do you PREFER? Tell me that? Can you describe that in measurements? Just for starters.
 

j_j

In audio, the current state of the art (as it always has been) is the linear system.

That's because the ear is a kind of spectrum analyzer, and adding new tones to a signal is a very annoying kind of impairment.

Two channels is accepted by convention, or deliberate design, as sufficient.
Not even close. Not even in the dugout, let alone in the actual ballpark. Steinberg and Snow took that apart in 1933. There was an argument about 2 vs. 3 then, with jingoistic advertisers ranting quite maliciously about "you only have two ears", even then missing the basic physics of the situation that were already known.

3 channels is the rock-bottom MINIMUM for a front soundstage, with no envelopment or depth. Sorry, but that's been firmly established for going on a century now.
Is there a better method, that listening tests could reveal? I don't think so. As such, listening tests are probably just confirmation of what we already know - and much less sensitive than measuring instruments - and possibly less sensitive than a relaxed person listening for enjoyment (controversial! but we cannot know the answer).

So then, you do understand (obviously you do NOT) that different people prefer different illusions from their stereo system. Some like direct, some like totally diffuse, some like a mix. That's just one element, for starters. I am not going to even try to describe this in one paragraph.

And for the "two ears" foolishness, our heads move. That alone is a refutation.
 