MQA creator Bob Stuart answers questions.

Sal1950 · Jun 10, 2019

JohnPM said:
Conspiracy theorists of the world unite, someone out there is out to get you.

Just because I'm paranoid doesn't mean no one is following me.

somebodyelse · Jun 10, 2019

JohnPM said:
The last issue on high resolution audio was in 2004, May's was an update.

May 2014 - including papers on 1-bit audio and MLP among others. IIRC there were discussions going on around that time on the audibility of watermarking that most publishers were including on high resolution formats. Bob Stuart agreed with someone's argument that watermarks had to be audible in order to meet their goal of remaining identifiable when used with a notionally perfect lossy codec as anything inaudible would be discarded. I wonder if the original forum discussion still exists.

Blumlein 88 · Jun 10, 2019

Sergei said:
Perfect!

So, my point was, that expectations based on the theory of Linear Time Invariant (LTI) systems, which are traditionally analyzed with the help of Fourier transform, are breaking down for Mammal Hearing System (MHS), which is neither linear nor time invariant.

In LTI, we care about durations, frequencies, sampling rates, and amplitudes in the time and frequency domains. In MHS, we also have to care about onset times, recuperation periods, levels of perceived loudness, inter-frequencies masking etc. "Four sounds a little louder than One" is not what LTI predicts, yet it makes perfect sense in the MHS framework.

The experiment illustrates at least two things:

(1) In MHS, perceived loudness depends not only on amplitude, but also on duration. This is a robust effect, linked to the hearing system's "slow" integrator, operating over tens of milliseconds. There also exists a less robust effect, not demonstrated by this experiment, due to "fast" integrator, operating over tens of microseconds, which makes a perceived onset time depend on the amplitude.

(2) Some of you will be able to differentiate between One and Two, some not. Or between Two and Three. Virtually everyone will be able to differentiate between One and Four. And this is for the "slow" integrator, considered rather consistent! Individual differences in functioning of the fast integrator are more difficult to elicit experimentally, yet they do exist.

Qualitatively, the number of dimensions LTI operates in is smaller than the number of MHS dimensions. If we hold constant the value(s) in one or more of the MHS dimension(s), we take the dimension(s) out of play, and then MHS behavior follows the LTI-predicted behavior more closely.

That's the general reason why "simple" music, mostly consisting of a small number of sinusoids slowly changing their amplitudes and frequencies over time, is more readily amenable to LTI analysis. The effects of the perceptual integrators fade away. Onset times matter less.

The "complex" music, with large number of sinusoids exhibiting fast and frequent onsets and fadeouts, chirps, and transients, is not as amenable to LTI analysis. The integrators play an important role in this case. We better preserve the information about the onset times more accurately.

Lots of wheel spinning here. What you say is true enough, but it never gets anywhere. None of this indicates that timing is inadequate the way things are done now. The timing is already something like 100 times better than needed; imagine the improvement if it were 10,000 times better. Well, I'm not hearing it.

amirm · Jun 10, 2019

somebodyelse said:
See @miero thread on signal generation with sox which starts with synthesis of 1kHz tone files. The sox website has downloads for Windows and MacOS so you don't need to be using linux. I think these should produce the signals asked for, although you may want to change the sample rate and depth, and the attenuation from full scale:

Thanks. But that misses the last part of my sentence. I'd like to see @Sergei run his listening tests and post his observations and files. Then we can get somewhere as opposed to a theoretical discussion, or dismissal of the results after the fact because the test files were not this way or that way.

Sergei · Jun 10, 2019

SIY said:
Start with the irrelevant, end up with the repeated, ummm, misunderstanding. The perfect circle.

What is your explanation of the four-tone experiment?

I did read some of your writings available on the Web. Watched some of your video tutorials. About every twentieth sentence you wrote or uttered there was devoted to disparaging someone, including a professor who taught you the basics of audio science. Not enough information to construct a robust psychological profile, yet enough information for me to not take your insults personally.

However, we are treading across a serious terrain now. LTI and MHS do give different predictions of the severity of impact on the human hearing system caused by certain complex sounds containing significant number of sharp transients. A health issue. A liability issue. "Could be the asbestos of the 21st century" issue. About as appropriate subject for snide unsubstantiated remarks as the Holocaust IMHO.

If you deny the contemporary MHS approach, what is the alternative you propose?

Costia · Jun 10, 2019

JohnPM said:
Not sure if it has already come up, but there was an interesting paper on "Modern Sampling" in May's AES journal. It is an open access paper so free to download. Modern Sampling: A Tutorial.

So it basically says you can use shorter reconstruction filters to get a result that is presumably at least equivalent to using sinc.
(Edit: that's actually nice since it can make better playback HW cheaper)
But the total freq. response of pre-filter+reconstruction filter looks like a low pass to me. So it should have ringing artifacts as well.
They show that the reconstruction filter alone won't cause ringing, but what about the whole system?
What's missing for me in this article is a comparison between analog->sinc->sinc->out vs analog->pre-filter->beta-filter->out for various input signals, such as a square wave.

SIY · Jun 10, 2019

Sergei said:
What is your explanation of the four-tone experiment?

It's irrelevant to this issue.

I am unaware that I was taught audio science by a professor, much less that I disparaged this non-existent person.

Blumlein 88 · Jun 10, 2019

Let me see, regular filtering is bad, because it is imperfect which causes some level of aliasing. B-splines are also imperfect, and have very slow roll offs above FS, but that is okay because there is usually low energy at high frequencies for audio. Which would also mean the amount of aliasing (related to the strength of the signal) is low in the more normal conversion. Did we get anywhere with this? Oh and there are problems implementing these in practice so additional filters to flatten response will be needed. Oh, oh, while we are at it we should mention sampling rates might need to be 3 or 4 times higher or maybe since audio is somewhat self bandlimited just twice as much will do. To equal the normal Shannon rates mind you.

Now if this type of filtering has an advantage I don't see why someone can't produce ADCs and DACs to use it. MQA was an attempt to do some of this and lock it in like Dolby has on video. It would be as if the first delta-sigma converters were patented and added to some encoding scheme so you had to have special licensing to use them. Say we call it SSD for Super Sampling Digital. MQA is trying to do something similar to get paid for this by including all the hidden stuff, lossy compression etc. to promote authentication.

I would like to have seen a comparison of the error values for the normal method and modern methods. Again, if it's a better way it can be used. It wouldn't be too hard to have different filters which switch in and out depending upon whether it was traditional PCM or B-spline based reconstruction. But for this complication what is actually gained? Considering the error levels of 96/24 I'd think there is little to gain.

Costia · Jun 10, 2019

Sergei said:
There also exists a less robust effect, not demonstrated by this experiment, due to "fast" integrator, operating over tens of microseconds, which makes a perceived onset time depend on the amplitude.

Can you link to a paper about the fast integrator?

MRC01 · Jun 10, 2019

Getting back to the theoretical basis for a moment: it's mathematically proven that the Whittaker-Shannon reconstruction formula perfectly reconstructs the analog wave that was digitally sampled, so long as that original analog wave was bandwidth limited below Nyquist.
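To make the reconstruction claim concrete, here is a minimal sketch of the Whittaker-Shannon sum, x(t) = sum over n of x[n] * sinc(fs*t - n), evaluated at a point between two samples. The sample rate, tone frequency and window length are assumptions picked for illustration, and the sum is truncated to a finite window, so the result is close to the true value rather than exact.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, fs, t):
    """Truncated Whittaker-Shannon sum over the samples that are available."""
    return sum(s * sinc(fs * t - (first_index + k)) for k, s in enumerate(samples))

fs = 8000.0    # sample rate in Hz (assumed for illustration)
f0 = 1000.0    # tone frequency, well below Nyquist (fs/2 = 4000 Hz)
half = 2000    # number of samples kept on each side of t = 0

# Sample the band-limited tone at n = -half .. +half.
samples = [math.sin(2 * math.pi * f0 * n / fs) for n in range(-half, half + 1)]

t = 0.3 / fs   # a time instant that falls between two samples
estimate = reconstruct(samples, -half, fs, t)
exact = math.sin(2 * math.pi * f0 * t)
error = abs(estimate - exact)   # shrinks as the window (half) grows
```

The residual error here comes only from cutting the infinite sum short, which is the "too much computation and read-ahead" problem in practice.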

So what is the argument here? That we're not actually using the Whittaker-Shannon formula to reconstruct the analog wave? It requires too much computation and read-ahead to be practical so instead, we're using techniques like delta-sigma. Thus they fall short of perfection? If that is the argument, then the degree to which they fall short can be quantified.

PS: or is the argument on the encoding side: that applying the AA bandwidth filter during encoding, distorts the signal in some way?

somebodyelse · Jun 10, 2019

amirm said:
Thanks. But that misses the last part of my sentence. I'd like to see @Sergei run his listening tests and post his observations and files. Then we can get somewhere as opposed to a theoretical discussion, or dismissal of the results after the fact because the test files were not this way or that way.

Fair point. With my devil's advocate hat on I'd argue that telling you what you're supposed to hear before you hear it would affect what you hear. Is there a way to do a sealed post with the "here's what you should have heard and why" part that can only be opened some time later, or do we just have to trust people not to open spoiler tags in this sort of situation?

Having said that I'm missing how this specific test is relevant. It's an interesting demonstration of a phenomenon I didn't know about, but it's something that can be captured and reproduced by the existing recording/playback chain.

Costia · Jun 10, 2019

MRC01 said:
So what is the argument here?

That we can do better.
Here's an example: "1sec square wave at 0db"
That took 23 bytes and contains more data than any wav file ever can at any sample rate, since a square wave has unlimited BW.

MRC01 · Jun 10, 2019

Costia said:
... Here's an example: "1sec square wave at 0db"
That took 23 bytes and contains more data than any wav file ever can at any sample rate, since a square wave has unlimited BW.

A perfect square wave is a mathematical construct that doesn't exist in nature, let alone music. Every square-wave-like sound that actually exists, is bandwidth limited. And our perception is also bandwidth limited.

Costia said:
... we can do better. ...

I agree. 44-16 isn't quite fully transparent to all humans. But evidence suggests it doesn't take much more for the digital encoding & reconstruction to be fully transparent. I am all in favor of a higher standard, say 64-24 or whatever it would take to be fully transparent with a reasonable safety margin.
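For a rough sense of what those numbers buy, here is a small sketch using the textbook figure for an ideal uniform quantizer driven by a full-scale sine (about 6.02 dB per bit plus 1.76 dB). It assumes "44-16" means 44.1 kHz / 16-bit and "64-24" means 64 kHz / 24-bit; real converters fall short of these ideal values, and none of this says where audible transparency actually lies.

```python
import math

def ideal_snr_db(bits):
    """SNR of an ideal uniform quantizer with a full-scale sine input, in dB."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at the given sample rate."""
    return sample_rate_hz / 2

# Format labels and rates are assumptions about what the shorthand means.
formats = {"44-16": (44100, 16), "64-24": (64000, 24)}
summary = {
    name: (nyquist_hz(rate), round(ideal_snr_db(bits), 2))
    for name, (rate, bits) in formats.items()
}

for name, (bandwidth, snr) in summary.items():
    print(f"{name}: bandwidth up to {bandwidth:.0f} Hz, ideal SNR {snr} dB")
```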

However, once digital encoding & reconstruction is fully transparent, we're not nearly done. Other aspects of the recording process are even less transparent than 44-16, for example the limitations of microphones, placement, room effects, among other things.

Costia · Jun 10, 2019

MRC01 said:
A perfect square wave is a mathematical construct that doesn't exist in nature, let alone music. Every square-wave-like sound that actually exists, is bandwidth limited. And our perception is also bandwidth limited.

Electronic music can contain perfect square waves since it's synthesized.

nscrivener · Jun 10, 2019

Costia said:
Electronic music can contain perfect square waves since it's synthesized.

No it can't, because you can't encode for two different voltages at the same time point.

MRC01 · Jun 10, 2019

Costia said:
Electronic music can contain perfect square waves since it's synthesized.

Actually, it can't because electronics with infinite bandwidth don't exist. Also, an actual sound is made from changing air pressure. And the mathematical derivative of a perfect square wave is undefined at its transition point. That means a perfect square wave, to propagate as sound in the air, would require an infinite rate of change in air pressure, which is not physically possible.

Electronic music can construct square-like waves using wider bandwidth than we can hear. Call that SSBSWLS (supersonic bandwidth square-wave like sounds). But we can't hear the difference between a SSBSWLS and a square-like wave constructed from bandwidth we can hear.

Put differently: construct square wave (A) using 1 MHz bandwidth. Construct square wave (B) using 25 kHz bandwidth. All else equal: frequency, amplitude, phase. We humans can't hear the difference between A and B. At least, I've never seen evidence suggesting this.
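A quick sketch of the A versus B comparison, using the Fourier series of a square wave (odd harmonics with amplitude 1/k). The 1 kHz fundamental and the 20 kHz hearing limit are assumptions for illustration; the point is only that both versions contain the same components below that limit.

```python
import math

def odd_harmonics(f0, bandwidth):
    """Frequencies (Hz) of the odd harmonics of f0 that fit within `bandwidth`."""
    return [k * f0 for k in range(1, int(bandwidth // f0) + 1, 2)]

def square_like(t, f0, bandwidth):
    """Square wave Fourier series truncated at `bandwidth`."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * f * t) / (f / f0)
        for f in odd_harmonics(f0, bandwidth)
    )

f0 = 1000.0                        # fundamental in Hz (assumed)
wave_a = odd_harmonics(f0, 1e6)    # "A": built with 1 MHz bandwidth
wave_b = odd_harmonics(f0, 25e3)   # "B": built with 25 kHz bandwidth

audible_limit = 20e3               # nominal upper limit of hearing (assumed)
audible_a = [f for f in wave_a if f <= audible_limit]
audible_b = [f for f in wave_b if f <= audible_limit]

print(len(wave_a), len(wave_b), audible_a == audible_b)
```

A and B differ only in harmonics above 20 kHz, which is why the question becomes one of audibility rather than of waveform shape.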

Costia · Jun 10, 2019

nscrivener said:
No it can't, because you can't encode for two different voltages at the same time point.

Yes you can. Not in wav though.
The problem would be reproducing it as analog, which you can't.

MRC01 said:
Actually, it can't because electronics with infinite bandwidth don't exist.

You can synthesize whatever you want, including a signal with infinite BW, just don't store it as a wav.

Edit:
https://github.com/cristoper/wav2vec

nscrivener · Jun 10, 2019

nscrivener said:
No it can't, because you can't encode for two different voltages at the same time point.

The reason for this is that a square wave moves from one amplitude to another in no time at all. It is an infinitely short period of time between state A and state B. Because sampling works by giving you one sample per time interval, the best you can do is approximate it. It doesn't happen in nature either for obvious reasons. (And no, we are not dealing with quantum effects here haha)

MRC01 · Jun 10, 2019

Every sound that actually propagates in air (or water or any other medium) and we can hear, is bandwidth limited. And these bandwidth limited waves can be digitally encoded and reconstructed with mathematical perfection, so long as we sample them at more than twice the highest frequency we want to capture.
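As an illustration of why the "more than twice the highest frequency" condition matters, this sketch samples a tone above Nyquist and shows that its samples are indistinguishable from those of a lower-frequency tone. The 44.1 kHz rate and the 30 kHz tone are assumptions chosen for the example.

```python
import math

fs = 44100.0             # sample rate in Hz (assumed)
f_high = 30000.0         # tone above Nyquist (fs/2 = 22050 Hz)
f_alias = fs - f_high    # where the tone folds back to: 14100 Hz

n_samples = 100
high = [math.sin(2 * math.pi * f_high * n / fs) for n in range(n_samples)]
# The alias appears with inverted sign: sin(2*pi*(fs - f)*n/fs) = -sin(2*pi*f*n/fs).
alias = [-math.sin(2 * math.pi * f_alias * n / fs) for n in range(n_samples)]

max_diff = max(abs(a - b) for a, b in zip(high, alias))
print(f"{f_high:.0f} Hz sampled at {fs:.0f} Hz looks like {f_alias:.0f} Hz "
      f"(max sample difference {max_diff:.2e})")
```

Once sampled, nothing distinguishes the two tones, which is why the band-limiting has to happen before the converter.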

Digital audio isn't perfect, but its limitations are not theoretical. They are about the bit rates used, and the algorithms used. We're not using quite high enough bit rate to be fully transparent to all humans, and we're not using the mathematically perfect reconstruction algorithm. However, using higher bit rates and depths can account for both of these limitations.

Costia · Jun 10, 2019

I don't think there is a point in actually doing it for practical reasons.
Point was, we can do better than Shannon/Nyquist.
It could be an interesting academic paper I guess.
