Understanding Upsampling/Interpolation

j_j · Nov 17, 2017

If you look at www.aes.org/sections/pnw in the "meeting recaps" section, there is a tutorial on sampling rate conversion.

That's not in the list of powerpoints that Amir just pointed to, sorry.

TabCam · Jan 19, 2021

If you look at this work on deep convolutional neural networks using perceptual loss, you can see the progress that has been made. The problem will probably be that machine learning requires adequate examples, and the results may differ a lot by type of music: synthetic, generated, acoustic, etc.

j_j · Jan 19, 2021

TabCam said:
If you look at this work on deep convolutional neural networks using perceptual loss, you can see the progress that has been made. The problem will probably be that machine learning requires adequate examples, and the results may differ a lot by type of music: synthetic, generated, acoustic, etc.

What do you actually expect to change vs. accurate spectral and time domain replication? SRC is not a perceptual process, it is defined precisely in mathematics, and it's not even terribly expensive in the modern world. If an SRC is broken enough to have perceptual loss, it's broken. EOF.

This is nothing like image processing at all. NOTHING.

TabCam · Jan 20, 2021

j_j said:
What do you actually expect to change vs. accurate spectral and time domain replication? SRC is not a perceptual process, it is defined precisely in mathematics, and it's not even terribly expensive in the modern world. If an SRC is broken enough to have perceptual loss, it's broken. EOF.

This is nothing like image processing at all. NOTHING.

That is the point of the whole topic: comparing with upsampling in image processing as an argument that lost detail cannot be recreated. The latest techniques come close to the original. If we did the same for music, training neural networks on reduced data and checking how close they come to the original, would we not get a much better result? Quite likely not in real time, but maybe as a preprocessing stage?

j_j · Jan 20, 2021

TabCam said:
That is the point of the whole topic: comparing with upsampling in image processing as an argument that lost detail cannot be recreated. The latest techniques come close to the original. If we did the same for music, training neural networks on reduced data and checking how close they come to the original, would we not get a much better result? Quite likely not in real time, but maybe as a preprocessing stage?

No, there is no image processing issue involved in the ACTUAL process of audio upsampling. No. Imaging works in the spatial domain. Sound works in a peculiar time/frequency domain dictated by how the human cochlea actually is known (tested, verified, and understood) to function. Yes, you have to worry about images and aliases in the signal, that's a completely different thing, which you can see discussed below in some detail.

Image detail requires "making up" information based on SPATIAL cues that must be eliminated (pixelation) and other cues that should be carried through (image edges). The information is spatial in character, and conversion to the frequency domain may be useful as a processing step, but is not the key to understanding the perception of the image.

Audio is frequency based. There is one, I repeat ONE issue. Do not muck up the spectrum. If you double the sampling rate, do not add anything, do not take anything away, because there is no detectable feature to be "inserted", barring some very young ears listening. Even if there were, the structure of audio signal creation makes "guessing right" much more difficult.

As a result, the examination of actual audio upsampling is one of the very few things in the audio domain for which least-mean-squares error is actually important. What do you even MEAN "compared to the original"? What original do you have in mind?

With PCM you ***GET*** all of the original in the bandwidth you started with. Your idea does not even fit into the reality of the process.

If you mean fixing the output of a perceptual codec (the equivalent of reducing pixel count in an image, to some poorly equatable extent) then you're arguing about something that has exactly ZERO to do with upsampling, or downsampling.

So, look, don't condescend to me here, by telling me what upsampling means. Yes, I know about both image and audio, and the two problems are simply not the same in any reasonable regard.

Ditto downsampling, by the way.

Here, this is how sampling rate conversion works, and why least mean squares matters for audio. I haven't done the same for imaging, because it's MUCH more in infancy (the work you show is reasonable for images), partially because of the rather substantially different perceptual constraints, and I prefer to work on audio.

So read this for audio sampling rate conversion, covering both upsampling AND downsampling.

https://www.aes-media.org/sections/pnw/pnwrecaps/2016/jjsrc_jan2016/

Scroll up at http://www.aes-media.org/sections/pnw/pnwrecaps/index.htm if you need some updating on how the ear works.

pkane · Jan 20, 2021

TabCam said:
That is the point of the whole topic: comparing with upsampling in image processing as an argument that lost detail cannot be recreated. The latest techniques come close to the original. If we did the same for music, training neural networks on reduced data and checking how close they come to the original, would we not get a much better result? Quite likely not in real time, but maybe as a preprocessing stage?

Audio resampling is not a perceptual process; there's no detail lost if it is done properly. Up to about 1/2 of the sampling frequency, there's no missing information that needs to be filled in.

If you're talking about recovering missing frequencies above 1/2Fs then, sure, try the convolutional neural nets or any other interpolation/extrapolation you want. But for normal, every-day audio resampling there's no reason to guess, interpolate, infer, or extrapolate unless your sampling frequency is so low that it can't fully represent the audible frequency range. And by the way, this is the same with image resampling.
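A minimal sketch of that point, assuming a short periodic test signal whose tones sit exactly on DFT bins. The 48 kHz rate, the tone frequencies, and the helper names `dft`, `idft`, `upsample_by_2`, and `tones` are illustrative choices, not anything taken from this thread. Doubling the rate by zero-padding the spectrum adds nothing and removes nothing below Fs/2, so the squared error against the same tones sampled directly at the higher rate should sit at rounding level.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT, returning the real part."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def upsample_by_2(x):
    """Double the rate by zero-padding the spectrum (assumes an empty Nyquist bin)."""
    n = len(x)
    X = dft(x)
    Y = [0j] * (2 * n)
    for k in range(n // 2):            # positive frequencies keep their bins
        Y[k] = 2 * X[k]
    for k in range(n // 2 + 1, n):     # negative frequencies move to the top
        Y[k + n] = 2 * X[k]
    return idft(Y)

FS = 48_000
N = 48                                 # 1 ms of signal, 1 kHz per DFT bin
FREQS = (1_000, 7_000)                 # hypothetical test tones below FS / 2

def tones(rate, count):
    """Sum of the test tones sampled at the given rate."""
    return [sum(math.sin(2 * math.pi * f * t / rate) for f in FREQS)
            for t in range(count)]

x = tones(FS, N)
y = upsample_by_2(x)
reference = tones(2 * FS, 2 * N)       # the same tones sampled directly at 96 kHz

mse = sum((a - b) ** 2 for a, b in zip(y, reference)) / len(y)
roundtrip_error = max(abs(a - b) for a, b in zip(y[::2], x))
print(f"mean squared error vs. direct 96 kHz sampling: {mse:.3e}")
print(f"max error after dropping every second sample:  {roundtrip_error:.3e}")
```

The DFT shortcut only works this cleanly because the test signal is periodic in the analysis window; real converters use finite filters instead, which is where the practical trade-offs come in.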

UliBru · Jan 20, 2021

I'd like to share some basics for a better understanding of upsampling:

Let's start with a logsweep signal 10 Hz to 48 kHz @ samplerate 96 kHz with length of 10 seconds. The frequency response looks like

The red curve shows the 96 kHz sweep; the green curve shows the same sweep downsampled to 48 kHz. It should be clear that the downsampled signal cannot contain the frequencies from 24 kHz to 48 kHz (marked area).
The time signal for the two signals looks like this

It becomes clear that the high frequency content is now simply nulled. In this example the HF content sits at the right side. Logically, it does not matter whether it is separated from or mixed with the rest of the signal.

Logic also tells us that there is no way to reconstruct the red part from the green part, as we simply do not know whether the original sweep ended at 48 kHz or already at 40 kHz.
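A small sketch of why the red part cannot be recovered, under simplifying assumptions: two steady tones instead of a sweep, a brickwall lowpass applied directly on DFT bins, and hypothetical helper names (`dft`, `idft`, `downsample_by_2`, `tones`). Two 96 kHz signals that differ only above 24 kHz end up as the same 48 kHz signal, so nothing in the downsampled data can tell them apart.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT, returning the real part."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def downsample_by_2(x):
    """Brickwall lowpass at the new Nyquist (on DFT bins), then keep every 2nd sample."""
    n = len(x)
    X = dft(x)
    for k in range(n // 4, n - n // 4 + 1):   # everything at or above a quarter of the old rate
        X[k] = 0j
    return idft(X)[::2]

FS_HIGH = 96_000
N = 96                                        # 1 ms of signal, 1 kHz per DFT bin

def tones(freqs):
    """Sum of sine tones sampled at 96 kHz."""
    return [sum(math.sin(2 * math.pi * f * t / FS_HIGH) for f in freqs)
            for t in range(N)]

# Same 5 kHz content, different content above 24 kHz (hypothetical test tones).
a = downsample_by_2(tones((5_000, 30_000)))
b = downsample_by_2(tones((5_000, 40_000)))

max_difference = max(abs(p - q) for p, q in zip(a, b))
print(f"largest difference between the two 48 kHz signals: {max_difference:.3e}")
```

Both results are just the 5 kHz tone sampled at 48 kHz, whatever was above 24 kHz in the source.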

Now we try upsampling by filling in zeros between each green sample (zerostuffing). This looks like

Zooming into the detail reveals a bit more

Now let's look at the frequency response of the brown zerostuffed signal

The chart here displays the frequency axis on a linear scale. We can see that the right side above 24 kHz is a mirror image of the left side below 24 kHz (spectral imaging, often loosely called aliasing). The right side must not exist, as the original signal (green curve) has no information about any frequency content above 24 kHz.
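The mirror image can be reproduced with a few lines, assuming a single 5 kHz test tone instead of the sweep (the tone, the 1 ms length, and the helper name `dft_magnitude` are illustrative assumptions). After zero-stuffing from 48 kHz to 96 kHz, the spectrum between 0 and 48 kHz contains the tone plus its reflection around 24 kHz.

```python
import cmath
import math

FS_LOW = 48_000
N_LOW = 48                    # 1 ms at 48 kHz
TONE_HZ = 5_000               # hypothetical test tone

x = [math.sin(2 * math.pi * TONE_HZ * m / FS_LOW) for m in range(N_LOW)]

# Zero-stuffing: keep each sample and insert one zero after it (rate becomes 96 kHz).
stuffed = []
for sample in x:
    stuffed.extend((sample, 0.0))

def dft_magnitude(signal):
    """Magnitude of a naive O(n^2) DFT."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

magnitude = dft_magnitude(stuffed)

# With 96 samples at 96 kHz, bin k corresponds to k kHz; scan 0..48 kHz.
peaks_khz = [k for k in range(len(stuffed) // 2 + 1) if magnitude[k] > 1.0]
print("spectral peaks (kHz):", peaks_khz)
```

The list contains 5 and 43: the original tone and its image at 48 kHz minus 5 kHz, which is exactly the content the filter in the next step has to remove.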

So obviously we have to take away the frequencies above 24 kHz by a brickwall filter

The ideal brickwall filter is a sinc filter of infinite length. In practice shorter filters are applied, and there is much discussion about the required length, windowing, and linear phase versus minimum phase. Anyway, the convolution of the brown zerostuffed signal with the sinc filter results in

The cyan curve is the 96 kHz upsampling of the 48 kHz downsampling of the original 96 kHz logsweep.
It clearly ends at 24 kHz; there is no information in the downsampled signal that would let us reconstruct the upper right part of the red curve.
For frequencies below 24 kHz the convolution of the zerostuffed signal with the sinc filter results in a nearly perfect reconstruction of the time domain signal, here in comparison between zerostuffing and interpolation

And finally, a comparison between a section of the original and the upsampled curve
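The zero-stuff-then-filter procedure described above can be sketched as follows. This is one possible illustration, not the filter used for the plots: it assumes a Blackman-windowed sinc with an arbitrary half-length `K`, a 3 kHz test tone, and hypothetical helper names (`sinc`, `blackman`, `filtered`). It compares both the zero-stuffed and the filtered signal against the same tone sampled directly at 96 kHz.

```python
import math

FS_LOW = 48_000
TONE_HZ = 3_000      # hypothetical test tone, well below 24 kHz
M = 200              # number of 48 kHz input samples
K = 64               # filter half-length in 96 kHz samples (an arbitrary choice)

x = [math.sin(2 * math.pi * TONE_HZ * m / FS_LOW) for m in range(M)]

# Zero-stuffing: one zero after every input sample doubles the rate to 96 kHz.
stuffed = []
for sample in x:
    stuffed.extend((sample, 0.0))

def sinc(v):
    """Normalized sinc, sin(pi v) / (pi v)."""
    return 1.0 if v == 0 else math.sin(math.pi * v) / (math.pi * v)

def blackman(j):
    """Blackman window centred on 0, reaching zero at j = +/-K."""
    return 0.42 + 0.5 * math.cos(math.pi * j / K) + 0.08 * math.cos(2 * math.pi * j / K)

# Half-band lowpass: sinc(j / 2) cuts off at 24 kHz for a 96 kHz rate and has a
# passband gain of 2, which makes up for the inserted zeros.
h = [sinc(j / 2) * blackman(j) for j in range(-K, K + 1)]

def filtered(n):
    """Convolution output at index n (valid for K <= n < len(stuffed) - K)."""
    return sum(h[j + K] * stuffed[n - j] for j in range(-K, K + 1))

# Skip the edges, where the filter would run past the ends of the signal.
interior = range(K, len(stuffed) - K)
truth = {n: math.sin(2 * math.pi * TONE_HZ * n / (2 * FS_LOW)) for n in interior}

err_stuffed = max(abs(stuffed[n] - truth[n]) for n in interior)
err_filtered = max(abs(filtered(n) - truth[n]) for n in interior)
print(f"max error, zero-stuffed only: {err_stuffed:.3e}")
print(f"max error, after sinc filter: {err_filtered:.3e}")
```

Because the windowed sinc is 1 at its centre and (up to rounding) 0 at every other even tap, the original samples pass through unchanged and only the in-between samples are computed. A longer filter or a different window changes how close to 24 kHz the reconstruction stays accurate.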

I hope it becomes understandable that there is no logic (not even a deep convolutional neural network) that can reconstruct frequency content of a signal that is lost forever.

j_j · Jan 20, 2021

pkane said:
And by the way, this is the same with image resampling.

Image resampling is rather different, in perceptual terms, because the lines caused by pixellation are extremely visible.

pkane · Jan 20, 2021

j_j said:
Image resampling is rather different, in perceptual terms, because the lines caused by pixellation are extremely visible.

Sure, perception is very different. The math governing resampling is the same, except for the extra dimension.

j_j · Jan 20, 2021

pkane said:
Sure, perception is very different. The math governing resampling is the same, except for the extra dimension.

No, the math is quite different, because you do not preserve spatial frequency content in images, it's a near-meaningless idea (beyond MTF at least), as grating sensation tests show. What you must do is preserve edges, with some control over frequency noise. Sorry to insist, but basic accurate (in frequency) interpolation looks pretty crappy indeed.

ElNino · Jan 20, 2021

j_j said:
No, the math is quite different

pkane is correct... the math is the same.

Some of your comments in post #25 above suggest that you would benefit from taking a course on the mathematics of signal processing (convolution, etc.).

pkane · Jan 20, 2021

j_j said:
No, the math is quite different, because you do not preserve spatial frequency content in images, it's a near-meaningless idea (beyond MTF at least), as grating sensation tests show. What you must do is preserve edges, with some control over frequency noise. Sorry to insist, but basic accurate (in frequency) interpolation looks pretty crappy indeed.

Maybe true for “pretty” pictures where lossy, perceptually-weighted algorithms are acceptable. Not in most scientific image processing where data preservation is a must. Try to extract a proper star profile from an edge-enhanced, resampled image. Or use a deconvolution algorithm on it, or measure flux fall-off due to an exoplanet transiting a star. The results will be catastrophic.

j_j · Jan 20, 2021

ElNino said:
pkane is correct... the math is the same.

Some of your comments in post #25 above suggest that you would benefit from taking a course on the mathematics of signal processing (convolution, etc.).

I'm sorry, I do signal processing for a living, and I have a couple of IEEE awards to show for it.

The math is different BECAUSE THE PERCEPTION IS DIFFERENT. Live with it. Nonlinearities make sense with image interpolation.
They are tragically disastrous for audio interpolation.

These are testable, verifiable facts, and your vile professional insult shall be retracted promptly.

I would suggest that rather than make false professional attacks, you wander back up thread a couple of steps, and read a couple of the tutorials I cited. You might dig up a few of my papers as well, and find the examples of use of both convolution and deconvolution (both fast and numerical), as well as filter design, perceptual analysis of both audio and video, and consider that you may be way off course here.

Furthermore, your intentionally vague professional accusations about "#25" I note are specifically avoiding being specific, so as to further your false professional attack.

j_j · Jan 20, 2021

pkane said:
Maybe true for “pretty” pictures where lossy, perceptually-weighted algorithms are acceptable. Not in most scientific image processing where data preservation is a must. Try to extract a proper star profile from an edge-enhanced, resampled image. Or use a deconvolution algorithm on it, or measure flux fall-off due to an exoplanet transiting a star. The results will be catastrophic.

You're talking about a completely different issue here, that of interpolation that is accurate in the least-mean-squares sense. Except for the debate on separable vs. nonseparable filtering, something like deconvolving a telescope image is reasonably similar, BUT image interpolation as discussed by the fellow above is talking about dealing with perceptual issues, not LMS. For images to be VIEWED as opposed to analyzed, preservation of lines and edges, and avoiding any "blockiness" are the key.

ElNino · Jan 20, 2021

j_j said:
I'm sorry, I do signal processing for a living, and I have a couple of IEEE awards to show for it.

The math is different BECAUSE THE PERCEPTION IS DIFFERENT. Live with it. Nonlinearities make sense with image interpolation.
They are tragically disastrous for audio interpolation.

These are testable, verifiable facts, and your vile professional insult shall be retracted promptly.

I would suggest that rather than make false professional attacks, you wander back up thread a couple of steps, and read a couple of the tutorials I cited. You might dig up a few of my papers as well, and find the examples of use of both convolution and deconvolution (both fast and numerical), as well as filter design, perceptual analysis of both audio and video, and consider that you may be way off course here.

Furthermore, your intentionally vague professional accusations about "#25" I note are specifically avoiding being specific, so as to further your false professional attack.

Sorry, I'm not understanding your response here or why it's so emotionally charged. I didn't say anything that could be construed as a professional attack -- I honestly have no idea who you are, and I don't know anything about your background, but I have studied signal processing at the graduate level, and it simply isn't true that audio and 2D image signal processing are fundamentally different.

j_j · Jan 20, 2021

ElNino said:
Sorry, I'm not understanding your response here or why it's so emotionally charged. I didn't say anything that could be construed as a professional attack -- I honestly have no idea who you are, and I don't know anything about your background, but I have studied signal processing at the graduate level, and it simply isn't true that audio and 2D image signal processing are fundamentally different.

I do signal processing for a living, I have written papers on various things using, involving, speeding up, etc, convolution and the like, and you turn around and tell me that I am ignorant of a basic subject on which I have written rather extensively, both to theory (less so, I use the processes) and practice (which I have written a lot about).

You choose to make this very serious professional accusation without actually stating any specifics, in a vague, offhand fashion, and then you wonder why I'm offended.

THEN you play the old "emotionally charged" card after you uttered truly horrible professional disparagement.

AND THEN YOU CHANGE THE SUBJECT, or try to. This is not all of image processing, the subject here is UPSAMPLING of images, for viewing. NOT for instrumentation, NOT for sharpening, but for viewing.

Given I have posted a couple of papers on sub-band image coding, you'd think I know something or other about actual use of filters in the real world, yes? Yeah, I do. The subject is upsampling for viewing. It is not all of image processing. So why try to move the goalposts now?

Why? Because you're trying to play "king of the hill".

I suggest you learn something about both audio and video perception, and then maybe you'll see why your entire set of posts is simply confusing the entire issue.

FINALLY you accused me of not understanding basic FIR filters, so please, show, exactly, where I exhibited that? You said it, now either produce or do not.

RayDunzl · Jan 20, 2021

ElNino said:
I honestly have no idea who you are

https://www.aes.org/member/profile.cfm?ID=1800973364

j_j · Jan 20, 2021

RayDunzl said:
https://www.aes.org/member/profile.cfm?ID=1800973364

And it shouldn't matter who I am, he should be careful with really nasty accusations and learn not to defend mistakes. Besides, I DID send him a link to a c.v. including the IEEE awards in signal processing. So, he DOES know.

ElNino · Jan 20, 2021

j_j said:
And it shouldn't matter who I am, he should be careful with really nasty accusations and learn not to defend mistakes. Besides, I DID send him a link to a c.v. including the IEEE awards in signal processing. So, he DOES know.

You only emailed me your CV at 4:34pm EST, after I had posted.

Sorry, I’m not interested in engaging with the level of vitriol you’re expressing here.

j_j · Jan 20, 2021

ElNino said:
You only emailed me your CV at 4:34pm EST, after I had posted.

Sorry, I’m not interested in engaging with the level of vitriol you’re expressing here.

Really? Color me skeptical. As to "vitriol" the vitriol is 100.00% yours. You picked a fight, and refused to admit you stepped into a pile thereof. In the future, perhaps you should concentrate on the technology rather than on winning an argument.

The dishonest argumentation methods you've used here are rather obvious. You made the "emotional" accusation, then you accuse me of "vitriol", after claiming I don't even know my field.

If in fact you got the CV after you said that, I apologize. It's easy to miss one comment in the midst of all of your defamatory nonsense.
