Cool, thanks for the detailed explanation. It would help a lot in our discussion.
As I said earlier, video is different from audio. Let's focus on the last picture (i.e. the smooth, analog one you mentioned), since it's the closest to our audio application.
In the smoothing process, the DAC is doing it for you.
With external upsampling, your upsampling software is helping the DAC to do the smoothing.
Agree?
If you agree, then we are actually comparing the DAC's smoothing algo (let's call it Algo A) with the upsampling software's smoothing algo (let's call it Algo B).
If we just want a smooth audio signal, Algo A can give you a smooth audio output. Done.
However, if we want the best smoothing algo, Algo B may give you better options. Agree?
Both Algo A and Algo B will give you "fake" values to connect the dots "in your picture". It just happens that Algo B has far more resources (a full computer) than a tiny DAC chip for doing the job.
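To make the "fake values connecting the dots" idea concrete, here's a minimal sketch of 2x upsampling by linear interpolation. This is only to illustrate inventing in-between samples; real DACs and resamplers use sinc-based (windowed lowpass) interpolation, not simple linear interpolation.

```python
def upsample_2x_linear(samples):
    """Insert one interpolated sample between each pair of originals.
    Illustrative only: real resamplers use sinc/lowpass interpolation."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) / 2)  # the invented "fake" in-between value
    out.append(samples[-1])
    return out

print(upsample_2x_linear([0.0, 1.0, 0.0, -1.0]))
# -> [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]
```

Whether that interpolation runs inside the DAC chip (Algo A) or in software on the computer (Algo B), the job is the same; only the available compute differs.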
One more thing: you believe 44.1k is 8K? There is no fixed answer here. To me, IMHO, it is just 720p, so I enjoy doing 4K or even 8K upsampling.
Video and audio are quite similar imo, and if we're talking in the digital domain they're very similar actually, except that audio is quite a bit simpler. The easiest way of storing audio is PCM, with time resolution as the samplerate and an amplitude in bits at each sample. An image is the same, but instead of time resolution it has spatial resolution, in two dimensions (X and Y) instead of one, and instead of just one amplitude in bits it has three, R, G and B (and sometimes alpha as well). And if it's a video, it has a time dimension too, the framerate.
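The parallel above can be sketched as plain array shapes: both media are just arrays of quantized amplitudes, differing only in their dimensions. The sizes here are only illustrative.

```python
# PCM audio: 1-D over time, one amplitude per sample.
audio = [0.0] * 44100                # 1 s of mono 44.1 kHz PCM: shape (44100,)

# RGB image: 2-D over space (X, Y), three amplitudes per sample.
image = [[(0, 0, 0)] * 1920] * 1080  # one 1080p frame: shape (1080, 1920, 3)

# Video: an image plus a time axis (framerate).
video = [image] * 24                 # 1 s of 24 fps video: shape (24, 1080, 1920, 3)

print(len(audio))                                               # 44100
print(len(video), len(image), len(image[0]), len(image[0][0]))  # 24 1080 1920 3
```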
And all of these are susceptible to the same errors as audio. In audio we have an antialiasing filter to remove everything above the Nyquist frequency so we don't get ugly audible aliasing below it (quite rarely heard nowadays, though). Aliasing in images is a well-known one: if you're a gamer I guess you've seen crawling pixel aliasing on thin lines etc, and in the old days you could often see it when filming someone in a fabric shirt (also called moiré). Antialiasing in images is essentially blur. Aliasing happens in video as well, as the wagon wheel effect, where the wheels of a moving car sometimes look like they're standing still or even rolling backwards, and there the antialiasing is motion blur, i.e. longer shutter speeds.
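The aliasing effect is easy to show numerically: if you sample a tone above Nyquist, the resulting samples are indistinguishable from a lower-frequency tone folded back below Nyquist. A small sketch, sampling a 30 kHz sine at 44.1 kHz (Nyquist = 22.05 kHz):

```python
import math

fs = 44100       # sample rate, Nyquist = 22050 Hz
f_in = 30000     # input tone above Nyquist
f_alias = fs - f_in  # folds back to 14100 Hz

# At every sample instant, the 30 kHz sine and the 14.1 kHz alias give
# the same value (up to a sign flip from the fold), so after sampling
# they cannot be told apart. That's why the antialiasing filter must
# run *before* sampling -- afterwards, the damage is baked in.
for n in range(10):
    t = n / fs
    s_real = math.sin(2 * math.pi * f_in * t)
    s_alias = -math.sin(2 * math.pi * f_alias * t)  # phase-inverted fold
    assert abs(s_real - s_alias) < 1e-9
```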
Quantization errors, which in audio (when we don't use dither) are audible in reverb tails where the sound kind of flicks on and off, show up in images as posterizing or banding, mostly visible in skies. The cure is exactly the same as in audio: dither/noise.
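The "flicks on and off" behaviour can be demonstrated with a toy quantizer: a signal well below one quantization step snaps to zero and simply vanishes, while with dither it survives as signal-plus-noise. This sketch uses simple rectangular dither for brevity; real audio dithering typically uses TPDF dither.

```python
import random

def quantize(x, step):
    """Round a value to the nearest quantization step."""
    return round(x / step) * step

step = 0.1   # a deliberately coarse quantization step
x = 0.03     # a low-level signal, well below one step (like a reverb tail)

# Without dither: the value always rounds to 0 -- the signal "flicks off".
assert quantize(x, step) == 0.0

# With dither: each quantized value is 0 or 0.1 at random, but the
# *average* recovers the original low-level signal (as noise-shaped output).
random.seed(0)
n = 200000
avg = sum(quantize(x + random.uniform(-step / 2, step / 2), step)
          for _ in range(n)) / n
assert abs(avg - x) < 0.005  # the 0.03 signal survives below the step size
```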
Having worked with 3D graphics for a decade or two, and been into photography for even longer, I see audio and images much the same way, even though they are of course experienced differently.
And with that experience (and quite good eyesight) I'd say that for almost all uses 8K is around the upper limit for human eyes (4K is still VERY good though), so kind of equivalent to 44.1kHz in audio.
Anyways, slightly offtopic I guess, but still relevant. I like to compare it that way since images are less abstract than trying to explain audio with words.
But yeah, IF an offline upsampler could do a better job than an upsampling DAC, then it would of course be a benefit to do it that way, but I've never seen any proof of that. And even a regular 44.1kHz DAC with no upsampling can play 44.1kHz audio just fine.