But I think you are asking whether we can conceive of the sound of a song (I think you are talking about recorded sound) as a thing-in-itself, or whether it is always a product of the reproduction mechanisms, environmental acoustics and psychoacoustic contexts of the listeners.
Isn't the song... the sound? What does "the sound of the song" even mean?
Take an instrumental (maybe synthesized) recorded track and listen to it.
What is it, if not organized sound? Melody, rhythm, pitch, volume, timbre... that's what you perceive and enjoy. All of them are components of sound, nothing else.
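To make "organized sound" concrete, here is a toy sketch (the function and its parameters are my own invention, purely for illustration) that builds a note out of exactly those components: pitch, volume, duration (rhythm), and timbre as a mix of harmonics.

```python
import math

SAMPLE_RATE = 44100  # samples per second

def note(pitch_hz, volume, duration_s, harmonics=(1.0, 0.5, 0.25)):
    """One 'organized sound': a pitch, a volume, a duration (rhythm),
    and a timbre given by the relative weights of the harmonics."""
    n = int(SAMPLE_RATE * duration_s)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # Timbre: a weighted sum of harmonics of the fundamental pitch.
        s = sum(w * math.sin(2 * math.pi * pitch_hz * (k + 1) * t)
                for k, w in enumerate(harmonics))
        samples.append(volume * s / sum(harmonics))
    return samples

# A tiny "melody" is just notes organized in time:
melody = note(440.0, 0.8, 0.25) + note(550.0, 0.8, 0.25) + note(660.0, 0.8, 0.5)
```

Everything the listener perceives here is a function of those few parameters; in that sense the "song" is nothing over and above the organized sound.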
So I would start by asking (as I did): "what is a song, if not organized sound?"
Then we could argue about its deviations/reproduction mechanisms and such, I believe.
Or simply skip the ontology part (for a moment) and go directly to your dichotomy about the song, from a single-individual/listener point of view:
-
a thing-in-itself: if that's true, this implies that you, the listener, extrapolate a "common" audio object from different/similar sources; is that what you mean? If so, any high-fidelity system would be enough for everyone.
Instead, there is a whole world behind this, with people struggling over lots of gear and quality setups, due to... what? Any decent reproduction system/environment should be enough for our brain to catch and decode "the thing-in-itself", because any hi-fi setup is already advanced and pretty flat/linear.
It seems instead that they choose that particular system because they love the properties it "adds" to the listening, which goes against the idea of "the thing-in-itself" (i.e. I can't have 10 setups that decode 10 things-in-themselves, otherwise I fall into the other horn of your dichotomy). I would call it "searching for an individual objectivity".
-
a product of the reproduction mechanisms, environmental acoustics and psychoacoustic contexts of the listeners: if that's true, then I (as a listener) have the power to take a record and shape it as I want, editing some properties of the sound (and thus of the song? see the previous question): dynamics, timbre, tonal balance, reverb/reflections, and such. Which makes me think that maybe something is wrong.
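If the song really is a product of the reproduction chain, then each stage is just a transformation of the signal. A minimal sketch of the three edits mentioned above (function names and parameter values are hypothetical, chosen only to show the idea):

```python
def apply_gain(samples, gain):
    # "Dynamics": scale the level of the whole record.
    return [gain * s for s in samples]

def one_pole_lowpass(samples, alpha=0.2):
    # "Tonal balance": a crude tone control that darkens the timbre.
    out, y = [], 0.0
    for s in samples:
        y = y + alpha * (s - y)
        out.append(y)
    return out

def add_reverb(samples, delay=20, decay=0.4):
    # "Reverb/reflections": mix in one delayed, attenuated echo.
    out = list(samples)
    for i in range(delay, len(out)):
        out[i] += decay * samples[i - delay]
    return out

record = [0.0] * 10 + [1.0] + [0.0] * 100  # an impulse as a toy "recording"
shaped = add_reverb(one_pole_lowpass(apply_gain(record, 0.5)))
```

On this view, `shaped` is just as much "the song" as `record` is, which is exactly the worry: nothing in the chain picks out one version as the song itself.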
There isn't a convergence, and most aspects become arbitrary (still from a single point of view; I'm not arguing about the differences between people).
Both points have some inconsistency, in my opinion.
Do I feel confused? Yes I do; that's why I'm writing topics like this.
If you say a synthesiser (or a volume control) can produce "an infinite number of sounds" the superficial suggestion seems to be that it can produce all sounds. But of course that's not true even though it can produce an infinite number of sounds...
This doesn't help my doubt. Even if you have resized the "infinity" part, you are still dealing with a very large number of cases (with some unpredictable parts).
And that introduces the very same question: if small deviations don't impact the result, why are there lots of setups/techniques that manipulate (and differ in) exactly and only those "small" parts?
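To pin down what a "small deviation" looks like, here is a hypothetical comparison of the same signal through a perfectly flat chain versus a chain with roughly a +0.5 dB level tilt (the numbers are invented for illustration): the per-sample difference is tiny, but it is systematic rather than random, and that systematic part is precisely what gear choices manipulate.

```python
import math

# A 440 Hz test tone, 1000 samples at 44.1 kHz.
signal = [math.sin(2 * math.pi * 440 * i / 44100) for i in range(1000)]

flat = signal                        # an "ideal", perfectly linear chain
tilted = [1.06 * s for s in signal]  # ~ +0.5 dB: a "small" deviation

# The deviation per sample is tiny...
max_dev = max(abs(a - b) for a, b in zip(flat, tilted))
# ...but it is systematic: every sample is pushed the same way.
```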