Handwaving follows.
The Nyquist criterion says you must sample at more than twice the highest signal frequency to be able to reconstruct the signal. (Twice the highest signal frequency is the Nyquist rate; half the sampling rate is the Nyquist frequency.) Oversampling is sampling faster than that minimum, typically by a factor of two or more. For example, if we assume the highest signal frequency is 20 kHz, then the CD sampling rate of 44.1 kS/s meets the Nyquist criterion and allows capture of signals up to (but not including) 22.05 kHz. 88.2 kS/s is oversampled by a factor of two relative to that rate, and so forth.
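To put numbers on the CD example above, here is a minimal Python sketch. The function names are my own, made up for illustration, and the 20 kHz band edge is the same assumption as in the text.

```python
def nyquist_frequency(sample_rate_hz):
    """Half the sampling rate: signals must stay below this to be captured."""
    return sample_rate_hz / 2.0


def meets_nyquist(highest_signal_hz, sample_rate_hz):
    """True when the rate is more than twice the highest signal frequency."""
    return sample_rate_hz > 2.0 * highest_signal_hz


def oversampling_ratio(sample_rate_hz, base_rate_hz):
    """How many times faster than the base rate we are sampling."""
    return sample_rate_hz / base_rate_hz


cd_rate = 44_100.0      # CD sampling rate, samples per second
highest_signal = 20_000.0  # assumed top of the audio band, Hz

print(nyquist_frequency(cd_rate))            # 22050.0
print(meets_nyquist(highest_signal, cd_rate))  # True
print(oversampling_ratio(88_200.0, cd_rate))   # 2.0
```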
Oversampling provides margin for the filters needed to band-limit the signal, and it can improve the signal-to-noise ratio (SNR). When you double (or more) the sampling rate, quantization noise (the noise generated when you convert from analog to digital samples) is spread over a larger frequency range. The total noise is determined by the number of conversion bits, so if you keep the number of bits and the signal bandwidth the same, doubling the sampling rate puts half the noise power above the signal band (say above 20 kHz). Filter that half out and you gain 3 dB in SNR for each doubling.
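The 3 dB figure falls out of 10*log10(oversampling ratio). A rough sketch, assuming the usual textbook model (ideal uniform quantizer, full-scale sine input, noise spread evenly up to half the sampling rate, and a perfect filter removing everything above the signal band). The function names are made up for this example.

```python
import math


def oversampling_gain_db(oversampling_ratio):
    """SNR gain from filtering out the noise that lands above the signal band."""
    return 10.0 * math.log10(oversampling_ratio)


def ideal_snr_db(bits, oversampling_ratio=1.0):
    """Textbook SNR of an ideal N-bit converter, plus the oversampling gain."""
    return 6.02 * bits + 1.76 + oversampling_gain_db(oversampling_ratio)


for ratio in (1, 2, 4, 8):
    print(ratio, round(ideal_snr_db(16, ratio), 2))
```

Each doubling of the ratio adds about 3 dB, so with plain oversampling (no noise shaping) it takes a factor of four to buy roughly one extra bit.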
Delta-sigma and other data converters take advantage of oversampling by combining high oversampling ratios, noise shaping that "pushes" the conversion noise past (higher than) the signal band, and high-order filters that then remove that out-of-band noise, which yields much higher in-band SNR.
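For a feel of how a delta-sigma loop works, here is a toy first-order, 1-bit modulator in Python. This is the generic textbook loop, not any particular converter's design, and the plain average at the end is only a crude stand-in for the real decimation filter. The function name and the 0.25 test input are my own choices.

```python
def delta_sigma_first_order(samples):
    """Toy 1-bit modulator: integrate the error, quantize it to +1 or -1."""
    integrator = 0.0
    feedback = 0.0  # previous 1-bit output, fed back and subtracted from the input
    bits = []
    for x in samples:
        integrator += x - feedback                 # accumulate input minus last output
        feedback = 1.0 if integrator >= 0.0 else -1.0  # 1-bit quantizer
        bits.append(feedback)
    return bits


dc_input = 0.25  # must stay between -1 and +1 for the loop to stay bounded
bits = delta_sigma_first_order([dc_input] * 4000)

# Averaging the bitstream acts as a very simple low-pass filter.
average = sum(bits) / len(bits)
print(average)  # close to 0.25
```

Every output is just +1 or -1, yet the average tracks the input closely because the loop keeps the accumulated error small and leaves the quantization noise at high frequencies where the filter can remove it.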
Upsampling takes data already sampled at one rate and resamples that same data at a higher rate. You can theoretically gain SNR as in oversampling, but you must somehow fill in, or generate, new samples between the actual samples. If the samples you have are 1 and 3, then when you upsample by two an interpolation algorithm can generate a new intermediate sample of 2. The catch is that the algorithm cannot know exactly what the original signal was like before it was sampled, so the prediction (the interpolated sample) may be wrong. How to design an optimal interpolation filter is the topic of many classes, texts, and proprietary algorithms.
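Here is the 1-and-3 example as a small Python sketch, using straight-line (linear) interpolation. That is about the simplest possible choice and not what a real resampler would use; the function name is made up. The second part shows the catch: for a sine sampled four times per cycle, the straight-line guess misses the true midpoint value.

```python
import math


def upsample_by_two_linear(samples):
    """Insert the midpoint between each pair of neighboring samples."""
    out = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + following) / 2.0)  # interpolated guess
    if samples:
        out.append(samples[-1])  # last sample has no neighbor to its right
    return out


print(upsample_by_two_linear([1, 3]))  # [1, 2.0, 3]

# A sine sampled at 0, 90, and 180 degrees.
coarse = [math.sin(k * math.pi / 2) for k in range(3)]
guess = upsample_by_two_linear(coarse)[1]  # straight-line value at 45 degrees
actual = math.sin(math.pi / 4)             # what the signal really was there
print(guess, actual)  # the guess is 0.5, the real value is about 0.707
```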
HTH - Don