I think it's most practical to mix and master in an environment that most accurately reflects the listening environment of your target audience. You're trying to tailor their experience, so you should replicate that experience as closely as you can while monitoring. If you intend for your audio to be played out of a tinny little cell phone speaker (or the kind found in mobile gaming devices), then you should mix on that -- this is what professional game composers do for mobile games and handheld console games.
Frankly, for most modern producers, the specific hardware used does not really need to be flat or highly isolated; you should just enjoy the sound coming out of it. If it degrades or distorts the quality somehow, it's going to apply the same distortion no matter what you're playing through it, and 99% of listeners are not going to be listening with studio monitors in a sound-controlled environment. Mixing and mastering is typically done most accurately by using references, and the reference tracks need only sound pleasant and normal to the person doing the mixing in order to serve as a decent reference point, with special attention paid to how much clarity is required in the very high and low ends of the spectrum to bring out the details emphasized in a given music genre. Some music doesn't care about sub at all, some music is entirely dominated by granular details in the sub, and some music is actively hindered by having very clear highs in the 13k-20k range, because that exposes unpleasant sibilance in material that otherwise sounds great on objectively worse speakers.