For example, if all 3 impulses started at 0 ms, the phases would conflict near the crossover points, the FR would show peaks and dips from incorrect summation, and the step response would no longer be a clean triangle; it would be distorted even though it stays above zero. The 2nd and 3rd drivers need some time delay to match the phase shift at the crossover point that the HPF and LPF on the drivers introduce.
Take the crossover region at ~3 ms: the tweeter starts in positive polarity but has a phase shift near the crossover because of its HPF. For good summation, the woofer needs the same phase shift (e.g. LR4 filters on both drivers for symmetry) and some delay:
View attachment 263279
The drivers aren't "flipped out of phase". They are in positive polarity, and phase- and time-aligned in the crossover regions, but the system as a whole doesn't have a linear phase (constant delay), so the step response looks the way it does: not an ideal triangle, but one with up and down deviations.
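To put rough numbers on this, here is a minimal sketch using textbook LR4 transfer functions for a single idealized two-way crossover. Everything specific in it is an assumption for illustration, not taken from the measured speaker: the 2 kHz crossover frequency, perfectly coincident drivers, and the helper names `lr4_responses` and `misaligned_sum`. It shows that both sections are in phase and at -6 dB at the crossover, that their sum is flat in magnitude while the phase still rotates (so the step is not an ideal triangle), and that a timing error on the woofer produces a dip.

```python
import cmath
import math

def lr4_responses(f, fc):
    """Ideal LR4 low-pass and high-pass responses at frequency f (Hz).

    LR4 = two cascaded 2nd-order Butterworth sections, normalized to fc.
    """
    s = 1j * f / fc
    d = s * s + math.sqrt(2) * s + 1   # 2nd-order Butterworth denominator
    low = 1 / (d * d)                  # woofer branch
    high = s**4 / (d * d)              # tweeter branch
    return low, high

def misaligned_sum(f, fc, extra_delay_s):
    """Summed response when the woofer is late by extra_delay_s seconds
    relative to the aligned case (hypothetical timing error)."""
    low, high = lr4_responses(f, fc)
    return low * cmath.exp(-2j * math.pi * f * extra_delay_s) + high

fc = 2000.0  # assumed crossover frequency, illustration only

for f in (200.0, 1000.0, 2000.0, 4000.0, 20000.0):
    low, high = lr4_responses(f, fc)
    total = low + high
    # Magnitude stays at 1 (flat FR) while the phase keeps changing.
    print(f"{f:8.0f} Hz  |sum| = {abs(total):.3f}  "
          f"phase = {math.degrees(cmath.phase(total)):8.1f} deg")

# Half a period of timing error at fc (0.25 ms at 2 kHz) cancels the sum there.
print("misaligned |sum| at fc:", abs(misaligned_sum(fc, fc, 0.5 / fc)))
```

In this idealized model the aligned sum is an all-pass: flat amplitude, non-linear phase. That is the same qualitative picture as above, where the drivers sum correctly but the step response is not a perfect triangle.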
Overall, this is an excellent integration of the drivers for a non-ideal world.