Some commentary on headphone measurement, since this is an area of immense personal significance to me:
Headphones are (properly) measured on anthropomorphic fixtures - either Head And Torso Simulators (HATS) per IEC 60318-7 and ITU-T P.58, or cheaper "ear and cheek" fixtures or "hearing protection test fixtures", which feature an anthropomorphic ear on a flat mounting plate, such as GRAS' 43AG or 45CA. In either case, an anthropomorphic pinna based on a population average (examples and requirements are given in ITU-T P.57, IEC 60318-7, and IEC 60268-7, although to my knowledge only GRAS produces a pinna from the last), with a short "canal extension" tube leading from its ear canal entrance, is mated with an IEC 60318-4/IEC 711 "ear simulator", which emulates the impedance of the human ear at the drum. These systems are used by most professionals measuring headphones.
Some have suggested in this thread that it is necessary to know the HRTF of a specific user to characterize headphone response. There may well be some truth here - the results of Smyth's excellent Realiser could certainly point in this direction - but I feel the idea is being taken too far as currently interpreted. When the HRTFs of HATS systems have been measured, they fell relatively close to population averages (as they were intended to), and measurements done using HATS systems and analogous ear simulators have produced robust predictions of subjective frequency response and preference. As an example, the work of Sean Olive, which was done primarily on a GRAS 43AG and 45CA, fairly reliably predicts user preference for headphones based on frequency response.
View attachment 28415
Consequently I feel it's quite hard to argue that the measurement of headphones is a truly uncertain space. There are certainly nuances - variations in both placement and individual anatomy may influence results to some degree - and I would say that it is a less filled-in space than the world of speakers, but it's an area where we have fairly strong tools and a reasonable understanding of how to use them. It's a developing area, without question, and one that was long neglected in favour of speakers, but we have a reasonable body of literature to draw upon at the moment (Olive's headphone paper collection bundles many of the major papers) and can come to fairly robust conclusions.
@SIY you mention that changing systems significantly changed the response of the headphones you were measuring - may I ask what systems you were comparing? In principle I would expect some variation based on the parameters of the specific pinnae used - something on the order of the differences between the major brands' HATS perhaps, which I would expect to be largely (although perhaps not completely) accounted for by using a DF-HRTF appropriate to the system in question as compensation - but I wouldn't expect it to be very large. Although if you're mostly measuring speakers and I'm mostly measuring headphones, perhaps we have different definitions of large frequency response variation.
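
For anyone unsure what I mean by using a DF-HRTF as compensation, here is a rough Python sketch. It assumes the raw fixture measurement and the fixture's diffuse-field HRTF are both magnitude curves in dB, so compensating is just a per-frequency subtraction after interpolating the DF curve onto the measurement's frequency points. The function names and all of the numbers are hypothetical and made up for illustration - they are not data from any real fixture or headphone.

```python
import math


def interp_log_freq(freqs, values_db, f):
    """Interpolate a dB curve at frequency f, linearly on a log-frequency axis.

    freqs must be ascending; values outside the range are held at the end points.
    """
    if f <= freqs[0]:
        return values_db[0]
    if f >= freqs[-1]:
        return values_db[-1]
    for i in range(1, len(freqs)):
        if f <= freqs[i]:
            x0, x1 = math.log10(freqs[i - 1]), math.log10(freqs[i])
            t = (math.log10(f) - x0) / (x1 - x0)
            return values_db[i - 1] + t * (values_db[i] - values_db[i - 1])


def df_compensate(meas_freqs, meas_db, df_freqs, df_db):
    """Subtract the fixture's DF-HRTF (in dB) from a raw measurement (in dB)."""
    return [
        m - interp_log_freq(df_freqs, df_db, f)
        for f, m in zip(meas_freqs, meas_db)
    ]


# Hypothetical DF-HRTF for one fixture (illustrative numbers only).
df_freqs = [100.0, 1000.0, 3000.0, 10000.0]
df_db = [0.0, 2.0, 14.0, 4.0]

# Hypothetical raw headphone measurement taken on that same fixture.
meas_freqs = [100.0, 1000.0, 3000.0, 10000.0]
meas_db = [1.0, 3.0, 12.0, 2.0]

compensated = df_compensate(meas_freqs, meas_db, df_freqs, df_db)
```

The point of the sketch is only that the compensation curve has to belong to the same fixture as the measurement: a different pinna shifts both the raw curve and the DF-HRTF, and the subtraction is what I'd expect to remove most of that shared difference.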