In general, validating one energy simulation program against another is not meaningful. "Validated against DOE-2.2" implies that DOE-2.2 is the truth standard, which it is not, at least not any more so than any other energy simulation program. As you mentioned, energy simulation programs are tested (which is not exactly the same as validated) against both analytical solutions and one another via the evergreen ASHRAE Standard 140. Because both EnergyPlus and DOE-2.2 have been put through ASHRAE 140 testing and have produced results within the acceptable range, they have, for all practical purposes, already been "tested against one another" to the extent that such testing is performed.
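To make "within acceptable range" concrete, here is a minimal sketch of a program-to-program comparison. It assumes, purely for illustration, that a result counts as acceptable when it falls between the minimum and maximum of the other programs' results for the same test case. The function name and all numbers are hypothetical. They are not Standard 140 data, and the standard's actual criteria are not reproduced here.

```python
from typing import Sequence


def within_reference_range(value: float, reference_results: Sequence[float]) -> bool:
    """Return True if value lies between the min and max of the reference results.

    Assumption: "acceptable" simply means inside the spread of the other
    programs' results. This is an illustrative rule, not the standard's text.
    """
    if not reference_results:
        raise ValueError("need at least one reference result")
    return min(reference_results) <= value <= max(reference_results)


# Hypothetical annual cooling loads (MWh) reported by other programs for one
# test case. Made-up numbers for illustration only.
reference_results = [6.1, 6.5, 6.9, 7.2]

program_a = 6.7  # hypothetical result, inside the spread
program_b = 7.5  # hypothetical result, outside the spread

print(within_reference_range(program_a, reference_results))
print(within_reference_range(program_b, reference_results))
```

Note that a check like this only shows agreement among programs. It says nothing about which program, if any, is closer to reality, which is why it is testing rather than validation.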
For individual phenomena/systems/configurations, programs can and should also be validated against field data, although the resolution and fidelity of that data and of the characterization of the experiment must be quite high to support validation. That kind of data is currently not abundant, but for whatever it's worth, DOE is embarking on a multi-year project to conduct validation-grade experiments in well-characterized, highly instrumented facilities like LBNL's FLEXLAB.
In the specific case of VRF, ASHRAE Standard 140 does not have a test, and it will take several years to put one together. However, there are several recent field studies and characterizations, including a prominent one by FSEC from which the EnergyPlus model was developed. I didn't think DOE-2.2 had a VRF model, but if it does, it could be validated against this field study.
Learning opportunity for the reviewer: https://unmethours.com/question/10447...
Just out of curiosity, did the reviewer ask for the performance curve creation inputs (if you did not already supply them for the submission)?