Compare differing outputs
When a difference is found among the compilers'/interpreters' outputs, I'd like to just see which ones differ.
A solution to this would be to do a simple byte comparison of the outputs and group identical outputs into the same bucket. Label the buckets (e.g., a, b, c, or else perhaps 1, 2, 3, etc.) and log output to the effect of:
[!!!!] Test output does not match: seed: 251366794 2021-02-22T13:13:31.512709
# This is a RANDOMLY GENERATED PROGRAM.
# Fuzzer: python3
# Version: python3 2.0.3 (acb9b15), xsmith 2.0.3 (acb9b15), in Racket 7.6 (vm-type racket)
# Options: --max-depth 5 --timeout 180 --seed 251366794
# Test outputs:
# python3.9: a
# python3.8: a
# python2.7: b
# pypy3.9: a
This will make it easier to quickly get a sense of where the difference lies, how many differences there are, and where to focus further investigative efforts.
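
As a rough illustration of the bucketing idea, here is a minimal Python sketch. It is not taken from the existing harness: the function names (`bucket_label`, `label_outputs`, `format_report`) and the sample output bytes are hypothetical, and it assumes each implementation's output has already been captured as raw bytes. Labels are handed out in first-seen order, so whichever implementation is listed first always lands in bucket a.

```python
from string import ascii_lowercase


def bucket_label(index):
    """Return 'a'..'z' for the first 26 buckets, then 1-based numbers as a fallback."""
    if index < len(ascii_lowercase):
        return ascii_lowercase[index]
    return str(index + 1)


def label_outputs(outputs):
    """Map each implementation name to the label of its output bucket.

    `outputs` maps a name to the raw bytes that implementation produced.
    Byte-identical outputs share a label; labels follow first-seen order.
    """
    labels_by_output = {}  # raw output bytes -> bucket label
    result = {}            # implementation name -> bucket label
    for name, output in outputs.items():
        if output not in labels_by_output:
            labels_by_output[output] = bucket_label(len(labels_by_output))
        result[name] = labels_by_output[output]
    return result


def format_report(outputs):
    """Render the bucket labels in the style of the proposed log section."""
    lines = ["# Test outputs:"]
    for name, label in label_outputs(outputs).items():
        lines.append(f"# {name}: {label}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical captured outputs, for illustration only.
    example = {
        "python3.9": b"42\n",
        "python3.8": b"42\n",
        "python2.7": b"41\n",
        "pypy3.9": b"42\n",
    }
    print(format_report(example))
```

Using the raw bytes as dictionary keys keeps the comparison exact. For very large outputs, hashing each output first (for example with `hashlib.sha256`) and keying on the digest would avoid holding duplicate copies in the dictionary.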