1. How much fuzzing is allowed?
I currently get these results (all tests passed) on my implementation, and I pushed a working version with 50 fuzzes total:
[results table: runtimes by # of fuzzes]
Is this too slow?
What timeout can we expect on the server? If you can’t tell us what the timeout is, then…
2. … how do we maximize our chances?
Can I tell my test suite to stop "in an orderly fashion" just as I hit the timeout? (This is Java, so it probably means catching something called TimeOutException; I haven't googled it yet.)
This way I could run the maximum number of tests while still passing the daily tests (because an unhandled timeout means a failed test).
The project description says not to stress the servers too much, but on the other hand the fuzzer is the way to go for this project, and it is heuristic in nature (more = better, with diminishing returns), so I am conflicted.
I won’t be able to completely answer your questions, but maybe I can help a bit:
The runtime question is difficult, because on different machines you will have different execution times. The timeout induced by the secret tests is usually not made public. That being said, there might be creative ways to get an estimate of what the timeout on daily tests is.
I am not sure if you can catch the exception, and it is not the intended way. The project description states: "Each erroneous implementation is encapsulated in a distinct JUnit test", so the timeout is probably defined outside your tests, in the corresponding JUnit test.
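To see why an externally defined timeout cannot be caught from inside the test body, here is a minimal sketch of how a harness might enforce one. This is not the actual grading harness or JUnit's implementation; all names are made up, and it only illustrates that the surrounding harness, not the test, decides when time is up.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical harness: the timeout lives OUTSIDE the test body,
// so the test body never sees an exception it could catch.
public class ExternalTimeoutSketch {
    static String runWithTimeout(Runnable testBody, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> f = pool.submit(testBody);
        try {
            f.get(millis, TimeUnit.MILLISECONDS);
            return "passed";
        } catch (TimeoutException e) {
            f.cancel(true);            // test body is interrupted from outside...
            return "failed: timeout";  // ...and marked failed by the harness
        } catch (Exception e) {
            return "failed: " + e;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A "test" that loops forever: the harness kills it, the body
        // never gets a chance to handle anything.
        System.out.println(runWithTimeout(() -> {
            while (!Thread.currentThread().isInterrupted()) { }
        }, 200));
        // A fast test finishes well within the budget.
        System.out.println(runWithTimeout(() -> { }, 200));
    }
}
```

The TimeoutException is thrown in the harness thread, which is why catching it inside your own test code cannot work.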
Please do not test whether the implementations withstand enormous inputs. The erroneous implementations will expose their deficiencies with short inputs already and also do not have to be tested with an exhaustive search of all possible input strings.
Therefore, fuzzing is not the ideal approach and most likely wastes computation time for you and other students (longer time to completion and fewer daily test runs).
With this approach, you cannot be sure to cover all cases; you probably end up with many similar tests and might still miss edge cases.
I suggest handcrafting tests for good coverage and maybe hardcoding a few tests generated by fuzzing.
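To make the suggestion concrete, here is a sketch of what such a suite could look like. `isBalanced` is a made-up stand-in for whatever your project's parser accepts; the point is the shape of the suite: a handful of short handcrafted inputs, one per aspect of the specification, plus a few inputs that a fuzzer once flagged, frozen into the suite instead of re-fuzzing on every run.

```java
// Hypothetical example; the "spec" (balanced parentheses) and all
// test inputs are invented for illustration.
public class HandcraftedSuite {
    // Stand-in for the implementation under test.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') depth++;
            else if (c == ')' && --depth < 0) return false;
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        // Handcrafted: one short input per aspect of the (imagined) spec.
        assert isBalanced("");       // empty input
        assert isBalanced("()");     // minimal accepted input
        assert isBalanced("(())");   // nesting
        assert !isBalanced("(");     // unclosed
        assert !isBalanced(")(");    // close before open
        // Hardcoded fuzz regressions: found once by a fuzzer, kept forever.
        assert !isBalanced("(()))((");
        assert isBalanced("()()(())");
        System.out.println("all handcrafted tests passed");
    }
}
```

Note how every input is short: if the specification is covered aspect by aspect, none of the cases needs to be long or exhaustive. (Run with `java -ea` so the assertions are enabled.)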
Regarding the timeout: you cannot catch it.
If you run over the timeout, your test run is killed completely and the test fails.
The server might be slower or faster than your machine, and the time might also differ between runs. You probably want to stay well below the timeout limit, as otherwise a random fluctuation might cause your tests to fail.
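One rough way to budget for that uncertainty is to time your suite locally and assume the server could be considerably slower. The 2x factor below is an arbitrary assumption, not a known property of the server, and the timed loop is just a stand-in for running your suite once.

```java
// Sketch: estimate local runtime so you can keep a safety margin
// below an unknown server timeout. The 2x slowdown factor is a guess.
public class RuntimeMargin {
    static long measureMs(Runnable suite) {
        long start = System.nanoTime();
        suite.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = measureMs(() -> {
            // stand-in for running the whole test suite once
            long checksum = 0;
            for (int i = 0; i < 1_000_000; i++) checksum += i;
        });
        System.out.println("local run: " + elapsed + " ms; budget at least "
            + (2 * elapsed) + " ms for a possibly slower server");
    }
}
```

If the doubled estimate is anywhere near a timeout you suspect, cut tests rather than tune to the edge.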
From the lecture notes (paraphrased):
In black-box testing, one wants to test the specification.
There should be a test for every aspect of the specification.
If the specification does not talk about very long, very specific strings, a few test cases with reasonably short strings per aspect should suffice to find any reasonable bug.
You can also think about how the parser might look.
You cannot directly white-box test, as you do not have the code available, but you can think about branches and what might achieve a high coverage percentage.
Regarding professional test suites, you can have a look at common compilers like gcc.
Compilers are very large programs with high stakes, are too complicated to formally prove correct, and thus, have enormous test suites.
Some professional test suites are private and are sold for hundreds of euros.
But the design principles are the same.
There are some tools that fuzz compilers in different ways (there are lectures at our university that focus on fuzzing alone), but many test suites handcraft and generate test cases and maybe add some hardcoded results from fuzzing.
Please do not run more than a few hundred different tests. In general, think about scaling and playing nice: if everyone makes the server execute for five minutes, it may not finish testing everyone's tests, and then you will get less feedback.