Fuzzing Questions


I have these questions:

1. How much fuzzing is allowed?
I currently get these results (all tests passed) on my implementation (and pushed a working version with 50 fuzzes total):

# of fuzzes   50     500    5000   50000   500000
runtime (s)   0.43   1.40   3.90   10.40   70.20

Is this too slow?

What timeout can we expect on the server? If you can’t tell us what the timeout is, then…

2. … how do we maximize our chances?
Can I tell my test suite to stop in “an orderly fashion” just as I hit the timeout? (This is Java, so that probably means catching something like a TimeoutException; I haven’t googled it yet.)

This way I could get the maximum amount of tests while also passing daily tests (because unhandled timeout means a failed test).

The project description says not to stress the servers too much, but on the other hand the fuzzer is the way to go for this project and it is heuristic in nature (more = better, with diminishing returns), so I am conflicted. :grinning:
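One way to get an orderly stop without catching anything: give the fuzz loop its own wall-clock budget, set safely below whatever the server timeout might be. A minimal sketch (class name, budget value, and the input generator are all made up for illustration):

```java
import java.util.Random;

public class BudgetedFuzzer {
    // Run fuzz iterations until either maxIterations or the time budget is hit.
    // Choose budgetMillis well below the (unknown) server timeout.
    static int fuzzWithBudget(long budgetMillis, int maxIterations) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        Random rng = new Random(42); // fixed seed => reproducible runs
        int executed = 0;
        while (executed < maxIterations && System.currentTimeMillis() < deadline) {
            String input = randomInput(rng);
            // parseAndCheck(input);  // placeholder for the real test oracle
            executed++;
        }
        return executed;
    }

    // Toy generator: short random lowercase strings (an assumption, not the
    // project's actual input format).
    static String randomInput(Random rng) {
        int len = rng.nextInt(10);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) sb.append((char) ('a' + rng.nextInt(26)));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("executed " + fuzzWithBudget(200, 1000) + " iterations");
    }
}
```

The loop exits cleanly on whichever limit is hit first, so the JUnit test finishes normally instead of being killed.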


I won’t be able to completely answer your questions, but maybe I can help a bit:

  1. The runtime question is difficult, because different machines will have different execution times. The timeout imposed by the secret tests is usually not made public. That said, there might be creative ways to get an estimate of what the timeout on the daily tests is :shushing_face:

  2. I am not sure you can catch the exception, and it is not the intended way. The project description states: “Each erroneous implementation is encapsulated in a distinct JUnit test”, so the timeout is probably defined outside your tests, in the corresponding JUnit setup.
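That matches how framework-level timeouts generally work: the test body runs on a worker thread and the deadline is enforced from outside, so the test code itself never sees a catchable exception. A standard-library sketch of that mechanism (class and method names are mine, not JUnit's):

```java
import java.util.concurrent.*;

public class TimeoutDemo {
    // Mimics a framework-level timeout: the work runs on another thread,
    // and the controller (not the task) decides when time is up.
    static String runWithTimeout(Callable<String> task, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "timed out"; // thrown to the controller, never to the task
        } catch (Exception e) {
            return "failed";
        } finally {
            pool.shutdownNow(); // interrupt the still-running task
        }
    }

    public static void main(String[] args) {
        String fast = runWithTimeout(() -> "ok", 500);
        String slow = runWithTimeout(() -> { Thread.sleep(2000); return "ok"; }, 100);
        System.out.println(fast + " / " + slow); // prints "ok / timed out"
    }
}
```

The slow task is interrupted from outside; from inside it there is nothing to catch, which is why self-limiting your loop is the safer strategy.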



As said in the exercise:

Please do not test whether the implementations withstand enormous inputs. The erroneous implementations will expose their deficiencies with short inputs already and also do not have to be tested with an exhaustive search of all possible input strings.

Therefore, fuzzing is not the ideal approach and most likely wastes computation time for you and other students (longer time to completion and fewer daily tests).

With this approach, you cannot be sure to cover all cases. You will probably end up with many similar tests and might still miss edge cases.

I suggest handcrafting tests for good coverage and maybe hardcoding a few tests generated by fuzzing.
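Concretely, that could look like a table of (input, expected) cases, each chosen for a specification aspect, plus a couple of fuzz-found inputs frozen into the table. A sketch with a toy balanced-parentheses checker standing in for the real parser (the cases and the "parser" are illustrative assumptions):

```java
import java.util.List;

public class HandcraftedParserTests {
    record Case(String input, boolean shouldParse) {}

    // One case per specification aspect, plus one frozen fuzz find.
    static final List<Case> CASES = List.of(
        new Case("", true),        // empty input
        new Case("()", true),      // minimal valid input
        new Case("(()())", true),  // nesting
        new Case("(", false),      // unclosed
        new Case(")(", false),     // fuzz-found edge case, now hardcoded
        new Case("(()", false)     // unbalanced
    );

    // Toy stand-in for the parser under test: balanced parentheses.
    static boolean parses(String s) {
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') depth++;
            else if (c == ')' && --depth < 0) return false;
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        for (Case c : CASES) {
            if (parses(c.input()) != c.shouldParse())
                throw new AssertionError("failed on: " + c.input());
        }
        System.out.println("all " + CASES.size() + " cases passed");
    }
}
```

Each entry documents which aspect it covers, so a failing case points directly at the broken part of the specification.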

Regarding the timeout: you cannot catch it.
If you run over the timeout, your test run is killed completely and the test fails.

The server might be slower or faster than your machine, and the runtime might also differ between runs. You probably want to stay well below the timeout limit, as otherwise a random fluctuation might cause your tests to fail.


I guess I have to trust that promise from the specification then (which might cost me points through no fault of my own), add hardcoded tests, and only run something like 100 fuzzes on the server.

What would a professional test suite for a parser actually look like? I assume it would contain some hardcoded tests, as you mentioned, but I doubt that would be enough by any serious standard.

From the lecture notes (paraphrased):
In black-box testing, one wants to test the specification.
There should be a test for every aspect of the specification.
If the specification does not talk about very long, very specific strings, a few test cases with not-too-long strings per aspect should suffice to find any reasonable bug.

You can also think about how the parser might look.
You cannot white-box test directly, as you do not have the code available, but you can think about likely branches and what might achieve a high coverage percentage.
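One way to make that concrete is to list the branches you suspect the parser has and pick one or two inputs per guess. Everything here (branch names, inputs) is a guess for illustration, not taken from the project:

```java
import java.util.List;
import java.util.Map;

public class BranchGuesses {
    // Hypothesized branch points in an unseen parser, with inputs aimed
    // at each one. The branch names are pure assumptions.
    static final Map<String, List<String>> INPUTS_BY_GUESSED_BRANCH = Map.of(
        "empty-input check", List.of(""),
        "single-token path", List.of("x"),
        "recursion / nesting", List.of("((x))"),
        "error recovery", List.of("(x", "x)"),
        "whitespace skipping", List.of(" x ", "\tx\n")
    );

    public static void main(String[] args) {
        int total = INPUTS_BY_GUESSED_BRANCH.values().stream()
                .mapToInt(List::size).sum();
        System.out.println(total + " targeted inputs"); // prints "7 targeted inputs"
    }
}
```

A handful of targeted inputs like these often reaches more distinct code paths than thousands of random strings drawn from the same distribution.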

One usually does not have errors like:

  fail horribly

Regarding professional test suites, you can have a look at common compilers like gcc.
Compilers are very large programs with high stakes, are too complicated to formally prove correct, and thus, have enormous test suites.
Some professional test suites are private and are sold for hundreds of euros.
But the design principles are the same.

There are some tools that fuzz compilers in different ways (there are lectures at our university that focus on fuzzing alone), but many test suites handcraft and generate test cases, and maybe add some hardcoded results from fuzzing.

You can have a look at the LLVM frontend used in modern compilers:
Assembler Parser Unit Tests (400 lines)
Assembler file tests (>400 files)
YAML parser test

Please do not run more than a few hundred different tests. In general, think about scaling and playing nice. If everyone makes the server execute for five minutes, it may not get done testing everyone’s tests, and then you will get less feedback.

Sorry, I usually set the loop for the fuzzer to 0 iterations before pushing but I must have forgotten.

A clever use of git would be to add the fuzzing code to .gitignore and to push only a few picked (auto-)generated tests.
That way you would not have to change things every time you push.
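For example (the paths are assumptions about your project layout, not the actual one):

```
# keep the fuzzer and its raw output local; push only frozen generated cases
src/test/java/fuzz/
fuzz-output/
```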
