Public test Fail only on server, not locally

I have run all public tests before submitting my project and they all (except for verify, as I didn’t do that part) passed, 100%. I originally altered the tests syntactically (using mutli-line strings for visibility) but I just rolled that back with git and they still all passed as before. The sema test “testCompleteProgram” failed on the server and thus I got no points in the whole category.

I am running Ubuntu 20.04 with Java 17. Has this test been altered on the server for grading?? Is there anything I can do to at least get some of the points I lost?

Here are the two semantically equivalent pieces of code:

Original
@Test
public void testCompleteProgram() {
	final String code = ""
			+ "int z;\n"
			+ "char w;\n"
			+ "char *v;\n"
			+ "void swap(int *a, int *b);\n"
			+ "int foo(int x, int y) {\n"
			+ "	x = x + (y * z) / sizeof(z) - (0 - 5);\n"
			+ "	y = 1337;\n"
			+ "	if ((z = 1000) < x) {\n" /* yes, this is a valid expression */
			+ "		return x;\n"
			+ "	} else {\n"
			+ "		swap(&x, &y);\n"
			+ "		return foo(x, y);\n"
			+ "	}\n"
			+ "}\n"
			+ "void swap(int *a, int *b) {\n"
			+ "	int t = *a;\n"
			+ "	*b = *a;\n"
			+ "	*a = t;\n"
			+ "}\n"
			+ "int main() {\n"
			+ "	v = \"foobar\";\n"
			+ "	w = *(v + 3);\n"
			+ "	return foo(42, 42);\n"
			+ "}\n";
	
	checkCode(code);
}
More readable
@Test
public void testCompleteProgram() {
	// "(z = 1000) < x": yes, this is a valid expression
	final String code = """
			int z;
			char w;
			char *v;
			void swap(int *a, int *b);
			int foo(int x, int y) {
				x = x + (y * z) / sizeof(z) - (0 - 5);
				y = 1337;
				if ((z = 1000) < x) {
					return x;
				} else {
					swap(&x, &y);
					return foo(x, y);
				}
			}
			void swap(int *a, int *b) {
				int t = *a;
				*b = *a;
				*a = t;
			}
			int main() {
				v = "foobar";
				w = *(v + 3);
				return foo(42, 42);
			}
			""";
	
	checkCode(code);
}

PS:

The test only failed when the project was graded, here’s the last Mail I got from the buildbot before grading:

---> test prog2.tests.pub.sema.SemanticsTests::testCompleteProgram
Result: successful

---> test prog2.tests.pub.sema.SemanticsTests::testTypeCheckNegative
Result: successful

---> test prog2.tests.pub.sema.SemanticsTests::testTypeCheckPositive
Result: successful

Hey,

that’s an interesting case. As you might see, the bug only happens non-deterministically. In fact, you need to be quite unfortunate to see this bug happen. This also explains why the bug never occurred in testing.

Looking at your implementation, we can see that you tried to hash-cons your types. In principle, this is fine. However, your implementation is broken. Let’s look at function types:

public final class FunctionType extends AbstractType {
    private static final WeakHashMap<List<Type>, FunctionType> INSTANCES = new WeakHashMap<>();
    public final Type returns;
    public final List<Type> arguments;
    private FunctionType(Type ret, List<Type> args) {
        this.returns = ret;
        this.arguments = List.copyOf(args);
    }
    public static FunctionType of(Type ret, List<Type> args) {
        final var sig = new ArrayList<Type>(args.size() + 1); //line 33
        sig.add(ret);
        sig.addAll(args);
        return of(sig, args);
    }
    private static FunctionType of(List<Type> sig, List<Type> args) {
        final var cached = INSTANCES.get(sig);
        if (cached != null) return cached;
        final var n = new FunctionType(sig.get(0), args);
        INSTANCES.put(sig, n);
        return n;
    }

You have a Map<List<Type>, FunctionType> called INSTANCES which tries to store all possible instances of function types. More concretely, the function type int(char, void*) would be mapped by the key [Type_int, Type_char, Pointer[Type_void]]. However, you use a WeakHashMap. According to the spec, this is a

[h]ash table based implementation of the Map interface, with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use.

Now, what are the keys here? The key is a list you construct in line 33. It is only stored in that function, and of course as a key in the map. Thus, once the method ends, the key list is “no longer in ordinary use”, and the garbage collector will discover this the next time it runs. It will then remove the corresponding entry from that list. If you later construct this type again, you get two instances of a type that are supposed to be the same. This breaks lots of your implicit invariants, causing your type checker to compute the wrong result.

Unfortunately, garbage collector runs are non-deterministic. Thus, making sure that software relying on the garbage collector (not) causing visible effects is correct is hard. In this case, you failed to do so. The problem could have been avoided by not using weak references at all – a simple HashMap would have made your program correct (as far as I can tell).

Note that this is only an issue if the garbage collector runs during parsing. This is more of an issue if you run your program in a memory-constrained environment, like the test server (where multi-threading also affects the likelihood of the garbage collector running) or the VM. I can reliably make your test case fail by inserting a System.gc() call in the beginning of your MyASTFactory::createFunctionType.

What you likely wanted was a HashMap<.., WeakReference<FunctionType>> (see WeakReference), which would have “removed” entries when they become unreferenced. Alternatively, you could have make sure that the mapped values strongly reference their keys, so that they never go unreferenced before their values do so. This is actually what you do for your pointer types, making them not affected (note that this also makes having a WeakHashMap useless, since it strongly references the value, which strongly references the key – thus it will never go unreferenced).

So to answer your questions: No, your code has not been altered on the server. Also, you changing our tests had no effect, the bug occurs with or without these changes, and they are overwritten on the server anyways.

Your code is quite simply incorrect, but in a hard-to-notice way. The fact that it only failed during grading is quite unfortunate, but we did not purposefully (nor accidentally, it remained the same) alter the build/testing environment to cause this – it happened by accident.

1 Like

Thank you so much!

Yes, I misread the documentation :grimacing:

1 Like