Generatedict how to make sure all of them get the same chances of getting selected

How do you use drand48? I’m still not sure how to make sure all the word have equal chances of getting selected. Should i read it first until all the word stored in trie, or should i do it before? I didn’t seem to understand how to use the drand48 function.

Whether you first sample the random word, during construction of the trie, or afterward is up to you.

Regarding drand48:
This function returns a uniformly sampled pseudo-random number in the range [0.0, 1.0)^* of type double.

You have to come up with a way to lift the random number you get to a way of selecting a word uniformly at random from the provided dictionary.

The randomness has to be seeded. But this is already done for you with srand48(time(NULL)); in main.c.

^*) Some documentation of drand48 claims the range to be [0.0, 1.0]. We rely on the Linux man page.

does this mean drand48, return an array? a single int?

It returns a double.

https://man7.org/linux/man-pages/man3/drand48.3.html

double drand48(void);

i have the same question but ive already experimented with drand48 in a sperate test programm
ive come to the conclusion that drand48 returns the same sequence of values each time it is called in the same programm
so if i execute the same programm multiple times i get the same sequences of floats
is this normal or am i encountering some sort of bug?
grafik
grafik

If you look at main.c you will find a line:

srand48(time(NULL));

which essentially determines the seed value for your random number sequence by tapping into the system time. You do not specify a seed so your system most likely sets it to some default value. Since drand48 is deterministic you will then of course always get the same sequence for the same seed value.

This is most likely also how they will determine if you actually picked words uniformly using drand48. They will set the seed to some value and then compare your generateDict with theirs and if you get the same words you will pass that test.

time(NULL) returns the current system time, so usually your system should not give a default value there. However, it is weird that the sequence of number stays the same.

Your assumption about the test is bold, because I could be using every second value generated by drand48, or I could count the words in different order.

Best,
Tim

1 Like

He does not use srand48 at all in his test program which is why he gets the same values. That was what I was trying to say.

To be honest I am not 100% sure how testing whether something is distributed “properly” would be done using the minimalist function definitions we have been given. As you say there is much room for doing things differently and some people will most likely lose points due to technicalities.

I know you can’t reveal eval tests but how exactly can we be sure that we implemented the random sampling in the exact way required of us?

You are correct, I misinterpreted your awnser. If there is no seed specified, then probably some default value is used as seed and therefore the sequence of numbers stays the same.

Even better: I do not know the eval tests :wink: Probably, a test checks how much difference there is between the words you selected and a true uniform distribution. I would imagine it is like a hypothesis test.
If you used the numbers generated by drand48, scale them accordingly and then pick your words based on that, you are fine.

Best,
Tim

Even better: I do not know the eval tests :wink: Probably, a test checks how much difference there is between the words you selected and a true uniform distribution. I would imagine it is like a hypothesis test.
If you used the numbers generated by drand48, scale them accordingly and then pick your words based on that, you are fine.

This kind of presupposes that everyone here is aware that given a “uniformly random” number d\in[0,1) and a notion of “weight” there is an established way of “uniformly randomly” picking from the leaves of a tree. And then everyone who has that idea needs to have a similar implementation.

Also there is no such thing as a “uniform distribution” outside of mathematics. Nobody can even verify that a coin flip is uniformly distributed (anyone who tells you otherwise is delusional; also not all coins are created equal) let alone this word picking we are supposed to implement.

Ultimately for verification to make any sense you need to have a form of determinism and a way to ensure the same starting conditions for every test. You can verify that a certain function programmed against a specific interface, called with the same global seed variable on the same system architecture will produce the exact same output. (and maybe some other things have to be fixed as well)

This might appear like nitpicking but it seems like the evaluation of this “random” aspect of Wordle could be unfair to people who are not “in” on the joke (= the way you implemented randomness in the reference implementation).

This is a bold claim.

I can write a random number generator that returns the numbers from 1 to 10 in a new permutation which is reset every time all 10 numbers have been yielded. Such a RNG would be perfectly uniform.

Now, drand48 does not work like this. It does guarantee, however, that its output is not distinguishable from a uniform source of randomness.

It is true that a uniform source of randomness can return a 0 10 times in a row. The likelihood of this happening is about 2^{-480}. For all practical matters, this can be considered impossible.

If I were to write an evaluation test, I would make it perform a KS-test to check whether the distribution one gets by invoking your program (say) 5000 times with a new seed every time is consistent with a uniform distribution for \alpha = 0.99999 or something like this.

1 Like

That is not verification in any meaningful sense. As long as something can happen it is not impossible.

You are working under the assumption that something which has a chance smaller than some (arbitrarily defined) probability is impossible. (You say practically, but that does not actually mean anything)

Statistics sometimes provides good heuristics for problems whose exact solutions are currently unfeasible (or just costly) but nothing more. It has no way of actually forcing anything to happen. (Unless p = 1, but that is not interesting anyway because I wouldn’t use a heuristic if I already knew the way)

(And your RNG would of course be perfectly deterministic. Wether we could predict the permutations or not is irrelevant)

We don’t. This just makes the event about as likely as a cosmic ray causing your test to go wrong because the cosmic ray induces a critical bit flip somewhere.

2 Likes