Typescript Programming Test Problems Repo Update

TSMT

Sorry for the lack of blog posts – I’ve been extremely busy with a gig over the last several months.

I recently received an inquiry regarding a problem to be included in the Typescript Programming Test Problems repo. This one requires a more extensive writeup for the analysis, so I decided to devote a blog post to the topic.

Problem Statement: You are given a function, die5() that simulates the roll of a five-sided die. Write a function that uses only die5() to simulate the roll of a die with a small prime number of faces. Discuss limitations or edge cases involved with the use of this function.

At first read, my guess is that the prime number of faces has nothing to do with the problem. It’s indicative of real-world applications where a person is often given either too little or too much information to solve a problem. One of the challenges is knowing what needs to be obtained for a solution and what can be discarded. So, let’s restate the problem slightly as how to simulate the roll of a k-sided die (where k is an integer > 5) using only a five-sided die.

A five-sided die can only generate five combinations, 1, 2, 3, 4, and 5. For a k-sided die, we need to be able to uniformly generate k combinations, but are restricted to a five-sided die as a generator. There was, however, no restriction placed on the number of rolls of the five-sided die. Let’s say we roll the five-side die twice. There are 25 possible combinations. Three rolls has 125 possible combinations. If r is a positive integer > 0, then r rolls generates 5^r possible combinations.

I like to have some sort of mathematical model for such problems. Let [1,5] refer to a pseudo-random selection of integers in the interval one to five, inclusive. We expect die5() to approximately model uniform samples from [1,5]. The statistical quality of the final result, of course, depends on the quality of that sampling.

With 25 combinations, we could theoretically model the roll of up to a 25-sided die. For numbers between 6 and 25, modulo arithmetic can help by dividing 25 in to ‘bins’ of length k, with potential leftover. Consider the next two primes in sequence, 7 and 11. Here are the ‘bins’ for each case with the corresponding roll of the 7-sided die.

7-sided die with 25 combinations (3 full bins, one partial)

1 2 3 4 5 6 7 - 8 9 10 11 12 13 14 - 15 16 17 18 19 20 21 - 22 23 24 25  
1 2 3 4 5 6 7 - 1 2 3  4  5  6  7  - 1  2  3  4  5  6  7  - *  *  *  *  

11-sided die with 25 combinations (two full bins, on partial)

1 2 3 4 5 6 7 8 9 10 11 - 12 13 14 15 16 17 18 19 20 21 22 - 23 34 25 
1 2 3 4 5 6 7 8 9 10 11 - 1  2  3  4  5  6  7  8  9  10 11 - *  *  *

Consider a generating function, g(r), that depends only on the number of rolls, r, of a five-sided die. For the above examples, r = 2. The generating function should return uniform samples of [1,5^r] or [1,25] for the two-roll case. If a generated number falls in a ‘full’ bin, then take the result mod k and add one. That generates a sample in [1,k], which is the desired result.

A decision must be made on how to handle cases where the generating function produces a number in a ‘partial’ bin. The easiest solution is to simply call the generator recursively. This has the benefit of being simple, and recursive programming is a fad for testers, so you are likely to gain some style points :). It does have the side-effect of ’throwing away’ iterates in [1,5]. For a perfectly uniform generation in [1,5], that is not an issue. However, in practice, die5() is likely to suffer from statistical flaws that could have a downstream effect on the final result.

So, what does a generating function look like? A simple, linear polynomial is as good a start as any, so consider models of the following form for the two-roll generator

g(2) = a[1,5] + b[1,5] – c

where a, b, and c are positive integers. This generalizes in the multi-roll case to

g(r) = a[1,5] + b(1)[1,5] + b(2)[1,5] + … + b(r-1)[1,5] – c

Returning to the two-roll generator, what constraints can be placed on the generation? Define n to be the largest integer such that nk <= 25. This is the number of full bins, given the number of sides of the die, k. The smallest input roll is one and the largest is five. When a one is rolled both times, the result should be equal to one to cover the low range of [1,25]. When a five is rolled both times, the generator should produce 25 in order to cover the high range of [1,25].

1 <= a[1,5] + b[1,5] – c <= 25

At the high end of both [1,5] rolls, the generated value should be greater than or equal to nk.

This produces three unknowns, a, b, and c, but only two constraints. So, there is a family of possible generating functions.

The generator must also return uniform numbers in [1,25]. If a frequency count of each generated number is computed, the histogram should be perfectly flat as the number of generations tends to infinity. A way to test for this is to loop over the generations for each possible roll of the five-sided die and simply print out the result. Numbers that are skipped or repeated lead to bias in g(r).

To resolve the underdetermined problem, set b = 1 and use the two constraints to solve for a and c. Since addition is associative, the model is the same as setting a = 1 and solving for b and c. Just reverse the order of the a and b terms.

The simplified model is

g(2) = a[1,5] + [1,5] – c

Use the two extreme cases where [1,5] = 1 and [1,5] = 5 for both rolls fo the five-side die. At the low end,

1 = a*1 + 1 – c = 1(a + 1) – c

At the high end,

25 = a*5 + 5 – c = 5(a + 1) – c

Let j = a + 1, then

1 = j – c or j = 1 + c

25 = 5j – c = 5(1+c) – c = 5 + 4c

This yields c = 20/4 = 5 and j = 1 + c = 6, which means a = 5.

Computation of n given k is pretty simple and the entire model is created in the getCoefs() function in the code distribution for this problem. It only works for the two-roll problem. I do not like the idea of generalizing this particular family to a number of rolls greater than two as it puts too much weight on the first roll. The likelihood of generating a model that is non-uniform is too great.

I solved the three-roll model by hand and got a = 29 and c = 30. I did verify that the two-roll model is uniform. I believe that the three-roll model is not uniform, but perhaps an astute reader will prove me wrong (I have very limited time to devote to these problems). So, I can address the two-roll model which is good for a ‘small’ number of primes as long as k <= 25 and we have a ‘perfect’ quality generator for [1,5]. In practice, the latter will not be true.

To simulate the roll of a k-sided die using a generator for [1,5], pass in the two-roll model and generate two rolls in [1,5]. Use those results in the generating function for k. Then, test if the value is less than or equal to nk. If so, modulo-k math produces a result. Otherwise, call the generating function again. This code is provided in the simulate() function,

let roll: number = model.a*RandomIntInRange.generateInRange(1,5) + 
RandomIntInRange.generateInRange(1,5) - model.c;

// test if number is within the integer multiple of k
if (roll <= model.n*k)
{
  // return result mod k plus 1 to compensate for 1-based index
  return (roll%k) + 1;
}
else
{
  // too bad ... try again
  return simulateRoll(model, k);
}

where the Typescript Math Toolkit RandomIntInRange class is used to generate iterates in [1,5]. Since it is based on the system-supplied Math.random(), it is a typical linear congruential generator and suffers from all the known flaws of such generators.

I supplied a validate() function that computes the frequency histogram for a given value of k. This is used in the specs for k = 6, 7, 8, and 9. The numbers look pretty good, but would be better for a higher-quality [1,5] generator. Try it for other values of k and a greater number of iterations per validation and study the results.

So, the two limitations are we can only simulate up to a 25-sided die using two rolls of a five-sided die and the final output is dependent on the statistical quality of the generator that simulates the five-sided die. The general problem (for any value of k > 5) requires extensive iteration and simulation to produce a model that meets the necessary constraints, particularly that of uniform distribution in [1, 5^r].

Here is the repo – experiment with the code and have fun!

Comments are closed.