Copyright © University of Cambridge. All rights reserved.
This problem was a toughnut for some time,
and really tested our users (hopefully in an enjoyable way!) The
first part involved deducing how the code was generated; the second
how the computer determined randomness.
Well done to Ngoc from Newcastle High
School, Australia, who cleverly worked out how the code was
generated and gave a clear description of this
The first number in the code presents the number of red squares.
The next 15 digits indicate the number of vertical "strips" (series
of squares of the same color) of that length. The strip can be of
yellow or red color. Strips of longer than 16 digits are not
counted, since in that case the pattern must be 99%+ non-random. I
think it is a clever way of checking randomness!
James from the Macmillan Academy, Edward
from Thomas Hardye School and Mr and Mrs Brewer also deduced how
the code was generated; this hypothesis can easily be checked
several times to give good confidence in its
correctness.
Well done to Matt from Cambridge, who after
working out the meaning of the pattern also said that
I copied 50 examples of a likely code and
50 copies of an unlikely code into a spreadsheet, and after some
formatting got the data about the digits into columns. By looking
at it there seemed to be lots of zeros but the unlikly samples
appeared to contain bigger numbers, so I sorted to see the range of
the first digit, second digits and so on.
I found that
| Digit |
Unlikely Sample Range |
Likely Sample Range |
| 1 |
[63, 33] |
[58, 40] |
| 2 |
[44, 14] |
[38, 12] |
| 3 |
[23, 4] |
[21, 7] |
| 4 |
[13, 1] |
[12, 2] |
| 5 |
[9, 0] |
[6, 0] |
| 6 |
[5, 0] |
[4, 0] |
| 7 |
[5, 0] |
[3, 0] |
| 8 |
[3, 0] |
[2, 0] |
| 9 |
[2, 0] |
[1, 0] |
| 10 |
[1, 0] |
[1, 0] |
| 11 |
[1, 0] |
[1, 0] |
| 12 |
[1, 0] |
[0, 0] |
| 13 |
[0, 0] |
[1, 0] |
| 14 |
[1, 0] |
[0, 0] |
| 15 |
[1, 0] |
[0, 0] |
| 16 |
[1, 0] |
[1, 0] |
This made is pretty clear that the range of digits for many
unlikely samples was larger than that for many likely
samples.
I therefore postulated the rule that any digits occurring outside
the likely sample range that I saw would be classed as
an unlikley sample -- this assumed independence on the part of the
computer, which seemed sensible to me given the fact that all
likely sample ranges (apart from the very high digits) were
narrower than all of the others.
After some checking and refinement, and looking carefully at each
digit I found that any of the following seemed certain
to make a code unlikely
| Digit |
Guaranteed Unlikely range |
| 1 |
< 40 or > 60 |
| 2 |
< 10 or > 39 |
| 3 |
< 5 or > 22 |
| 4 |
< 2 or > 12 |
| 5 |
> 8 |
|
|
For the 6th digit or more in the string I could find no such
pattern, and each digit in a likely and an unlikely code seemed to
cover a similar range. This led me to look to all such digits
together and I noticed that there were often more of them, although
I couldn't say more than that.
This is indeed how the
randomisation was implemented, with the additional fact that if the
digits 6 and higher gave a total more than 6 then the code was also
unlikley.Well done!
Regarding the determination of randomness
by the computer, Ngoc said
I think the program works using two "checks": 1- Counting the
number of red squares left. If yellow and red were selected at
equal probability, than the number of squares left must be around
50. This provide a quick check to the square's randomness. 2- When
condition 1 above is satisfied, the program proceeds to counting
the number of consecutive vertical squares with the same color.
Patterns containing long strips of the same color are less likely
to be random pattern. In both cases, specific probabilities are
worked out using the binomial theorem. I have worked out the
details but sorry I can't enter the full proof here.
Indeed, you can do a theoretical analysis
of the distribution of strips. We made our implementation of the
distribution assuming the simplifying assumption of
independence.
Edward from Thomas Hardye School found a
concrete example of a set of patterns which were going to be
determined non-random.
I tested this further; any grid which had this first value not in
the interval [40, 60] was deemed either "99% not random" or "almost
certain not random". I'm not absolutely certain this is true
without checking all the grids though. (The converse is completely
untrue; many 'non-random' grids still had this first value near
50). The second value seemed neverto be below 10, or far above
30.
This was as far as I got with working out what the numbers meant;
hopefully sometime I'll get a bit further! Happy Christmas :)
We were very pleased to see this joint
response from Mr and Mrs Brewer (who, rightly, point out that this
is not easy)
We would just like to say that any students who even half-manage to
solve this problem deserve a big pat on the back! We are 29 &
30 both with oxbridge degrees and haven't got it yet! Figured out
the code but cant get how it makes it's decision. If you're allowed
to give us the answer we'd be very pleased!
Other attempts from Noah from LRGS, Dapeng
Wang from Claremont Fan Court School and Ben all gave suggestions
which pointed to an expected distribution of patterns