Copyright © University of Cambridge. All rights reserved.

This problem was a toughnut for some time, and really tested our users (hopefully in an enjoyable way!) The first part involved deducing how the code was generated; the second how the computer determined randomness.

Well done to Ngoc from Newcastle High School, Australia, who cleverly worked out how the code was generated and gave a clear description of this

The first number in the code presents the number of red squares. The next 15 digits indicate the number of vertical "strips" (series of squares of the same color) of that length. The strip can be of yellow or red color. Strips of longer than 16 digits are not counted, since in that case the pattern must be 99%+ non-random. I think it is a clever way of checking randomness!

James from the Macmillan Academy, Edward from Thomas Hardye School and Mr and Mrs Brewer also deduced how the code was generated; this hypothesis can easily be checked several times to give good confidence in its correctness.

Well done to Matt from Cambridge, who after working out the meaning of the pattern also said that

I copied 50 examples of a likely code and 50 copies of an unlikely code into a spreadsheet, and after some formatting got the data about the digits into columns. By looking at it there seemed to be lots of zeros but the unlikly samples appeared to contain bigger numbers, so I sorted to see the range of the first digit, second digits and so on.

I found that

Digit | Unlikely Sample Range | Likely Sample Range |

1 | [63, 33] | [58, 40] |

2 | [44, 14] | [38, 12] |

3 | [23, 4] | [21, 7] |

4 | [13, 1] | [12, 2] |

5 | [9, 0] | [6, 0] |

6 | [5, 0] | [4, 0] |

7 | [5, 0] | [3, 0] |

8 | [3, 0] | [2, 0] |

9 | [2, 0] | [1, 0] |

10 | [1, 0] | [1, 0] |

11 | [1, 0] | [1, 0] |

12 | [1, 0] | [0, 0] |

13 | [0, 0] | [1, 0] |

14 | [1, 0] | [0, 0] |

15 | [1, 0] | [0, 0] |

16 | [1, 0] | [1, 0] |

This made is pretty clear that the range of digits for many unlikely samples was larger than that for many likely samples.

I therefore postulated the rule that any digits occurring outside the likely sample range that I saw would be classed as an unlikley sample -- this assumed independence on the part of the computer, which seemed sensible to me given the fact that all likely sample ranges (apart from the very high digits) were narrower than all of the others.

After some checking and refinement, and looking carefully at each digit I found that any of the following seemed certain to make a code unlikely

Digit | Guaranteed Unlikely range |

1 | < 40 or > 60 |

2 | < 10 or > 39 |

3 | < 5 or > 22 |

4 | < 2 or > 12 |

5 | > 8 |

For the 6th digit or more in the string I could find no such pattern, and each digit in a likely and an unlikely code seemed to cover a similar range. This led me to look to all such digits together and I noticed that there were often more of them, although I couldn't say more than that.

This is indeed how the randomisation was implemented, with the additional fact that if the digits 6 and higher gave a total more than 6 then the code was also unlikley.Well done!

Regarding the determination of randomness by the computer, Ngoc said

I think the program works using two "checks": 1- Counting the number of red squares left. If yellow and red were selected at equal probability, than the number of squares left must be around 50. This provide a quick check to the square's randomness. 2- When condition 1 above is satisfied, the program proceeds to counting the number of consecutive vertical squares with the same color. Patterns containing long strips of the same color are less likely to be random pattern. In both cases, specific probabilities are worked out using the binomial theorem. I have worked out the details but sorry I can't enter the full proof here.

Indeed, you can do a theoretical analysis of the distribution of strips. We made our implementation of the distribution assuming the simplifying assumption of independence.

Edward from Thomas Hardye School found a concrete example of a set of patterns which were going to be determined non-random.

I tested this further; any grid which had this first value not in the interval [40, 60] was deemed either "99% not random" or "almost certain not random". I'm not absolutely certain this is true without checking all the grids though. (The converse is completely untrue; many 'non-random' grids still had this first value near 50). The second value seemed neverto be below 10, or far above 30.

This was as far as I got with working out what the numbers meant; hopefully sometime I'll get a bit further! Happy Christmas :)

We were very pleased to see this joint response from Mr and Mrs Brewer (who, rightly, point out that this is not easy)

We would just like to say that any students who even half-manage to solve this problem deserve a big pat on the back! We are 29 & 30 both with oxbridge degrees and haven't got it yet! Figured out the code but cant get how it makes it's decision. If you're allowed to give us the answer we'd be very pleased!

Other attempts from Noah from LRGS, Dapeng Wang from Claremont Fan Court School and Ben all gave suggestions which pointed to an expected distribution of patterns