Substitution Cipher
Problem
In any language some letters tend to appear more often than others.
Which letters do you think are the most common in English?
Is the frequency of letters in the following sentence representative?
The quick brown fox jumps over the lazy dog.
In the coded text below, every letter in the original message was switched with another letter. Find out which are the most (and least) common letters used in English to help you crack this coded message:
Vfj Uzyj fxc pjjw nzkrgwi tjkl fxkc xyy vfj uzkwgwi, qbkgwi-dyjxwgwi fgq ygvvyj fzuj. Hgkqv ngvf pkzzuq, vfjw ngvf cmqvjkq; vfjw zw yxccjkq xwc qvjbq xwc dfxgkq, ngvf x pkmqf xwc x bxgy zh nfgvjnxqf; vgyy fj fxc cmqv gw fgq vfkzxv xwc jljq, xwc qbyxqfjq zh nfgvjnxqf xyy ztjk fgq pyxdr hmk, xwc xw xdfgwi pxdr xwc njxkl xkuq. Qbkgwi nxq uztgwi gw vfj xgk xpztj xwc gw vfj jxkvf pjyzn xwc xkzmwc fgu, bjwjvkxvgwi jtjw fgq cxkr xwc yznyl ygvvyj fzmqj ngvf gvq qbgkgv zh cgtgwj cgqdzwvjwv xwc yzwigwi. Gv nxq quxyy nzwcjk, vfjw, vfxv fj qmccjwyl hymwi cznw fgq pkmqf zw vfj hyzzk, qxgc 'Pzvfjk!' xwc 'Z pyzn!' xwc xyqz 'Fxwi qbkgwi-dyjxwgwi!' xwc pzyvjc zmv zh vfj fzmqj ngvfzmv jtjw nxgvgwi vz bmv zw fgq dzxv. Qzujvfgwi mb xpztj nxq dxyygwi fgu gubjkgzmqyl, xwc fj uxcj hzk vfj qvjjb ygvvyj vmwwjy nfgdf xwqnjkjc gw fgq dxqj vz vfj ixtjyjc dxkkgxij-ckgtj znwjc pl xwguxyq nfzqj kjqgcjwdjq xkj wjxkjk vz vfj qmw xwc xgk. Qz fj qdkxbjc xwc qdkxvdfjc xwc qdkxppyjc xwc qdkzzijc xwc vfjw fj qdkzzijc xixgw xwc qdkxppyjc xwc qdkxvdfjc xwc qdkxbjc, nzkrgwi pmqgyl ngvf fgq ygvvyj bxnq xwc umvvjkgwi vz fguqjyh, 'Mb nj iz! Mb nj iz!' vgyy xv yxqv, bzb! fgq qwzmv dxuj zmv gwvz vfj qmwygifv, xwc fj hzmwc fguqjyh kzyygwi gw vfj nxku ikxqq zh x ikjxv ujxczn. 'Vfgq gq hgwj!' fj qxgc vz fguqjyh. 'Vfgq gq pjvvjk vfxw nfgvjnxqfgwi!' Vfj qmwqfgwj qvkmdr fzv zw fgq hmk, qzhv pkjjojq dxkjqqjc fgq fjxvjc pkzn, xwc xhvjk vfj qjdymqgzw zh vfj djyyxkxij fj fxc ygtjc gw qz yzwi vfj dxkzy zh fxbbl pgkcq hjyy zw fgq cmyyjc fjxkgwi xyuzqv ygrj x qfzmv. Emubgwi zhh xyy fgq hzmk yjiq xv zwdj, gw vfj ezl zh ygtgwi xwc vfj cjygifv zh qbkgwi ngvfzmv gvq dyjxwgwi, fj bmkqmjc fgq nxl xdkzqq vfj ujxczn vgyy fj kjxdfjc vfj fjcij zw vfj hmkvfjk qgcj.
Javkxdv hkzu 'Vfj Ngwc gw vfj Ngyyznq' pl Rjwwjvf Ikxfxuj
Xtxgyxpyj vz kjxc gw hmyy xv nnn.imvjwpjki.zki
If you want to work on a computer to solve the problem, you can download the message as a text file which doesn't contain any line breaks.
If you are interested in code breaking you might enjoy the Secondary Cipher Challenge.
You can read about Substitution Ciphers and Frequency Analysis on Simon Singh's website: http://www.simonsingh.net/The_Black_Chamber/crackingsubstitution.html
Getting Started
Start by performing a frequency analysis on some selected text to see which letters appear most often. It is better to use longer texts, as a short text might have an unusual distribution of letters, like the "quick brown fox..." mentioned in the problem.
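If you would rather count letters with a short program than by hand, the following Python sketch shows one way to do it. The function name letter_frequencies is made up for this illustration and is not part of the toolkit. It ignores case, spaces and punctuation, and lists the most common letters first.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, count, percentage) tuples, most common letter first."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, n, 100 * n / total) for letter, n in counts.most_common()]

# The pangram from the problem: every letter appears, so it is not representative.
pangram = "The quick brown fox jumps over the lazy dog."

# The first sentence of the coded message.
sample = ("Vfj Uzyj fxc pjjw nzkrgwi tjkl fxkc xyy vfj uzkwgwi, "
          "qbkgwi-dyjxwgwi fgq ygvvyj fzuj.")

print(len(letter_frequencies(pangram)), "different letters in the pangram")
for letter, n, pct in letter_frequencies(sample)[:5]:
    print(f"{letter}: {n} ({pct:.1f}%)")
```

Running it on the whole coded message rather than one sentence should give a more reliable picture, for the reason given above.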
The spreadsheet toolkit lets you swap a pair of letters quickly and see the effect on the message. Alternatively, you could use the "Find and Replace" feature in Microsoft Word, with the "Match case" option switched on, to change one letter at a time. To keep the coded text and your deciphered letters apart, put the whole coded message in lower case and replace each letter with the upper case letter you think it represents.
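The lower case / upper case idea can also be sketched in Python. The function apply_guesses and the three guesses below are only an illustration (they assume the common word "vfj" stands for "the"); they are not taken from the toolkit.

```python
def apply_guesses(ciphertext, guesses):
    """Show guessed letters in UPPER case and leave unsolved cipher letters in lower case."""
    out = []
    for ch in ciphertext.lower():
        if ch in guesses:
            out.append(guesses[ch].upper())  # deciphered letter
        else:
            out.append(ch)                   # still coded (or punctuation)
    return "".join(out)

# Hypothetical first guesses: cipher j = E, v = T, f = H.
guesses = {"j": "e", "v": "t", "f": "h"}
print(apply_guesses("Vfj Uzyj fxc pjjw", guesses))  # THE uzyE Hxc pEEw
```

Because every letter is handled in a single pass, a letter you have already deciphered is never replaced a second time, which is the same problem that the upper and lower case trick avoids in Word.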
The whole message has been encoded by switching pairs of letters.
For example, B and P have been switched so that every P is replaced by B, and every B replaced by P.
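A cipher built from swapped pairs has a useful property: encoding and decoding are the same operation. The Python sketch below illustrates this with the B and P swap. The helper name swap_pairs_table and the test phrase are invented for this example.

```python
def swap_pairs_table(pairs):
    """Build a translation table that swaps each pair of letters, keeping case."""
    mapping = {}
    for a, b in pairs:
        for x, y in ((a, b), (b, a)):
            mapping[x.lower()] = y.lower()
            mapping[x.upper()] = y.upper()
    return str.maketrans(mapping)

table = swap_pairs_table(["bp"])          # swap B and P only
encoded = "Big paper bag".translate(table)
print(encoded)                            # Pig baber pag
print(encoded.translate(table))           # Big paper bag
```

Applying the same table twice returns the original text, so once you know that J stands for E you also know that E stands for J.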
Student Solutions
This was an interesting problem! Thanks for all the solutions, all of which were correct - let's see how some of you attacked this puzzle.
Jessica, from Chichester High School for Girls, started off by noting that E is the most common letter in English. Holly, from Hymers College, agreed, and after counting the frequencies of various letters in the code, concluded that E had probably been substituted by J, since J occurred so frequently in the code. This method can be used repeatedly to guess the rough distribution of letters. Are there any other ways of working things out?
Oak class from Henham and Ugley School explained their strategy:
We decided to try to identify single letter words first and then researched which letters are most commonly occurring in the English language. Once we had a possible solution we tested it to see if it fitted. If not we tried an alternative.
Geoffrey, from Creative Secondary School, Hong Kong, had the following few ideas:
First, set up a document in Microsoft Word. Change all letters into upper case letters - this will make your life easier in later steps. Then make a table in another document, which should look like this:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
- - - - - - - - - - - - - - - - - - - - - - - - - -
Once you crack part of the code, use the "replace" function. Replace the code letter in upper case letters with the substituted letter in lower case letters.
Next, search for one-letter words. They can only be "a", "i", or "o".
Some common two-letter words and three-letter words, which you should look for, include:
Two-letter words: to, do, on, at, as, of, he
Three-letter words: the, and, his, her, has, had
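Hunting for short words can be done by eye, but a few lines of Python will tally them too. This is only a sketch: the function name short_words is hypothetical, and the snippet is a phrase copied from the coded message.

```python
import re
from collections import Counter

def short_words(ciphertext, length):
    """Count the words of a given length in the ciphertext, ignoring case."""
    words = re.findall(r"[a-z]+", ciphertext.lower())
    return Counter(w for w in words if len(w) == length)

snippet = "ngvf x pkmqf xwc x bxgy zh nfgvjnxqf"
for length in (1, 2, 3):
    print(length, short_words(snippet, length).most_common())
```

Here the one-letter word "x" appears twice, which makes it a strong candidate for "a" or "i", and the three-letter word "xwc" starts with the same letter.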
When you work out a letter, write it under the code letter in the table:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
x p d c j h i f g e r y u w z b s k q v m t n a l o
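Once the table is complete, it can be applied to the whole message in one step. The Python sketch below copies the two rows of the table; the names CODE, PLAIN and decode are chosen for this illustration only.

```python
CODE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
PLAIN = "xpdcjhifgeryuwzbskqvmtnalo"   # second row of the table

# Translate capitals to capitals and small letters to small letters,
# so the decoded text keeps the capitalisation of the coded message.
TABLE = str.maketrans(CODE + CODE.lower(), PLAIN.upper() + PLAIN)

def decode(text):
    """Replace every code letter with the letter written under it in the table."""
    return text.translate(TABLE)

credit = "Javkxdv hkzu 'Vfj Ngwc gw vfj Ngyyznq' pl Rjwwjvf Ikxfxuj"
print(decode(credit))
```

Since the letters were swapped in pairs, the same table would also turn the decoded text back into the coded message.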
Big tip: there's a website address at the bottom of this code, so you can guess that "nnn." = "www." and that ".zki" is one of ".com", ".net" or ".org".
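A guessed word like this is sometimes called a crib. The Python sketch below lines a guess up against the coded text and reads off the letter pairs, returning None if the guess contradicts itself. The function name crib_mapping is hypothetical, and the full address used in the example is the one given with the solution text.

```python
def crib_mapping(cipher, plain):
    """Pair each cipher letter with the guessed letter, or return None if the guess is inconsistent."""
    if len(cipher) != len(plain):
        raise ValueError("the guess must be the same length as the coded text")
    mapping = {}
    for c, p in zip(cipher.lower(), plain.lower()):
        if not c.isalpha():
            continue                      # skip dots and other punctuation
        if mapping.setdefault(c, p) != p:
            return None                   # one cipher letter with two meanings
    if len(set(mapping.values())) != len(mapping):
        return None                       # two cipher letters with the same meaning
    return mapping

print(crib_mapping("nnn.imvjwpjki.zki", "www.gutenberg.org"))
print(crib_mapping("nnn", "com"))         # None: "nnn" cannot be "com"
```

A single correct guess of this kind gives nine letters of the key at once.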
Great! Thanks for all the responses. By the way, the text was:
The Mole had been working very hard all the morning, spring-cleaning his little home. First with brooms, then with dusters; then on ladders and steps and chairs, with a brush and a pail of whitewash; till he had dust in his throat and eyes, and splashes of whitewash all over his black fur, and an aching back and weary arms. Spring was moving in the air above and in the earth below and around him, penetrating even his dark and lowly little house with its spirit of divine discontent and longing. It was small wonder, then, that he suddenly flung down his brush on the floor, said 'Bother!' and 'O blow!' and also 'Hang spring-cleaning!' and bolted out of the house without even waiting to put on his coat. Something up above was calling him imperiously, and he made for the steep little tunnel which answered in his case to the gravelled carriage-drive owned by animals whose residences are nearer to the sun and air. So he scraped and scratched and scrabbled and scrooged and then he scrooged again and scrabbled and scratched and scraped, working busily with his little paws and muttering to himself, 'Up we go! Up we go!' till at last, pop! his snout came out into the sunlight, and he found himself rolling in the warm grass of a great meadow. 'This is fine!' he said to himself. 'This is better than whitewashing!' The sunshine struck hot on his fur, soft breezes caressed his heated brow, and after the seclusion of the cellarage he had lived in so long the carol of happy birds fell on his dulled hearing almost like a shout. Jumping off all his four legs at once, in the joy of living and the delight of spring without its cleaning, he pursued his way across the meadow till he reached the hedge on the further side.
Extract from 'The Wind in the Willows' by Kenneth Grahame available to read in full at www.gutenberg.org
Teachers' Resources
Why do this problem?
This problem offers a statistical activity that has immediate practical application. We have offered a spreadsheet toolkit so that students can concentrate on the analysis of the data without needing to waste time on computation.
Possible approach
"Which letters do you think appear most often in the English language?"
"How could you find out?"
Allow some time for students to think about and share their answers. If they have books with them, perhaps suggest that they take a look to see which letters seem most common at first glance.
If a computer room is available, introduce students to the toolkit and give them time to perform a frequency analysis on some English text (Wikipedia articles are a great source for this). Share results from the frequency analysis. Does everybody find the same letters come out top, and bottom? There is opportunity here for some discussion about the benefits of using longer sample texts.
Then present students with the ciphertext in the problem (available as a text file here).
If a computer room is not available, the ciphertext is available as a worksheet here. Here is a second version of the worksheet with the ciphertext printed faintly so that students can write over it as they decipher the message.
Key questions
Can you spot the vowels?
Are there any short words? What might they be?
Possible support
Students could be encouraged to work collaboratively on this problem. There are lots of suggestions to help them get started in the hint.
Possible extension
Students could investigate the frequency of digraphs (pairs of letters such as th or sh) in the English language and consider whether this speeds up the deciphering process.
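For students who want to count digraphs by computer, the Python sketch below is one possible starting point. The function name digraph_counts is made up for this example, and pairs are only counted inside words, not across spaces or punctuation.

```python
import re
from collections import Counter

def digraph_counts(text):
    """Count pairs of adjacent letters within each word, ignoring case."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts

sentence = ("The Mole had been working very hard all the morning, "
            "spring-cleaning his little home.")
print(digraph_counts(sentence).most_common(5))
```

The same function can be run on the ciphertext, so that the most common coded pairs can be compared with the most common pairs in ordinary English text.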
The Secondary Cipher Challenge and Substitution Transposed offer challenging extensions for students who have worked on this problem and the problem Transposition Cipher.