Challenge 12: Nigh Impossible

Here’s Challenge #12 for November 8, 2017.

Today, I wanted to learn about homophonic substitution ciphers. These ciphers aim to thwart frequency analysis by assigning multiple ciphertext symbols to each plaintext symbol. The higher the frequency of a letter, the more cipher symbols it is assigned. One of the most famous examples of a homophonic cipher is the Rossignols’ Great Cipher.

For my homophonic cipher, I wanted to closely match the frequency of each letter. For example, the letter E has the highest frequency at 12.7 percent. If my cipher used a pool of 100 numbers, 13 of those would represent the letter E. But if I were to use only 100, what would I do with letters such as Z, which have a frequency of less than 0.1 percent? In order to represent these low-frequency letters, I had to increase my pool to 1000 numbers. So now, the letter E is represented by 127 different numbers!
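The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the frequency table is a commonly cited one for English and is an assumption, since the post does not say which table was used, and `symbol_counts` is a hypothetical helper name.

```python
# Commonly cited English letter frequencies, in percent.
# Assumption: the post does not say which table the cipher was built from.
FREQ = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702,
    "F": 2.228, "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153,
    "K": 0.772, "L": 4.025, "M": 2.406, "N": 6.749, "O": 7.507,
    "P": 1.929, "Q": 0.095, "R": 5.987, "S": 6.327, "T": 9.056,
    "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150, "Y": 1.974,
    "Z": 0.074,
}


def symbol_counts(pool_size):
    """Give each letter a share of the pool proportional to its frequency."""
    return {letter: round(pct * pool_size / 100) for letter, pct in FREQ.items()}


small = symbol_counts(100)
large = symbol_counts(1000)
print(small["E"], small["Z"])  # 13 0  -> Z gets no symbol at all
print(large["E"], large["Z"])  # 127 1 -> every letter is represented
```

Plain rounding like this is not guaranteed to add up to exactly the pool size, so some hand adjustment would still be needed. It also gives T 91 symbols under this table, while the hint further down lists 90, so the real key was presumably tuned a little.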

My symbols-to-letters distribution corresponds fairly closely to the characteristic distribution. The letters Q and Z each have 1 ciphertext symbol. The letters M and W each have 24. The letter A has 82.
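To make the mechanics concrete, here is a minimal sketch of how a key with such a distribution could be built and used. It is not the code behind this challenge: the function names, the toy four-letter alphabet, the seed, and the choice to number symbols from 0000 to 0999 are all assumptions made for illustration.

```python
import random


def build_key(counts, pool_size, rng):
    """Give each letter its own disjoint group of four-digit symbols."""
    if sum(counts.values()) > pool_size:
        raise ValueError("not enough symbols in the pool")
    symbols = [f"{n:04d}" for n in range(pool_size)]
    rng.shuffle(symbols)
    key, start = {}, 0
    for letter, n in counts.items():
        key[letter] = symbols[start:start + n]
        start += n
    return key


def encrypt(plaintext, key, rng):
    """Replace each letter with a randomly chosen symbol from its group."""
    return [rng.choice(key[ch]) for ch in plaintext.upper() if ch in key]


def decrypt(symbols, key):
    """Map every symbol back to the single letter that owns it."""
    reverse = {s: letter for letter, group in key.items() for s in group}
    return "".join(reverse[s] for s in symbols)


# Toy alphabet with made-up counts, just to exercise the functions.
rng = random.Random(2017)
toy_key = build_key({"A": 3, "E": 5, "N": 3, "T": 4}, 1000, rng)
secret = encrypt("ANTENNAE", toy_key, rng)
print(" ".join(secret))
print(decrypt(secret, toy_key))
```

Decryption is unambiguous because each symbol belongs to exactly one letter. Encryption is where the randomness lives: the same letter can come out as a different number every time it appears.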

Such a large pool of symbols means that cracking this cipher without hints will be nigh impossible. I’ll reveal a few hints now and occasionally update the post with further clues. I expect this to be the most challenging cipher I’ve yet posted on CodeAWeek.

Each four-digit number represents a letter in the English alphabet. Here’s the ciphertext:

0385 0376 0275 0591 0106 0856 0957 0894
0808 0997 0830 0801 0511 0556 0648 0995
0295 0587 0756 0686 0983 0169 0207 0353
0111 0447 0545 0168 0162 0294 0632 0475
0937 0951 0996 0444 0549 0410 0652 0939
0701 0936 0312 0231 0770 0186 0898 0458
0374 0507 0479 0423 0017 0198 0323 0550
0306 0233 0460 0702 0625 0583 0708 0004
0524 0205 0305 0037 0038 0677 0351 0465
0299 0092 0753 0293 0018 0775 0100 0654
0311 0938 0108 0612 0496 0118 0495 0698
0665 0201

Notice that no number repeats. Since Q and Z have only one symbol each, this tells you that the plaintext contains at most one Q and at most one Z.

Here’s the first big hint, which will hopefully take this cipher from nigh impossible to nearly nigh impossible. Here are all 90 symbols for the letter T:

0268 0930 0367 0715 0294 0709 0688 0010 0704 0858 0266 0306 0886 0438 0502 0655 0595 0885 0673 0995 0708 0468 0142 0040 0078 0863 0558 0399 0210 0436 0091 0090 0952 0029 0701 0417 0264 0092 0175 0861 0250 0170 0605 0729 0681 0318 0221 0263 0077 0597 0197 0686 0430 0650 0542 0864 0805 0693 0505 0017 0924 0488 0625 0543 0362 0099 0849 0682 0946 0192 0422 0012 0239 0312 0194 0184 0347 0277 0064 0425 0557 0753 0156 0663 0877 0481 0830 0434 0410 0827
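If you want to apply this hint by machine, a short script along these lines should do it. It is only a sketch: the variable names are mine, and the two number lists are copied from this post. It also double-checks the observation that no number repeats in the ciphertext.

```python
CIPHERTEXT = """
0385 0376 0275 0591 0106 0856 0957 0894
0808 0997 0830 0801 0511 0556 0648 0995
0295 0587 0756 0686 0983 0169 0207 0353
0111 0447 0545 0168 0162 0294 0632 0475
0937 0951 0996 0444 0549 0410 0652 0939
0701 0936 0312 0231 0770 0186 0898 0458
0374 0507 0479 0423 0017 0198 0323 0550
0306 0233 0460 0702 0625 0583 0708 0004
0524 0205 0305 0037 0038 0677 0351 0465
0299 0092 0753 0293 0018 0775 0100 0654
0311 0938 0108 0612 0496 0118 0495 0698
0665 0201
""".split()

T_SYMBOLS = set("""
0268 0930 0367 0715 0294 0709 0688 0010 0704 0858
0266 0306 0886 0438 0502 0655 0595 0885 0673 0995
0708 0468 0142 0040 0078 0863 0558 0399 0210 0436
0091 0090 0952 0029 0701 0417 0264 0092 0175 0861
0250 0170 0605 0729 0681 0318 0221 0263 0077 0597
0197 0686 0430 0650 0542 0864 0805 0693 0505 0017
0924 0488 0625 0543 0362 0099 0849 0682 0946 0192
0422 0012 0239 0312 0194 0184 0347 0277 0064 0425
0557 0753 0156 0663 0877 0481 0830 0434 0410 0827
""".split())

# Confirm that no symbol occurs twice in the ciphertext.
assert len(set(CIPHERTEXT)) == len(CIPHERTEXT)

# Show T where the hint applies and "_" for every still-unknown letter.
partial = "".join("T" if sym in T_SYMBOLS else "_" for sym in CIPHERTEXT)
print(partial)
print(partial.count("T"), "of", len(CIPHERTEXT), "letters are T")
```

The output is a skeleton of the plaintext with the T positions filled in, which is a starting point for guessing common patterns such as TH and THE.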

With CodeAWeek.com, I hope to release one cipher, puzzle, or mystery each week. Anyone can attempt to solve them. The winner is the first person to send a correct solution and a description of the solve method to codemaster@codeaweek.com. Once a correct solution is e-mailed, I will publish a follow-up post, congratulating the winner and revealing the secrets of the code.

You may post questions or theories in the comments, but DO NOT POST SOLUTIONS. E-mail them to codemaster@codeaweek.com.