CODEX Decryptor

Paste output from the CODEX codebook cipher app and watch how a cryptanalyst would attack it. The tool runs a seven-step pipeline: it detects the format, fingerprints the cipher layer, attempts to strip it, performs group frequency analysis, builds a frequency-based mapping, demonstrates a context attack using word segmentation and bigram analysis, then reconstructs the plaintext using all available evidence. Each step shows key statistics with reference ranges, and longer explanations are available via “read more” toggles.

6½. Crib Attack (interactive)

A crib is a word or phrase the analyst has reason to believe is in the plaintext — a proper noun (Wilhelm, Alexandria, Mediterranean), a stereotyped opening, or a phrase the message is likely to contain. The cribs that work best here are:

Words with repeated letters: the repeat is a fingerprint. “Wilhelm” letter-spelled has the same code at positions 3 and 6 (both l); a 7-group run matching that pattern is almost certainly Wilhelm. Each group in the matched run is then labelled with the letter it stands for.
Capitalized words: entering the crib with its leading capital tells the analyzer to require [CAP] before the match, which dramatically narrows the search window. Words like “Alexandria” or “Mediterranean” get found this way even without strong internal repeats.
Longer words: the more chunks, the stronger the structural fingerprint.

Words without internal repeats AND without capitalization (e.g. “kingdom”) can’t be reliably attacked: there’s no fingerprint to constrain the search. The analyzer will report NO MATCH rather than guess. This is a historically authentic attack technique: Room 40, MI-8, and Bletchley Park all relied on cribs to break real codes and ciphers.
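A minimal sketch of the repeat-pattern idea, assuming the crib is spelled one letter per code group. The function names `repeat_pattern` and `has_fingerprint` are hypothetical and are not taken from the analyzer itself.

```python
def repeat_pattern(units):
    """Number each unit by the order in which it first appears."""
    first_seen = {}
    pattern = []
    for unit in units:
        if unit not in first_seen:
            first_seen[unit] = len(first_seen)
        pattern.append(first_seen[unit])
    return tuple(pattern)


def has_fingerprint(word):
    """True when a letter-spelled word repeats at least one letter."""
    letters = [c for c in word.lower() if c.isalpha()]
    return len(set(letters)) < len(letters)


print(repeat_pattern("wilhelm"))   # (0, 1, 2, 3, 4, 2, 5)
print(has_fingerprint("kingdom"))  # False
```

Any run of seven code groups whose own pattern is (0, 1, 2, 3, 4, 2, 5) is a candidate for Wilhelm; a word whose pattern has no repeated index matches every run of distinct codes and so constrains nothing.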

Cribs (one per line):

Each word is tried independently. Proper nouns work best. Apostrophes and hyphens are preserved: names like O’Brien and compounds like father-in-law can be cribbed with the punctuation in place, and the apostrophe and hyphen codes themselves get identified along with the letter codes.

Bigram propagation (runs automatically after cribs)

After the crib attack identifies some letter codes, the analyzer scans every still-unknown code that sits between two KNOWN single-letter codes inside a chunk run. For each such code, it scores all 26 candidate letters by bigram(left, X) × bigram(X, right) using an English letter-bigram frequency table (top ~200 bigrams: th at 3.6%, he at 3.1%, in at 2.4%, etc.) and commits to the top candidate when (a) it dominates the runner-up by ≥3×, (b) at least 3 codetext positions support it, and (c) the letter isn’t already mapped to another code.
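The commit rule can be sketched as follows. This is an illustration under stated assumptions, not the analyzer's code: the bigram table is a tiny hand-picked subset with approximate percentages, `FLOOR` is an assumed smoothing value for missing bigrams, and multiplying scores across positions is one plausible way to combine evidence.

```python
# Tiny illustrative table (approximate English bigram frequencies, percent).
# A fuller table would hold the top ~200 bigrams; these values are assumptions.
BIGRAM = {"th": 3.6, "he": 3.1, "in": 2.4, "er": 2.0, "an": 2.0, "re": 1.9,
          "at": 1.5, "en": 1.5, "nd": 1.4, "ar": 1.1, "ha": 0.9, "ea": 0.7}
FLOOR = 0.01  # assumed score for any bigram missing from the table
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def propagate(contexts, taken, ratio=3.0, min_support=3):
    """Guess the letter for one unknown code, or return None.

    contexts: (left_letter, right_letter) pairs, one per position where the
              unknown code sits between two known single-letter codes.
    taken:    letters already mapped to some other code.
    """
    if len(contexts) < min_support:          # rule (b): enough positions
        return None
    scores = {}
    for x in ALPHABET:
        score = 1.0
        for left, right in contexts:
            score *= BIGRAM.get(left + x, FLOOR) * BIGRAM.get(x + right, FLOOR)
        scores[x] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if best in taken:                         # rule (c): letter still free
        return None
    if scores[best] < ratio * scores[runner_up]:  # rule (a): clear winner
        return None
    return best


print(propagate([("t", "e"), ("t", "a"), ("t", "e")], taken=set()))  # h
```

When every candidate scores the same, or the best letter is already in use, the function returns None, which mirrors the "add nothing rather than guess" behavior.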

Conservative by design. Only uses single-letter crib anchors as neighbor evidence (multi-char crib labels can be ambiguous — see the Alexandria example — and propagating from them would amplify errors). Requires BOTH neighbors to be anchors (one-sided constraint commonly picks the most-common follower of the anchor regardless of the actual code). When there’s insufficient evidence, it adds nothing rather than guessing.

How much it helps depends on how many GOOD single-letter anchors the cribs produced. Cribs that work especially well for bigram bootstrap: Wilhelm (gives l, m), Mississippi (gives s, p), Hawaii (gives w), Aaron (gives a, r), etc. The more correct anchors close together in the text, the more letters propagation can chain out from each one.

How the crib attack works

For each crib word the analyzer:

Lowercases and strips non-letters.
Enumerates every possible split of the word into chunks of length 1, 2, or 3. “Wilhelm” (7 letters) produces 44 candidate chunkings; “Thomas” (6 letters) produces 24.
For each chunking, scans the codetext for any sequence of N consecutive non-space codes (where N is the chunk count) where the within-word repeat pattern holds. Chunks of the same text must map to the same code; chunks of different text must map to different codes.
Requires word boundaries on either side (preceded by [space] or [space][CAP], followed by [space] or punctuation). This filters out random false-positive matches inside other words.
If a chunking has matches that all agree on which code each chunk represents, score it by (match count) × (chunks identified) × (inter-match agreement). The top-scoring chunking wins.
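The enumeration and scanning steps above can be sketched as below. This is a simplified illustration, not the analyzer's implementation: code groups are arbitrary strings, `"SP"` is an assumed stand-in for the [space] code, word boundaries are checked against spaces only (no [CAP] or punctuation handling), and scoring across chunkings is omitted.

```python
def chunkings(word, max_len=3):
    """Yield every split of word into chunks of length 1..max_len."""
    if not word:
        yield []
        return
    for size in range(1, min(max_len, len(word)) + 1):
        for rest in chunkings(word[size:], max_len):
            yield [word[:size]] + rest


def repeat_pattern(units):
    """Number each unit by the order in which it first appears."""
    seen = {}
    return tuple(seen.setdefault(u, len(seen)) for u in units)


def find_matches(chunks, codes, space="SP"):
    """Return start offsets of space-delimited runs matching the chunk pattern."""
    n = len(chunks)
    target = repeat_pattern(chunks)
    hits = []
    for start in range(len(codes) - n + 1):
        window = codes[start:start + n]
        if space in window:
            continue  # the run must consist of non-space codes only
        before_ok = start == 0 or codes[start - 1] == space
        after_ok = start + n == len(codes) or codes[start + n] == space
        if before_ok and after_ok and repeat_pattern(window) == target:
            hits.append(start)
    return hits


print(len(list(chunkings("wilhelm"))))  # 44

codes = ["SP", "17", "04", "88", "31", "52", "88", "09", "SP", "17", "60", "SP"]
print(find_matches(list("wilhelm"), codes))  # [1]
```

Comparing repeat patterns enforces both halves of the rule at once: equal chunks must share a code, and different chunks must not.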

Each crib’s identified codes become known constraints for the next crib. So if you enter Wilhelm first and it identifies w/i/l/h/e/m, then Thomas can use those known letters to validate that its h and m are consistent, making the next attack faster and more accurate.

Identified codes go to the reconstruction at the highest confidence and override any anchor or context-attack guesses.

Beyond Frequency Analysis

Panels 1–5 above use simple frequency counting — ranking code groups by how often they appear and matching the distribution to expected English. This works against simple substitution ciphers, but codebook ciphers (especially with homophones) are specifically designed to defeat it.

Panel 6 demonstrates the first steps beyond frequency counting. Without homophones, a context attack identifies [space], segments the text into words, and uses bigram patterns to recover dozens of word mappings. With homophones, a distributional clustering attack groups code groups by context similarity to identify which codes are homophones of the same token, then combines their frequencies. Panel 7 lets you toggle between frequency-only and context-enhanced reconstruction to see the difference.

But even context analysis is just the beginning. Historically, every major codebook system was eventually broken. Here is the full toolkit a modern cryptanalyst would bring to bear:

1. Known-Plaintext and Crib Attacks

The single most powerful technique. If the analyst knows (or can guess) even a small fragment of the original message — a greeting, a date, a signature, a formulaic opening — they can match those words against the code groups at each position. Every confirmed match recovers codebook entries, and those entries can then be recognized wherever they appear in the rest of the message (or in other messages using the same codebook). This bootstrapping effect means a single good crib can unravel a large portion of the codebook.

This is how the British Room 40 broke German naval codes in WWI, and how Bletchley Park attacked Enigma in WWII. The principle is the same for codebooks: find something you know, and use it as a lever.

2. N-gram Context Analysis

Instead of looking at each code group in isolation, analyze which groups appear next to which other groups. In English, certain word sequences are extremely common: “of the”, “in the”, “to the”, “it is”. If you have identified the code for “the” (the second most frequent group), then whatever groups appear immediately before it most often are likely “of”, “in”, and “to”. Each confirmed mapping opens up more context-based deductions.

A computer can build a co-occurrence matrix — for every pair of code groups, how often do they appear adjacent? — and compare this matrix against expected English word-pair frequencies. This is far more powerful than single-group frequency counting.
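A minimal sketch of the adjacency count, assuming the stream has already been reduced to word-level code groups (space codes removed). The group labels are invented for illustration.

```python
from collections import Counter


def adjacency_counts(groups):
    """Count how often each ordered pair of code groups appears side by side."""
    return Counter(zip(groups, groups[1:]))


def likely_predecessors(groups, target, top=3):
    """Rank the groups that most often appear immediately before target."""
    pairs = adjacency_counts(groups)
    before = Counter({left: n for (left, right), n in pairs.items()
                      if right == target})
    return [code for code, _ in before.most_common(top)]


# Suppose "07" has been identified as "the".
stream = ["41", "07", "93", "41", "07", "12", "55", "07", "41", "07"]
print(likely_predecessors(stream, "07"))  # ['41', '55']
```

Here "41" precedes the known "the" code most often, so it becomes the leading candidate for "of", "in", or "to"; a fuller attack would compare the whole pair table against English word-pair frequencies.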

3. Homophone Clustering

The hardest part of attacking a homophonic codebook is figuring out which code groups are homophones of each other (i.e., different codes for the same plaintext token). Panel 6 above demonstrates a basic version of this when homophones are detected. Modern approaches use distributional similarity: two groups that consistently appear in the same contexts (same neighbors, same positions in sentences) are likely homophones. Techniques include:

Mutual information: measure how much knowing one group’s neighbors tells you about another group’s neighbors. High mutual information → likely homophones.
Context vectors: represent each group by a vector of its neighbor frequencies, then cluster groups whose vectors are similar (cosine similarity, k-means, etc.).
Expectation-maximization (EM): iteratively guess a homophone clustering, decode the message under that clustering, score how “English-like” the result is, and refine.

Once homophones are clustered, their combined frequency matches the original token frequency, and standard frequency analysis works again.
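The context-vector approach can be sketched with plain neighbor counts and cosine similarity. This is an assumption-laden toy: the threshold of 0.8 and the minimum count of 2 are arbitrary illustrative values, and real text would need far more data and a proper clustering step.

```python
from collections import Counter
from math import sqrt


def context_vectors(groups):
    """Represent each code group by counts of its left and right neighbors."""
    vectors = {}
    for i, g in enumerate(groups):
        vec = vectors.setdefault(g, Counter())
        if i > 0:
            vec[("L", groups[i - 1])] += 1
        if i + 1 < len(groups):
            vec[("R", groups[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def homophone_candidates(groups, threshold=0.8, min_count=2):
    """Return pairs of codes whose neighbor profiles are nearly identical."""
    counts = Counter(groups)
    vecs = context_vectors(groups)
    codes = sorted(c for c in vecs if counts[c] >= min_count)
    return [(a, b) for i, a in enumerate(codes) for b in codes[i + 1:]
            if cosine(vecs[a], vecs[b]) >= threshold]


# "x" and "y" always sit between "A" and "B", so they look like homophones.
stream = ["A", "x", "B", "A", "y", "B", "A", "x", "B", "A", "y", "B"]
print(homophone_candidates(stream))  # [('x', 'y')]
```

Merging each candidate pair into one symbol and recounting is what restores the combined frequency described above.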

4. Constraint Propagation and Grammar

English has rigid grammatical structure. Once a few words are known, the possibilities for neighboring words are heavily constrained. “the ___ of” is almost always a noun. “___ is” is likely a pronoun or short noun. A computer can use language models (even simple n-gram models) to score candidate decodings and prune implausible ones. Modern large language models make this even more powerful: they can evaluate whether a partial decoding “sounds like English” with high accuracy.

5. Multiple Messages

Real codebooks are used for more than one message. Each additional message using the same codebook provides more data for frequency analysis, more context for n-gram analysis, and more opportunities for cribs. Historically, codebook security depended on changing the codebook frequently — but producing and distributing new codebooks was expensive and slow, so in practice books were used for months or years, giving analysts ample material.

6. Null Detection and Filtering

If null (dummy) code groups are present, they appear in the frequency distribution as a cluster of low-frequency groups with nearly identical counts — because each null code is inserted with equal probability. An analyst can identify this cluster statistically: look for a plateau in the frequency tail where 10+ groups all appear the same number of times. Once identified, filtering them out restores the original distribution and all standard techniques apply. Nulls add noise but do not change the fundamental structure of the real token frequencies.
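A minimal sketch of plateau detection, assuming exact ties in count (real data would need a tolerance for "nearly identical"). The `min_count=2` cutoff is an added assumption: it skips groups seen only once, since words that occur a single time are common in ordinary text and would otherwise look like a plateau.

```python
from collections import Counter


def null_plateau(groups, min_groups=10, min_count=2):
    """Return (count, codes) for the largest set of groups sharing one count."""
    freq = Counter(groups)
    by_count = {}
    for code, n in freq.items():
        by_count.setdefault(n, []).append(code)
    plateaus = [(n, sorted(codes)) for n, codes in by_count.items()
                if n >= min_count and len(codes) >= min_groups]
    if not plateaus:
        return None
    return max(plateaus, key=lambda p: len(p[1]))


def strip_nulls(groups, null_codes):
    """Drop every occurrence of the suspected null codes."""
    nulls = set(null_codes)
    return [g for g in groups if g not in nulls]


# Invented stream: three real codes plus twelve nulls that each appear 3 times.
real = ["A"] * 20 + ["B"] * 12 + ["C"] * 7
nulls = [f"N{i:02d}" for i in range(12)] * 3
count, suspects = null_plateau(real + nulls)
print(count, len(suspects))                        # 3 12
print(strip_nulls(real + nulls, suspects) == real)  # True
```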

7. Computational Brute Force

If the analyst knows the codebook structure (vocabulary size, group length, whether homophones are used), they can use computers to search the space of possible codebooks. For CODEX specifically, the codebook is generated from a seed string and a set of configuration options. If the configuration is known (or can be guessed), the search space reduces to just the seed — and modern computers can test millions of candidate seeds per second.
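The seed search can be illustrated with a toy generator. Everything here is hypothetical: `toy_codebook` is NOT the real CODEX algorithm, the vocabulary is invented, and the candidate list stands in for a dictionary of likely seeds. The sketch only shows the shape of the attack, which is to regenerate a codebook per candidate seed and check it against a crib.

```python
import random

# Invented vocabulary; a real codebook has thousands of entries.
VOCAB = ["the", "of", "and", "king", "ship", "sails", "at", "dawn"]


def toy_codebook(seed):
    """Hypothetical seeded generator (not the real CODEX algorithm)."""
    rng = random.Random(seed)
    codes = rng.sample(range(10000), len(VOCAB))
    return {word: f"{code:04d}" for word, code in zip(VOCAB, codes)}


def encode(words, seed):
    """Encode a list of in-vocabulary words under the toy codebook."""
    book = toy_codebook(seed)
    return [book[w] for w in words]


def search_seed(ciphertext, crib_words, candidates):
    """Return the first seed whose codebook encodes the crib to the message start."""
    for seed in candidates:
        if encode(crib_words, seed) == ciphertext[:len(crib_words)]:
            return seed
    return None


message = encode(["the", "ship", "sails", "at", "dawn"], "rosebud")
guesses = ["password", "letmein", "rosebud", "codex"]
print(search_seed(message, ["the", "ship"], guesses))  # rosebud
```

The longer the crib, the less likely a wrong seed passes the check by coincidence.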

The Limit of Pure Frequency Analysis (and Why)

The reconstruction you see in Panel 7 will look surprisingly weak even at 60,000+ groups with no cipher layer and no homophones — lots of unknowns and short runs of guessed letters where you might expect whole words. That is not a bug; it is the structural limit of pure frequency analysis against this codebook design, and it is worth understanding.

The CODEX default codebook only has a few thousand word entries. Every English word not in that book gets spelled out by the encoder — trigram, then digram, then single letter as fallback. Even a literary passage has many such words, so a typical codetext is roughly: ~22% space codes, ~30% individual-letter codes, ~25% digram and trigram codes, ~15% real word codes, ~8% punctuation and control codes. The actual word codes are sparse: even “the” ends up at < 1% of all groups because it is competing with thousands of letter and chunk codes for ranking.

That breaks the textbook “Nth most frequent code = Nth most frequent English word” rank-mapping completely from rank 2 onward. The Decryptor handles this by context-classifying every code first: does the code usually sit alone between two space codes (likely a word code), or does it usually sit inside a run of consecutive non-space codes (likely a chunk of a spelled-out OOV word)? Word codes get rank-mapped against English word frequencies; chunk codes get rank-mapped against English letter frequencies. Chunk runs are wrapped in the reconstruction as one visual unit so they read as a spelled-out word, not as several short adjacent words.
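The context classification can be sketched as a simple ratio test. This assumes the [space] code has already been identified (written `"SP"` here) and uses an arbitrary 0.5 threshold; neither detail is taken from the Decryptor itself.

```python
from collections import Counter


def classify_codes(groups, space="SP", threshold=0.5):
    """Label each non-space code 'word' or 'chunk'.

    A code that usually stands alone between two space codes is treated as a
    word code; one that usually sits inside a run of non-space codes is
    treated as a chunk of a spelled-out word.
    """
    alone = Counter()
    total = Counter()
    for i, g in enumerate(groups):
        if g == space:
            continue
        left_is_space = i == 0 or groups[i - 1] == space
        right_is_space = i + 1 == len(groups) or groups[i + 1] == space
        total[g] += 1
        if left_is_space and right_is_space:
            alone[g] += 1
    return {g: ("word" if alone[g] / total[g] > threshold else "chunk")
            for g in total}


stream = ["T", "SP", "a", "b", "c", "SP", "T", "SP", "b", "a", "SP", "T"]
print(classify_codes(stream))
# {'T': 'word', 'a': 'chunk', 'b': 'chunk', 'c': 'chunk'}
```

Each class can then be rank-mapped against its own reference distribution: word frequencies for word codes, letter frequencies for chunk codes.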

This is much more honest than the rank-only approach, but it is still limited: the codebook's trigram and digram coverage means many chunks are NOT single letters, so the letter-frequency mapping is approximate. To get past this you need the techniques described under “Beyond Frequency Analysis”: cribs, n-gram context analysis, and language models.

How Long Would It Take?

Against CODEX specifically, a modern analyst with standard tools would likely proceed as follows:

No cipher layer, no homophones: Function words (the / and / of / to…) and the [space] code are recovered quickly from a few thousand groups. Content words spelled letter-by-letter are not well recovered by frequency alone: the chunk-stream guesses give you approximate letters, but the codebook's chunk coverage skews which letters fall through to single-letter encoding, so partial spellings often look garbled. A real analyst with cribs, n-gram language models, and constraint propagation (see “Beyond Frequency Analysis”) would do much better; frequency analysis alone tops out at roughly recovering sentence skeletons.
Cipher layer, no homophones: Strip the cipher first (Vigenère, Caesar, etc. — all are breakable by standard methods), then proceed as above.
No cipher layer, with homophones: Harder, but clustering techniques on a few thousand groups would identify the most common homophones within hours. With 10,000+ groups, most of the codebook is recoverable.
Cipher layer + homophones: The hardest combination. Strip the cipher, then cluster homophones, then do frequency/context analysis. Still breakable with enough ciphertext (tens of thousands of groups) and standard computational tools. Days to weeks of analyst effort, not months.
With null insertion: Nulls add noise but are straightforward to detect. The null codes form a visible plateau in the frequency tail (many groups with identical low counts). Once identified and filtered out, all other techniques proceed normally. Nulls slow analysis slightly but do not fundamentally strengthen the cipher.

The fundamental weakness of any codebook system is that the mapping is fixed: the same plaintext token always produces codes from the same set. No matter how many homophones or cipher layers are added, this structural regularity leaks information with every message. This is why codebook ciphers were progressively abandoned in the 20th century in favor of stream and block ciphers, which, when used with a fresh nonce or IV, produce entirely different output for every encryption even with the same key.

Disclaimer: these pages are educational demos provided as-is, with no warranty of any kind. The author is not responsible for any consequences arising from their use.

Send comments and bug reports to chris@chrisspackman.com.

Version 0.10.4 — Last updated: 2026-07-02

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Chris Spackman's NeoCities Page

1. Format Detection

2. Cipher Fingerprinting

3. Cipher Layer Analysis

4. Group Frequency Analysis

5. Best-Guess Mapping

6. Context Attack: Word Segmentation & Bigrams