Error Correction and Numerical Masorah

by James Buckland

Error-correcting codes are encoding schemes which, when the encoded data is modified or distorted in transmission, contain failsafes that allow for the complete or partial reconstruction of the original data. Error-detecting codes merely contain a mechanism for detecting that such distortion has occurred. The related problem of authentication has been solved many times over the years — in analog transmission by seals, and in digital transmission by checksums. Authentication, however, is only one problem within a larger field known as coding theory.

One of the earliest error-detecting codes was developed around 135 CE by Jewish scribes working to produce copies of the Torah. As the Spanish Jewish scholar Maimonides (better known as Rambam) wrote in the 12th century:

A Torah scroll missing even one letter is invalid.

Thus it was of critical importance that all new copies of the Torah and its associated writings be identical to the old copies. A system of numerical masorah was developed, recording statistics about the text (the Masorah parva) and applying gematria — the assignment of numerical values to Hebrew letters — to entire pages, in effect providing a hash function. In this way the accuracy of a page at a time could easily be checked, and thus of chapters, and, eventually, the entire Torah.
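The letter-summing idea can be sketched in a few lines. This is a toy illustration of gematria as a checksum, not a reconstruction of the scribes' actual procedure; the letter values below are the standard gematria assignments for the Hebrew alphabet.

```python
# Standard gematria values for the Hebrew alphabet.
# Final (sofit) forms share the value of their ordinary forms.
GEMATRIA = {
    "א": 1, "ב": 2, "ג": 3, "ד": 4, "ה": 5, "ו": 6, "ז": 7,
    "ח": 8, "ט": 9, "י": 10, "כ": 20, "ך": 20, "ל": 30,
    "מ": 40, "ם": 40, "נ": 50, "ן": 50, "ס": 60, "ע": 70,
    "פ": 80, "ף": 80, "צ": 90, "ץ": 90, "ק": 100, "ר": 200,
    "ש": 300, "ת": 400,
}

def gematria(text):
    """Sum the letter values of a Hebrew string, skipping non-letters."""
    return sum(GEMATRIA.get(ch, 0) for ch in text)
```

A copied page whose letter-sum disagrees with the exemplar's recorded sum must contain at least one transcription error — though, like any simple checksum, two compensating errors could cancel out.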

The modern equivalent is the checksum, a hash function which applies essentially the same concept to digital documents: a method of compressing a large document beyond recognition in a repeatable manner, such that two identical documents produce identical checksums, while even a tiny difference produces an avalanche effect, alerting the reader to an error in transcription.
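The avalanche effect is easy to demonstrate with a standard cryptographic hash. The sketch below uses SHA-256 from Python's standard library; the sample strings are illustrative, not from any particular document.

```python
import hashlib

def checksum(text):
    """Return the SHA-256 digest of a string as a hex checksum."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = "In the beginning"
corrupted = "In the beginning."  # a single added character

# The same input always yields the same digest (repeatability),
# while a one-character change yields an unrelated-looking digest
# (the avalanche effect).
```

Comparing `checksum(original)` against a previously recorded digest detects any alteration without comparing the documents letter by letter — exactly the labor the masoretic sums were designed to save.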

Some coding systems, such as DNA/RNA, use a redundancy method — for many codons, only the first two bases of the three-base codon need to be accurate for it to be translated into the correct amino acid. The massive redundancy of the DNA/RNA system eliminates most such errors, and those that slip past either become genetic disorders or mutations, which have the added benefit of contributing to evolution.
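This third-base redundancy can be shown with a fragment of the real genetic code. Glycine and proline are fourfold-degenerate: all four codons beginning GG- encode glycine, and all four beginning CC- encode proline, so an error in the third base of such a codon changes nothing. The table below is a deliberately partial sketch of the RNA codon table.

```python
# Partial RNA codon table (real assignments). Glycine (Gly) and
# proline (Pro) are fourfold-degenerate: the third base carries
# no information for these codons.
CODON_TABLE = {
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
}

def translate(codon):
    """Map a three-base RNA codon to its amino acid."""
    return CODON_TABLE[codon]

def with_third_base_error(codon, new_base):
    """Simulate a point error in the codon's third position."""
    return codon[:2] + new_base
```

Any third-base substitution in a GG- codon still translates to glycine — an error absorbed silently by the code's redundancy rather than detected and flagged, which is the trade-off between error correction and error detection.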