This first appeared on Sunday, June 24, 2007
The Kalin cipher
Note: The Kalin cipher described below can of course be used in tandem with a public key system, or it can be done by hand with calculators. An appropriate software program for doing the specified operations would be helpful.
The idea of the cipher is to counteract frequency analysis via matrix methods.
Choose a message of n characters and divide n by some integer square. The remainder can be padded out with dummy numbers.
Arrange the message into mxm matrices, as shown,
H I H x
O W A y
R E U z
We then augment the matrix with a fourth column. This column is the first key, which is a random number with no zeros. A second 4x3 matrix is filled with random integers. This is the second key. The keys needn't be random. Easily remembered combinations might do for some types of work because one would have to know the cipher method in order to use guessed keys.
We then matrix multiply the two as MK and put MK into row canonical form.
This results in the nine numbers in M being reduced to three in [I|b], where b is the final column.
8 9 8 x 7 9 2 1 0 0 a
8 15 1 y 5 5 4 0 1 0 b
18 5 21 z 3 2 3 = 0 0 1 c
5 2 3
where (a, b, c) is a set of three rationals.
The keys can be supplied by a pseudorandom number generator in step with a decoder program. A key can vary with each message matrix or remain constant for a period, depending on how complex one wishes to get. But, as said, if one is conducting an operation where it is advantageous for people to remember keys, this method might prove useful.
By the way, if a row of the original message matrix repeats or if one row is a multiple of another, a dummy character is inserted to make sure no row is a multiple of another so that we can obtain the canonical form. Likewise, the key matrix has no repeating rows.
In fact, on occasion other square matrices will not reduce to the I form. A case in point:
a+b b+c a+b
a b c
1 1 1
The determinant of this matrix is 0, meaning the I form can't be obtained.
In general, our software program should check the determinant of the draft message matrix to make sure it is non-zero. If so, a dummy letter should be inserted, or two inconsequential characters should be swapped as long as the meaning isn't lost.
But, if the program doesn't check the determinant it will give a null result for the compression attempt and hence would be instructed to vary the message slightly.
Notice that it is not necessary to transmit I. So the message is condensed into three numbers, tending to defeat frequency analysis.
Here is a simple example, where for my convenience, I have used the smallest square matrix:
We encrypt the word "abba" (Hebrew for "father") thus:
1 2 3
2 1 1
where the first two columns represent "abba" and the third column is the first key.
We now matrix multiply this 2x3 matrix by a 3x2 matrix which contains the second key.
1 2 3 x 2 1 = 10 16
2 1 1 1 3 7 8
2 3
We then apply key 1 (or another key if we like) and reduce to canonical form:
10 16 3
7 8 1
which is uniquely represented by
1 0 -1/4
0 1 11/32
By reversing the operations on (-1/4, 11/32), we can recover the word "abba." If we encode some other four characters, we can similarly reduce them to two numbers. Make a new matrix
-1/4 x 3
11/32 y 1
which we can fold into another two number set (u, v).
We see here an added countermeasure against frequency analysis, provided the message is long enough: gather the condensed column vectors into new matrices.
We can do this repeatedly. All we need do is divide by, in this case 4, and recombine. If we have 42n matrices to begin with, it is possible to transmit the entire message as a set of two numbers, though usually more than one condensed set would be necessary. Of course we can always pad the message so that we have a block of x2n. Still, the numbers tend to grow as the operations are done and may become unwieldy after too many condensations. On the other hand, frequency analysis faces a tough obstacle, I would guess.
So a third key is the number of enfoldments. Three enfoldments would mean three unfoldments. Suppose we use a 4x4 system on 64 characters with three enfoldments.
We write this on 16 matrices. After the transform, we throw away the I matrices and gather the remaining columns sequentially into a set of 4 matrices. We transform again and are left with a single column of four numbers. So if the adversary doesn't know the number of enfoldments, he must try them all, assuming he knows the method. Of course that number may be varied by some automated procedure linked to a public key system.
Just as it is always possible to put an nxn matrix without a zero determinant [and containing no zeros] into canonical form, it is always possible to recover the original matrix from that form. The methods of linear algebra are used.
Decryption is also hampered by the fact that in matrix multiplication AB does not usually equal BA.
The use of canonical form and the "refolding" of the matrices is what makes the Kalin cipher unique, I suppose. When I say unique, I mean that the Kalin cipher has not appeared in the various popular accounts of ciphers.
An additional possibility: when an nxn matrix is folded into an n column vector, we might use some "dummy" numbers to form a different dimension matrix. For example, suppose we end up with a 3-entry column vector. We add a dummy character to that string and form a 2x2 matrix, which can then be compressed into a 2-entry column vector. Of course, the receiver program would have to know that a 2-vector is compressed from a 3-vector. Also the key for the 2x2 matrix is so small that a decryption program could easily try all combinations.
However, if a 100x100 matrix folds into a 10-entry column vector, six dummy characters can be added and a 4x4 matrix can be constructed, leaving a 4-vector, which can again be folded into a 2-vector. All sorts of such systems can be devised.
Additionally, a "comma code" can be used to string the vectors together into one number. The decipherment program would read this bit string to mean a space between vector entries.
Clearly the method of using bases or representations as keys and then transforming offers all sorts of complexities -- perhaps not all useful -- but the emphasis on matrix condensation seems to offer a practical antidote to frequency analysis.
BTW, I have not bothered to decimalize the rational fractions. But presumably one would convert to the decimal equivalent in order to avoid drawing attention to the likelihood that the numbers represent canonical row vectors. And, of course, if one is using base 2, one would convert each rational to a linear digit string. Not all decimal fractions can be converted exactly into binary. However, supposing enough bits are used, the unfolded (deciphered) number will, with high probability, be very close to the correct integer representing the character.
No comments:
Post a Comment