Paul's pages: June 2012

Periodicity 'versus' randomness
A few observations, with no claim to originality

Please also see a more recent post on the probabilities of periodicity
http://kryptograff5.blogspot.com/2013/08/draft-1-please-let-me-know-of-errors.html

We do have a baseline notion of randomness, I suggest. Within constraints, the clicks of a Geiger counter occur, physicists believe, at truly random intervals. An idealized Geiger counter hooked up to an idealized printer might -- assuming some sort of time compression -- print out a noncomputable transcendental. The string's destiny as a noncomputable has probability 1, though, from a human vantage point, once could never be sure the string wasn't destined to be a computable number.

Stephen Wolfram's New Kind of Science, on the other hand, tends to see randomness in terms of pseudorandomness or effective randomness. In my view, an effectively random digit string is one that passes various statistical tests for randomness (most of which are based on the normal approximation of binomial probabilities, which are natural to use with binary strings).

Now, if presented with a finite segment of, say, a binary string, we have no way of knowing, prima facie, whether the string represents a substring of some random string (which we usually expect also to be of finite length). Perhaps the longer string was published by a quantum noise generator. Even so, inspection of the substring may well yield the suspicion that a nonrandom, deterministic process is at work.

Now if one encounters, absent a Turing machine (TM) executor or equivalent, the isolated run 0 0 0 0 0 0 0 0 0 0, one is likely to suspect nonrandomness. Why so? We are doubtless assuming that truly random binary digits occur with a probability of 1/2 and so our belief is that a run of ten zeros is too far from the expectation value of five zeros and five ones. Even so, such a run by itself may simply imply a strong bias for 0, but an otherwise nondeterministic process and so we need to filter out the effect of bias to see whether a process is largely deterministic.

We are much less likely to see the string 0101010101 as anything other than almost surely deterministic, regarding it as a strong candidate for nonrandomness. If we use independence of event (digit on space m) as a criterion, the probability of such a string is 2^(-10), or one chance in 1024 -- the same probability as for any other permutation. Still, such a quantification doesn't tell the whole story. If you predict the sequence 10011110101, you have one chance in 1024 of being right, but if instead you predict some sequence that contains six 0s and four 1s, then you'll find that the relevant set contains 210 strings, yielding a probability that you are correct of 210x2^(-10), or better than 20 percent.

So why do we regard 0101010101 as having a strong probability of nonrandom generation? Because it is part of a small subset of permutations with what I call symmetry. In this case, the symmetry accords with periodicity, or 10 ≡ 0 (mod 2), to be precise.

Is the sequence 100111010 likely to be the result of a random process. A quick inspection leaves one uncertain. This is because this particular string lacks any obvious symmetry. The first sequence is a member of a small subset of length 10 strings that are periodic or "nearly periodic" (having a small non-zero remainder) or that have other symmetries. Many strings, of course, are the result of deterministic processes and yet display no easily detected symmetry. (We might even consider "pseudo-symmetries" in which "periods" are not constant but are polynomial or exponential; some such strings might escape statistical detection of nonrandomness.)

Consider 01001001001

In this case we may regard the string as a candidate for nonrandomness based on its near periodicity of 10 ≡ 1 (mod 3), which includes three full periods.

Consider

0110101101011

The sequence is too short for a runs test, but we may suspect nonrandomness, because we have the period 01101, giving 13 ≡ 3 (mod 5).

We see here that strings of prime length have no periods of form a ≡ 0 (mod c), and hence the subset of "symmetrical" substrings is lower than for a nearby composite. So prime length strings are in general somewhat more likely to look nonrandom.

To get an idea of the smallness of the set of exactly periodic strings, observe that the binary string 101,XXX,XXX has 15 permutations of the last six digits, only one of which yields the periodic string 101101. By similar reasoning, we see that subsets of near-periodic strings are relatively small, as long as the remainder is small with respect to length n. It might be handy to find some particular ratio m/n -- remainder n over string length n -- that one uses to distinguish a union of a periodic and near-periodic subsets, but I have not bothered to do so.

Aside from periodicity and near-periodicity, we have what I call "flanking symmetry," which occurs for any string length. To wit:

0001000

or

0111110

And then we have mirror symmetry (comma included for clarity):

01101,10110

which is equivalent to two complete periods (no remainder) but with the right sequence reversing the order of the left.

We might try to increase the symmetries by, let's say, alternating mirror periods. But note that

0101,1010,0101 is equivalent to 010110,100101

and so there is no gain in what might loosely be called complexity.

Speaking of complexity, what of patterns such that g(n) is assigned to digit 0 and f(n) to digit 1 as a means of determining run lengths? In that case, as is generally true for decryption attacks, length of sample is important in successful detection of nonrandomness.

We may also note the possibility of a climbing function g(0,m) = run length x alternating with a substring of constant length y, each of which is composed ofpsuedorandom (or even random) digits, as in

0,101101,00,111001,000...

In this case, we have required that the pseudorandom strings be bracketed by the digit 1, thus reducing the statistical randomness, of course. And, again, sample size is important.

That is, though 010010001 has a whiff of nonrandomness, when one sees 010010000100001000001, one strongly suspects two functions. To wit, f(x) = k = 1 for runs of 1s and g(x) = next positive integer for runs of 0s. Though a human observer swiftly cognizes the pattern on the second string, a runs test would reveal a z score pointing to nonrandmoness.

So let us use the periodicity test to estimate probability of a deterministic process thus: For n = 20, we have the four aliquot factors, 2, 4, 5, 10. The permutations of length 2 are 00, 01 and their mirrors 11, 10, for 4 strings of period 2. For factor 4, we have 4C2 = 6, yielding, with mirrors, 12 strings of period 4. For factor 5, we have 5C2 =10, yielding, with mirrors, 20 strings of period 5. For factor 10, we have 10C2 = 45, yielding, with mirrors, 90 strings of period 10. So we arrive at 132 periodic strings, which gives a probability of one chance in 138,412,032 if we consider every period to be equiprobable. This concession however may sometimes be doubtful. And, the probability of nonrandomness is also affected by other elements of the set of symmetries discussed above. And of course there are standard tests, such as the runs and chi square tests that must be given due consideration.

Now consider

00000000001111111111

Why does this string strike one as nonrandom? For one thing it is "mirror periodic," with a period of 2. However, one can also discern its apparent nonrandomness using a runs test, which yields a high z score. The advantage of a runs test is that it is often effective on aperiodic strings (though this string doesn't so qualify). Similarly, a goodness of fit test can be used on aperiodic strings to detect the likeliness of human tweaking. And one might, depending on context, apply Benford's law (see http://mathworld.wolfram.com/BenfordsLaw.html ) to certain aperiodic strings.

So it is important to realize that though a small set of symmetrical strings of length n exists whose members are often construed as indicative of nonrandom action, there exists another small set of aperiodic strings of the same length whose members are considered to reveal traits of nonrandom action.

For example, a runs test of sufficiently large n would disclose the nonrandom behavior of Liouville's transcendental number.

Worthy of note is yet another conception of randomness, which is encapsulated by the question: how often does the number 31 appear in an infinite random digit string? That is, if an infinite digit string is formed by a truly random process, then a substring that might occur once in a billion digit spaces, would have probability 1 of recurring infinitely many times.

That is, in base 10, "31" reaches a 95% probability of a first occurrence at the 17,974,385th space. In base 2, "31" is expressed "111111" and a 95% probability of a first occurrence is reached at the 1441th space. Similarly, in an infinite string, we have probability 1 that a run of 10^(100^100) zeros will recur not just one, but infinitely often. Yet if one encountered a relatively short run of, say, 10^10 zeros, one would be convinced of bias and argue that the string doesn't pass statistical randomness tests.

The idea that such strings could occur randomly offends our intuition, which is to say the memory of our frequency ratios based on everyday empirical experience. However, if you were to envisions an infinity of generators of truly random binary strings lined up in parallel with strings stretching from where you stand to the horizon, there is prbability 0 that you happen upon such a stupendous run as you wander along the row of generators. (Obviously, probability 1 or 0 doesn't mean absolute certainty. There is probability 1 that Goldbach's conjecture is true, and yet perhaps there is some case over infinity where it doesn't hold.

Concerning whether "31" recurs infinitely often, one mathematician wrote: "The answer lies in an explicit definition of the term 'infinite random digit string.' Any algorithm for producing the digit string will at once tell you your answer and also raise the objection that the digit string isn't really random. A physical system for producing the digits (say, from output of a device that measures quantum noise) turns this from a mathematical into an experimental question."

Or, perhaps, a philosophical question. We know, according to Cantorian set theory -- as modified by Zermelo, Frankel and Von Neumann (and including the axiom of choice) -- that there is a nondenumerable infinity of noncomputable numbers. So one of these noncomputables, R0 -- that could only be "reached" by some eternal randomization process -- does contain a (countable) infinity of 31s. But this means there is also a number Ra that contains the same infinite string as R0 except for lacking all instances of the substring denoted 31. Of course, one doesn't know with certainty that a randomization process yields a noncomputable number. Ro might be a noncomputable and R1 a computable. Even so, such a procedure might yield a noncomputable R1. So we see that there is some noncomputable number where the string "31" never shows up.

If we regard Zermelo-Frankel set theory as our foundation, it is pretty hard to regard an element of the noncomputables as anything but a random number.

The next question to ponder is whether some pseudorandom aperiodic string mimics randomness well enough so that we could assume probability 1 for an infinity of 31s. Plenty of people believe that the pi decimal extension will eventually yield virtually every finite substring an infinity of times. And yet it is not even known whether this belief falls into the category of undecidable theorems. And even if pi could be shown to be "universal," there is no general way of determining whether an arbitrary program is universal.

No discussion of randomness can readily ignore the topic of Bayesian inference, which, despite a history of controversy, is used by philosophers of science to justify the empirical approach to establishment of scientific truth. Recurrence of well defined events is viewed as weight of evidence, with probability modified as evidence accumulates. Such thinking is the basis of "rules of succession, so that a sequence of, say, five 1s would imply a probability of (m+1)/(m+2) -- 86% -- that the next digit will also be a 1.

In this case, a uniform a priori distribution is assumed, as is dependence. So here, Bayesian inference is conjecturing some deterministic process that yields a high degree of bias, assumptions which impose strong constraints on "background" randomness. (Interestingly, Alan Turing used Bayesian methods in his code-breaking work.)

A longer discussion of Bayesian ideas is found in the appendix of my essay, The Knowledge delusion, found at http://kryptograff5.blogspot.com/2011/11/draft-03-knowledge-delusion-essay-by.html

Paul's pages

Search This Blog

Saturday, June 2, 2012