Mathematics. Random numbers and random generators. History

Mathematics. Random numbers and random generators. History

"When I set myself the task of obtaining a truly random number, I found nothing better than a common die," Francis Galton wrote in Nature in 1890. "When the dice are shaken and thrown into a basket, they strike each other and the sides of the basket in such a way that even after a slight throw it becomes quite impossible to determine the result." How can we generate a uniform sequence of random numbers? Randomness, so beautiful in its diversity, is often found in living Nature, but it has not always been easy to extract artificially. The oldest dice were found in the Middle East, dating back to about 24 centuries BC. Another example is China, where, as early as the 11th century BC, they used to break the tortoise shell with a blow, with the subsequent interpretation of the size of the resulting random pieces. Centuries later, people would cut plant stems several times and compare the sizes of the resulting pieces.

But by the mid-1940s, humanity needed significantly more random numbers than could be obtained by throwing dice or cutting stems. The American think tank RAND created a machine that could generate random numbers using a random pulse generator (something like an electronic roulette wheel). They launched it and, after some time, received enough random numbers, which they published in the book "One Million Random Numbers and One Hundred Thousand Normal Deviations."

Another similar machine, ERNIE, built at the now famous Bletchley Park in the 1940s, was used to generate random numbers in the British Premium Bond lottery. To dispel fears about the randomness and fairness of the results, a documentary film, "The Importance of Being E.R.N.I.E.," was even filmed. Today, you can watch it on YouTube:

In 1951, randomness was finally formalized and implemented in a real computer, the Ferranti Mark 1, which came with a built-in random data generator based on shot noise and an instruction to generate 20 random bits. Alan Turing developed this functionality. His colleague Christopher Strachey used it to create a "love letter generator". Here is an example of the text that was generated by this program:

But Turing's random number generator did not make the programmers of those years happy, since it introduced an even greater instability factor into the already new and unstable computer systems. If we wanted to achieve stable operation of some program without debuggers and with random data, it would never be possible to achieve repeatability of results. Then the following thought arose: "What if the random number generator can be represented as some deterministic function?" Then, on the one hand, the generated numbers would have the properties of random numbers, but on the other hand, with the same initialization of the generator, these sequences would be the same every time. This is how pseudorandom number generators (PRNGs) appeared.

John von Neumann developed the PRNG in 1946. His idea was to start with a number, take its square, discard the digits from the middle of the result, take the square again and discard the middle, etc. In his opinion, the resulting sequence had the same properties as random numbers. Von Neumann's theory did not stand the test of time, since no matter what initial number you chose, the sequence generated in this way degenerated into a short cycle of repeating values, like 8100, 6100, 4100, 8100, 6100, 4100,…

It is impossible to completely avoid cycles when we work with a deterministic function whose subsequent values are based on the previous ones. But what if the cycle period is so long that it doesn't matter in practice?

Random numbers

Mathematician Derrick Henry Lehmer took this idea further in 1949 with the invention of the linear congruential method. You'll notice several "magic constants" in the code. These numbers (usually primes) were chosen to maximize the cycle period before the results were repeated. This generator starts with the current time and lasts about 2^31. It became popular with JavaScript 1.0 because it didn't yet have a built-in Math.random() function, and everyone wanted their ad banners to change randomly. "I wouldn't use this algorithm for anything security-related, but for many applications it's good enough," wrote Paul Houle, the code author above.

But the Internet was crying out for something security-related. SSL came out in 1995, and its implementation required a high-quality pseudorandom number generator. This led to a series of rather wild experiments in this area. If you look at the patents related to random number generation issued at the time, you get the feeling that you are looking at modern attempts to build the first airplane.

Most popular processors in the 90s did not have a special instruction for generating random numbers, so choosing a good seed for the pseudorandom number generator was very important. This resulted in security problems when Phillip Hallam-Baker discovered that the SSL implementation of the Netscape browser (the dominant browser) combined the current time and its process ID to initialize the PRNG. He showed how a hacker could easily guess this value and decrypt SSL traffic. Guessing the seed of a pseudorandom number generator is still a popular attack, although it has become slightly more sophisticated over the years. Here's an example of a successful attack, published on Hacker News in 2009.

In 1997, programmers were limited in how they could generate truly random numbers, so the SGI team created LavaRand, consisting of a webcam pointed at a pair of lava lamps placed on a table next to each other. The image from this camera was an excellent source of entropy - a Real Random Number Generator, just like Turing's. But this time, it was able to generate 165 KB of random numbers per second. The invention was immediately patented.

Most experiments in this area have not stood the test of time, but one PRNG, the Mersenne Twister, developed in 1997 by Makoto Matsumoto and Takuji Nishimura, managed to hold its own. It combined performance and quality of results, allowing it to spread quickly to many languages and frameworks. The Mersenne Twister is a generalized-return shift register that generates a deterministic sequence with a huge cycle period. The current implementation has a period of 2¹⁹⁹³⁷− 1, which is included in most programming languages today.

In 1999, Intel added a hardware random number generator to the i810 chipset. This was good because it produced truly random numbers (based on thermal noise), but it was not as fast as software PRNGs. So, most crypto applications still use PRNGs, which can now at least be seeded with a hardware random number generator.

This brings us to the concept of a cryptographically secure pseudorandom number generator (CSPRNG). (Oh, those acronyms! No wonder some people find computer science boring.) KBPRNGs have become very popular in the SSL era. What requirements should a KBPRNG satisfy? Well, here's a 131-page document with these requirements. Have fun. As you already understand, there are quite a few requirements. For example, the same Mersenne Twister does not satisfy them, since, given a sufficiently large sequence of numbers generated by it, you can guess the next number.

In 2012, INTEL added RDRAND and RDSEED instructions to its chips, allowing you to get real random numbers based on the same temperature fluctuations, but now at a speed of up to 500 MB/s, which should make software PRNGs unnecessary. However, rumors and doubts about the honesty of these instructions circulate in society. Is there a specially made backdoor for the NSA? No one can say for sure. Someone probably can, but they certainly won't tweet about it.

In recent years, hardware random number generators from third-party manufacturers have also begun to appear, and even completely open (regarding software and hardware). The strength of these products is precisely in their openness: you can study them yourself and even build them at home from publicly available components. Examples include REDOUBLER and Infinite Noise TRNG.

Today, people are still debating which random number generator should be used in a particular system, OS kernel, programming language, cryptographic library, etc. There are many options for algorithms focused on speed, memory savings, and security. And each of them is constantly subject to attacks by hackers and security specialists. But for everyday non-security tasks, you can rely on /dev/random or the rand() function, which is available on most platforms. This will give you a large enough random data sequence to make Alan Turing happy.

Articles, facts, galleries

Mathematics. Random numbers and random generators. History

QUOTES, SAYINGS: