
Can any machine learning algorithm learn the function of a pseudorandom number generator (PRNG), given only its inputs and outputs?

Here, you can also assume that you don't know the seed for the PRNG.

Midlij

1 Answer


Yes, this is possible, but only in certain cases and depending on how much work one is willing to invest. Every PRNG relies on an underlying deterministic generative process that can, in theory, be learned. This blog post nicely illustrates how it could be done: the model learns to predict the next numbers in the sequence without knowing the seed. This makes intuitive sense, since deep nets can approximate arbitrary functions, and in practice it is often not even clear what the seed value of the generative process would be (e.g. during image classification).
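To make the "deterministic process can be learned" point concrete, here is a minimal sketch under strong assumptions. The generator `toy_lcg` and its parameters are hypothetical and chosen for illustration only: it is an 8-bit linear congruential generator whose output is its entire state. The "model" is just a memorized transition table, standing in for a trained network; this is not the method from the blog post.

```python
from itertools import islice


def toy_lcg(seed, a=5, c=3, m=256):
    """Hypothetical 8-bit LCG (illustration only); each output is the full state."""
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state


def fit_transition_table(outputs):
    """Memorize observed (current -> next) pairs; a stand-in for a trained model."""
    return {cur: nxt for cur, nxt in zip(outputs, outputs[1:])}


# Observe one stream; the learner never sees the seed.
train = list(islice(toy_lcg(seed=42), 300))
table = fit_transition_table(train)

# Predict a stream generated from a different seed, also unseen by the learner.
test = list(islice(toy_lcg(seed=7), 50))
predictions = [table.get(cur) for cur in test[:-1]]
accuracy = sum(p == t for p, t in zip(predictions, test[1:])) / len(predictions)
print(f"next-value accuracy: {accuracy:.2f}")
```

This only works because the toy generator has 256 states and exposes all of them in its output. Real PRNGs such as the Mersenne Twister hide a much larger internal state behind an output transformation, so plain memorization does not carry over, and that is where model capacity, data, and compute start to matter.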

In general, this question has no binary answer; it depends on a number of factors, such as the complexity of the PRNG, the available training data, the network size, and the available compute.

This question has also been answered here and here.

Mariusmarten
  • I think it is worth mentioning that, although theoretically possible, it would be impractical (and, AFAIK, has never been done) for any modern standard PRNG, even non-cryptographic ones like the Mersenne Twister. – Neil Slater Aug 19 '22 at 11:12
  • The blog post actually also has a very interesting [second part](https://research.nccgroup.com/2021/10/18/cracking-random-number-generators-using-machine-learning-part-2-mersenne-twister/) where they do work with the Mersenne twister and 'break it'. – Mariusmarten Aug 19 '22 at 11:16
  • I stand corrected. Partially though, because the blog post approach does require a lot of reverse engineering of the known algorithm in order to apply ML successfully. The answer to the OP's question is still "no" (unless the OP clarifies that knowing the algorithm is allowed). – Neil Slater Aug 19 '22 at 11:22
  • Maybe we can agree on the fact that the quality of our predictions will be a function of the complexity of the PRNGs, the available training data, the network size, and the available compute. I will change the answer accordingly. – Mariusmarten Aug 19 '22 at 11:43