
The feedback humans can give to align artificial intelligence is limited by the reaction time and processing speed of the finite number of us, now fewer than $2^{33}$. As an artificial intelligence (or a growing number of them) grows in complexity, more and more possible actions need to be checked for compatibility with human values. But human feedback cannot grow at the pace of the expanding machine, thereby necessarily weakening the coupling between natural and artificial intelligence.
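A back-of-envelope sketch of the mismatch (all the rates and sizes below are illustrative assumptions, not measured figures): even if every human alive reviewed machine decisions continuously, total feedback grows only linearly with population and time, while the space of action sequences a model can take grows exponentially with its planning horizon.

```python
# Illustrative back-of-envelope comparison (all numbers are assumed, not measured).

population = 2**33               # upper bound on human reviewers (~8.6 billion)
judgments_per_person_day = 100   # assumed: feedback items one person could give per day

# Total human feedback available in one year of round-the-clock reviewing.
human_feedback_per_year = population * judgments_per_person_day * 365

# Assumed action space of the machine: b choices at each of d planning steps.
branching_factor = 10
planning_depth = 20
machine_action_sequences = branching_factor ** planning_depth

print(f"human feedback per year:  {human_feedback_per_year:.2e}")   # ~3e14
print(f"machine action sequences: {machine_action_sequences:.2e}")  # 1e20

# Fraction of possible action sequences that could ever receive direct human review.
print(f"coverage: {human_feedback_per_year / machine_action_sequences:.2e}")
```

Under these assumed numbers, fewer than one in a million possible action sequences could ever be checked directly, which is the weakening of the coupling described above.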

Does this make alignment impossible in practice?

  • [A narrow model aligns the bigger model?](https://ai.stackexchange.com/q/39158/43559) – Vepir Apr 08 '23 at 23:12
  • Then the bigger model cannot develop the ability to perform any new action whose fitness is not evaluated beforehand. – Jaume Oliver Lafont Apr 09 '23 at 03:15
  • In other words, our values exist, if at all, with respect to current options. We do not yet have preferences for future situations made possible by future systems. And those new scenarios will appear faster than we can provide any meaningful feedback on them. – Jaume Oliver Lafont Apr 09 '23 at 03:19
  • In particular, are we ready for AI to beat us at the language game too, after having been surpassed at chess and Go? – Jaume Oliver Lafont Apr 09 '23 at 03:29
  • Indeed it has the potential for an intelligence explosion out of control by its original design, so it needs some strict formal proof of the safety of self-growth or self-modification (like a self-replicating cellular automaton in Conway's Game of Life) that limits any bad effect; ad hoc engineering is perhaps dangerous here. – mohottnad Apr 10 '23 at 05:16
  • I cannot imagine anything but unsolvable problems. Here is one. The action of an AI system leads humans to cease direct human-human interaction. A fraction of humans is fine with that and a fraction is not, but the AI is stronger and actually implements the effect in reality. Even worse, that halting of interaction is an unexpected side effect. How do formal proofs deal with such questions? – Jaume Oliver Lafont Apr 10 '23 at 05:26
  • That seems a billion-dollar question, since the original AI needs to prove something about a future, more intelligent evolution of itself, like a child needing to prove the safety of its adult version while having no idea of all the possible versions exhaustively, or even whether they are enumerable. This intuitively needs some perfectly reliable self-reflection. I'd suggest first trying to formalize the utility of non-halting human-human interaction and proving it is better than human-machine interaction based on some universally accepted axioms, even for your two fractions of people described above. – mohottnad Apr 10 '23 at 23:27

0 Answers