
Suppose I have a Boolean function that maps $N$ bits to $1$ bit. If I understand correctly, this function will have $2^{2^N}$ possible configurations of its truth table.
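For concreteness, here is a quick sanity check in Python for $N = 3$: each of the $2^N$ input rows can independently map to 0 or 1, which gives $2^{2^N}$ distinct truth tables.

```python
from itertools import product

N = 3
rows = list(product([0, 1], repeat=N))             # the 2^N possible input patterns
tables = list(product([0, 1], repeat=len(rows)))   # one output bit per row

print(len(rows))     # 8   = 2^N input patterns
print(len(tables))   # 256 = 2^(2^N) possible truth tables
```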

What is the minimum number of neurons and hidden layers I would need in order to have a multi-layer perceptron capable of learning all possible truth tables?

I found one set of lecture notes here that suggests that "checkerboard"-shaped truth tables (which are something like an $N$-input XOR function) are hard to learn, but do they represent the worst-case scenario? The notes suggest that such tables require $3\cdot(N-1)$ neurons arranged in $2\cdot\log_2(N)$ hidden layers.
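As I understand that construction (this is my own reconstruction; the notes may present it differently), it builds a balanced binary tree of $N-1$ two-input XOR gates, each realised as a tiny perceptron block with two hidden threshold units and one output unit, which would account for the $3\cdot(N-1)$ neurons and $2\cdot\log_2(N)$ layers. A minimal sketch of that idea:

```python
def step(z):
    """Hard-threshold activation unit."""
    return 1 if z > 0 else 0

def xor_gate(a, b):
    """2-input XOR as a small perceptron block: 2 hidden units + 1 output unit."""
    h_or  = step(a + b - 0.5)          # hidden unit: OR(a, b)
    h_and = step(a + b - 1.5)          # hidden unit: AND(a, b)
    return step(h_or - h_and - 0.5)    # output unit: OR(a, b) AND NOT AND(a, b)

def parity(bits):
    """N-input parity (the 'checkerboard' truth table), assuming N is a power of two.
    A balanced tree of N-1 XOR gates: 3*(N-1) neurons in 2*log2(N) layers."""
    layer = list(bits)
    while len(layer) > 1:
        layer = [xor_gate(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

# sanity check for N = 4: parity of the bits of every input pattern
for x in range(16):
    bits = [(x >> i) & 1 for i in range(4)]
    assert parity(bits) == sum(bits) % 2
```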


1 Answer


This is a very dicey question. A logic function can be thought of as mapping multiple input bits to a single output bit, and each logic function defines its own decision boundary. If the underlying logical expression is complex, that boundary is genuinely hard to approximate. Here I am treating the input Booleans as the input features.

From practical experience: I had an $n$-bit input with $n = 8$, mapped to a single output bit. I used a 2-layer neural net and then added a final layer with a single node (to get a 0 or 1 output), with sigmoid activations and standard back-propagation learning.
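Roughly, the setup looked like the sketch below (written here in PyTorch for illustration; the layer widths, learning rate and the random target truth table are placeholders, not the exact values I used):

```python
import torch
import torch.nn as nn

N = 8
X = torch.tensor([[(i >> k) & 1 for k in range(N)] for i in range(2 ** N)],
                 dtype=torch.float32)               # all 256 input patterns
y = torch.randint(0, 2, (2 ** N, 1)).float()        # one random truth table as the target

model = nn.Sequential(                              # 2 hidden layers + single sigmoid output node
    nn.Linear(N, 256), nn.Sigmoid(),
    nn.Linear(256, 256), nn.Sigmoid(),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)
loss_fn = nn.BCELoss()

for epoch in range(5000):                           # train and evaluate on the same set
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

acc = ((model(X) > 0.5).float() == y).float().mean()
print(f"loss={loss.item():.4f}  accuracy={acc.item():.3f}")
```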

Now, I varied the number of hidden nodes per layer from 64 to 1024. The cost decreased, which suggested learning was happening, but the accuracy did not change no matter what I tried (changing learning rates, using momentum, etc.). The network output either all 1's or all 0's, even though I was using the same set for training and validation. The reasons I hypothesized for this behaviour were:

  • Irrelevant input features were affecting the final output, even after extensive training and testing on the same set with a huge number of hidden nodes.
  • Because the inputs only take the values 0 and 1, the network could not form a useful decision boundary. Even if it built an equivalent feature such as $x^3$ with its nodes, it would only ever evaluate that feature at 0 and 1, outputting the same two constants $f(0)$ and $f(1)$ over and over where there should have been a nice little curve (a tiny check below illustrates this).
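A quick illustration of the second point (not from my original run): on inputs restricted to $\{0, 1\}$, a power-style feature such as $x^3$ collapses onto $x$ itself, so the network only ever sees two values of it.

```python
# On {0, 1}, polynomial features collapse: x**3 takes only the values f(0)=0 and f(1)=1,
# so a node that builds an x**3-like feature gives the network nothing new to separate on.
for x in (0, 1):
    print(x, x ** 3)   # prints "0 0" and "1 1" -- the same two constants, no curve in between
```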

So, in short, it might or might not work; you never can tell!