Assume I have an input of size $32 \times 32 \times 3$ and pass it to a convolutional layer. If the kernel size is $5 \times 5 \times 3$ and the depth of the layer is 1, only one feature map is produced for the image. In that case, each neuron has $5 \times 5 \times 3 = 75$ weights (+1 bias).
If I wanted to compute multiple feature maps in this layer, say 3, would each local region of the image (in this example, $5 \times 5 \times 3$) be looked at by three different neurons, each with its own individually trained weights? And what would the output volume of this layer be?
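To make the setup concrete, here is a minimal NumPy sketch of a layer with 3 filters. It assumes stride 1 and no zero-padding ("valid" convolution), neither of which is stated above, and it uses cross-correlation as most CNN libraries do. The function name `conv2d_valid` is hypothetical. Every filter sees the same $5 \times 5 \times 3$ patch at each position but applies its own weights and bias, and each filter's weights are shared across all spatial positions of its feature map.

```python
import numpy as np

def conv2d_valid(x, kernels, biases, stride=1):
    """Naive multi-filter 2D convolution (hypothetical helper, for illustration).

    x:       input volume, shape (H, W, C_in)
    kernels: filter bank, shape (K, F, F, C_in), one F x F x C_in filter per feature map
    biases:  shape (K,), one bias per filter
    Assumes no zero-padding and the given stride.
    """
    H, W, C_in = x.shape
    K, F, F2, C_k = kernels.shape
    assert F == F2 and C_k == C_in
    H_out = (H - F) // stride + 1
    W_out = (W - F) // stride + 1
    out = np.zeros((H_out, W_out, K))
    for i in range(H_out):
        for j in range(W_out):
            # The same F x F x C_in patch is seen by all K filters...
            patch = x[i * stride:i * stride + F, j * stride:j * stride + F, :]
            for k in range(K):
                # ...but each filter applies its own weights and bias.
                out[i, j, k] = np.sum(patch * kernels[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))         # 32 x 32 x 3 input
kernels = rng.standard_normal((3, 5, 5, 3))  # 3 filters, each 5 x 5 x 3
biases = rng.standard_normal(3)              # one bias per filter

y = conv2d_valid(x, kernels, biases)
n_params = kernels.size + biases.size        # 3 * (75 + 1)
print(y.shape)   # (28, 28, 3)
print(n_params)  # 228
```

Under these assumptions the output volume is $(32 - 5 + 1) \times (32 - 5 + 1) \times 3 = 28 \times 28 \times 3$, and the layer has $3 \times (75 + 1) = 228$ trainable parameters. With padding or a larger stride, only the spatial size changes. The depth still equals the number of filters.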