I want to build a classifier that takes an aerial image and outputs a bitmap. The bitmap should be 1 at every pixel where the aerial image shows water. I want to use a ConvNet for this, but I am unsure about the output layer. I have identified two approaches:
- Have an output layer with exactly 2 nodes that specify whether or not the center pixel of the aerial image corresponds to water.
- Have an output layer with one node for every pixel. So for a 64x64 image I would have 4096 nodes.
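To make approach 2 concrete, here is a minimal NumPy sketch of such an output layer, assuming the convolutional layers end in a flattened feature vector. `N_FEATURES`, `per_pixel_head` and the random weights are hypothetical placeholders, not a trained model. A dense layer with 64 * 64 = 4096 sigmoid units produces one water probability per pixel, which is then reshaped into a 64x64 map. By contrast, approach 1 yields a single prediction per forward pass, so the network would have to be run once per pixel (on a patch centered on it) to build the full map.

```python
import numpy as np

H, W = 64, 64      # image size from the question
N_FEATURES = 256   # hypothetical length of the flattened ConvNet features


def sigmoid(z):
    """Element-wise logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def per_pixel_head(features, weights, bias):
    """Approach 2: one output node per pixel, reshaped into an H x W map.

    features: (N_FEATURES,) flattened output of the convolutional layers
    weights:  (N_FEATURES, H * W) dense-layer weights
    bias:     (H * W,) dense-layer bias
    Returns an (H, W) array of water probabilities.
    """
    logits = features @ weights + bias   # shape (H * W,) = (4096,)
    return sigmoid(logits).reshape(H, W)


# Random placeholder parameters, only to show the shapes involved.
rng = np.random.default_rng(0)
features = rng.standard_normal(N_FEATURES)
weights = rng.standard_normal((N_FEATURES, H * W)) * 0.01
bias = np.zeros(H * W)

prob_map = per_pixel_head(features, weights, bias)
print(prob_map.shape)  # (64, 64)
```

Using independent sigmoids (rather than a softmax across all 4096 nodes) lets each pixel be water or not water on its own.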
Which approach is preferable, and why?
Another thing that is unclear to me is how to get the actual bitmap, containing only zeros and ones, from the output of the ConvNet. Assuming we use approach 2, the ConvNet gives us, for each pixel, a probability between 0 and 1 that this pixel corresponds to water. How do I decide whether this probability is high enough to set the corresponding value in my bitmap to 1? Do I just define a threshold, say 0.5, and set the pixel to 1 if the value exceeds that threshold, or is there a more sophisticated approach?
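The fixed-threshold idea can be sketched in NumPy as below. `to_bitmap`, `iou` and `pick_threshold` are hypothetical helper names. As one common option (an assumption, not necessarily the best answer), the threshold is treated as a tunable parameter and chosen to maximize mean intersection over union (IoU) on labeled validation images, instead of being fixed at 0.5.

```python
import numpy as np


def to_bitmap(prob_map, threshold=0.5):
    """Set a pixel to 1 where the water probability exceeds `threshold`."""
    return (prob_map > threshold).astype(np.uint8)


def iou(pred, target):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else inter / union


def pick_threshold(prob_maps, targets, candidates=np.linspace(0.05, 0.95, 19)):
    """Return the candidate threshold with the highest mean IoU on labeled data.

    prob_maps: list of (H, W) probability maps predicted by the network
    targets:   list of (H, W) ground-truth 0/1 water masks
    """
    scores = [
        np.mean([iou(to_bitmap(p, t), y) for p, y in zip(prob_maps, targets)])
        for t in candidates
    ]
    return float(candidates[int(np.argmax(scores))])


prob = np.array([[0.9, 0.4], [0.6, 0.1]])
print(to_bitmap(prob))  # [[1 0]
                        #  [1 0]]
```

Whether a tuned threshold helps depends on how well calibrated the network's probabilities are; with well-calibrated outputs, 0.5 is the natural default.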