5

I want to build a classifier which takes an aerial image and outputs a bitmap. The bitmap is supposed to be 1 at every pixel where the aerial image has water. For this process I want to use a ConvNet but I am unsure about the output layer. I identified two approaches:

  1. Have an output layer with exactly 2 nodes which specify wether or not the center pixel of the aerial image corresponds to water or not.
  2. Have an output layer with one node for every pixel. So for a 64x64 image I would have 4096 nodes.

What approach would be preferred and why?

Another thing that is unclear to me is how to get the actual bitmap with only zeros and ones from the output of the ConvNet. Assuming we used a approach 2 then for each pixel our ConvNet would give us a probability between 0 and 1 that the this pixel corresponds to water. How do I decide that this probability is high enough to set the value in my bitmap to 1? Do I just define a threshold, say 0.5, and if the value exceeds that threshold I set the pixel to 1 or is there a more sophisticated approach?

treigerm
  • 53
  • 2

1 Answers1

3

Your first approach doesn't make any sense to me. After all, you are not just interested in the centre pixel are you? And if you have two nodes for every pixel, what are those two nodes encoding? For the probability of water you just need one.

So clearly approach two. I would just use 0.5 as cutoff. Using a higher or lower cutoff only makes sense, if either false positives or false negatives are for some reason more problematic. Using some additional heuristic to adjust the given probabilities would just do what the ConvNet should already have done.

If 4096 output nodes is too expensive you can always make it a bit fuzzier, for example by predicting the probability that at least one of four pixels shows water, reducing the number of nodes to 1024.

BlindKungFuMaster
  • 4,185
  • 11
  • 23
  • Thanks! Yes, approach 1 seems a bit unusual to me as well but I saw it in [an example](https://github.com/trailbehind/DeepOSM) that does something similar to what I want to do so I thought it might be a valid option. In that example they wanted to predict ways and had labels either [1,0] or [0,1], encoding wether the image contains a way in the center or not, respectively. – treigerm Jan 06 '17 at 13:18