
In my problem, there are about 5,000 training images, each containing about 50~100 objects of a single type (or class) on average. For each training image there is partial mask information denoting the polygon vertices of objects, but the problem is that only 3~5 objects per image have mask/annotation information.

So in summary there is 1 class, 5,000 * 50 ~ 5,000 * 100 instances of the class, and 5,000 * 3 ~ 5,000 * 5 instances with masking information.

So not a single training image has full masking information, and yet all the training images have partial masking information. My job is to build an instance segmentation model.

I did some searching on semi-supervised segmentation, and to my understanding the papers are tackling problems where some training images have all objects annotated while the other training images have no annotations at all. That isn't exactly my situation. How should I approach this problem? Any tips are appreciated.

  • If you cannot share the actual data, you could help us by writing Python code which generates such data with similar features, the percentage of given correct labels, etc. Actually, you could even train your own network with such data and use it as a basis for labeling the real dataset. – NikoNyrh Feb 08 '22 at 14:51
  • Welcome to AI. If it were me, I would use the annotations to train a classifier to answer "would you find one of these under an annotated mask?" and then use it more broadly in a semi-supervised fashion outside of the masks. – EngrStudent Feb 09 '22 at 18:49

2 Answers


If you are trying to build an instance segmentation model and expect to get fully masked objects in your images, then you are going in the wrong direction.

Generally, when you train a model with partial masks, your predictions will also be partially masked.

Could you please share a sample input image and an expected output sample, so that I can help you decide which type of instance segmentation model you need to train? Also, try to list which instance segmentation models you have in mind.

yogi
  • The images are grayscale and they look very similar to human eyes. I will upload a sample once I am allowed to do so. Just say white is object and dark is background and you won't be too wrong. But there is noise, and simply selecting pixels within a certain luminance range yields about 70% accurate segmentation. I tried unsupervised learning ignoring the given masks, and then fine-tuning with the masks; it didn't go well. Currently trying UNet with preprocessed images (smaller pictures of the area around the given polygon masks). – jeff pentagon Feb 08 '22 at 09:56
  • If I understand correctly, you are trying to solve a semantic segmentation task rather than instance segmentation. Yes, UNet is a good starting path to test your results. For more precise results you can try the Detectron2 model, which has a semantic segmentation model, and you can train on different backbones for your experiments. – yogi Feb 08 '22 at 10:12
  • It is an instance segmentation task because I need to know how many instances are in the image as well as the segmentation mask, but I decided to do semantic segmentation first and handle the counting with an object detection algorithm or something. I only chose UNet because my images look similar to microscopy images and UNet is a popular solution in that field (they are not microscopy but look kind of similar). It would have been easier if there were a popular method for instance segmentation on grayscale images, though. – jeff pentagon Feb 08 '22 at 10:59

If you have up to 100 objects in a single image, their size must be a fairly small percentage of it. Let's say their diameter is 50 pixels and the whole image is 1024 x 1024. Are the objects always non-overlapping, and thus somewhat far apart? If so, you could label the known objects with a mask value of 1, set their nearby pixels to a mask value of 0, and set everything else to 0.5. Let's call this "Mask A".
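
For the real data, Mask A could be built from the polygon annotations roughly like this (a sketch only, assuming OpenCV is available; the `build_mask_a` name and the `margin` parameter are illustrative, not from the answer):

import numpy as np
import cv2

def build_mask_a(shape, polygons, margin=15):
    """Build Mask A from partial polygon annotations:
    1 = known object, 0 = known background near an annotated object,
    0.5 = unknown everywhere else. `margin` (in pixels) is a made-up
    width for the known-background band."""
    known = np.zeros(shape, dtype=np.uint8)
    for poly in polygons:  # each poly is an (N, 2) array of (x, y) vertices
        cv2.fillPoly(known, [np.asarray(poly, dtype=np.int32)], 1)
    # Dilating the known objects gives a band of nearby pixels which,
    # under the non-overlap assumption, must be background.
    kernel = np.ones((2 * margin + 1, 2 * margin + 1), np.uint8)
    near = cv2.dilate(known, kernel)
    mask_a = np.full(shape, 0.5, dtype=np.float32)
    mask_a[near == 1] = 0.0    # known background band
    mask_a[known == 1] = 1.0   # known object pixels
    return mask_a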

Now you construct a normal CNN which maps a 1024 x 1024 x 1 image to a mask of the same size. But the network takes a second input as well: "Mask B". This is calculated as Mask B = (Mask A != 0.5), meaning it indicates for which pixels we know the correct answer (either 0 or 1). The network's output is modified so that it produces normal values where Mask B is true and 0.5 where Mask B is false. The binary cross-entropy is calculated against Mask A, which has values 0, 0.5 and 1.

Note that regardless of the model's parameters, it will always output a value of 0.5 for those pixels which are 0.5 in Mask A. Thus they won't contribute to the loss or its derivatives.

I built a simple network in Keras and trained it on simulated data:

from tensorflow import keras
from tensorflow.keras import layers

res = 256  # image resolution (the example above used 1024)
BN = layers.BatchNormalization

# inputs[0] is the grayscale image, inputs[1] is Mask B (1 = label known)
inputs = [layers.Input((res, res, 1)) for _ in range(2)]

x = inputs[0]

# A stack of batch-normalized convolutions
for dim in [8, 16, 8, 4, 1]:
    x = BN()(layers.Conv2D(dim, 7, activation='elu', padding='same')(x))

# tanh output in [-1, 1]; multiplying by Mask B zeroes the unknown pixels,
# so after rescaling they are hard-coded to 0.5
x = layers.Conv2D(1, 11, padding='same', activation='tanh')(x)
x = (x * inputs[1] + 1) * 0.5

model = keras.Model(inputs, x)
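
The training setup isn't shown above; a plausible sketch (the optimizer, batch size, and epoch count are assumptions, and `images`, `mask_a`, `mask_b` are batched arrays such as those produced by the generator sketched below the figure):

# Binary cross-entropy against Mask A (values 0, 0.5 and 1); pixels
# hard-coded to 0.5 add only a constant term, so they have zero gradient.
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit([images, mask_b], mask_a, batch_size=16, epochs=10)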

Note that tanh was used instead of sigmoid, since it made hard-coding the unknown pixels to 0.5 a bit easier. The generated dataset consists of grayscale images (shown left), partially labeled data (yellow = known circle, dark blue = known non-circle, blue-green = unlabeled pixels), the network's output, and the ground truth. The network is able to find all the circles quite well, but there are some false positives, and the image border seems to be labeled all ones.

[figure: circle detection — input image, partial labels, network output, ground truth]
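
For reference, simulated data with these properties could be generated roughly as below (a sketch only: the circle count, radius, brightness, and noise level are made-up parameters, and it assumes well-separated circles, per the non-overlap assumption above):

import numpy as np

def make_sample(res=256, n_circles=20, n_labeled=4, radius=12, margin=6):
    """One grayscale image of random bright circles on a noisy dark
    background, plus Mask A, Mask B, and the full ground truth."""
    yy, xx = np.mgrid[:res, :res]
    image = np.random.normal(0.1, 0.05, (res, res)).astype(np.float32)
    mask_a = np.full((res, res), 0.5, dtype=np.float32)
    gt = np.zeros((res, res), dtype=np.float32)
    centers = np.random.randint(radius, res - radius, (n_circles, 2))
    for i, (cy, cx) in enumerate(centers):
        inside = (yy - cy) ** 2 + (xx - cx) ** 2 < radius ** 2
        gt[inside] = 1.0
        image[inside] += 0.6  # circles are brighter than the background
        if i < n_labeled:     # only a few circles carry annotations
            band = (yy - cy) ** 2 + (xx - cx) ** 2 < (radius + margin) ** 2
            mask_a[band] = 0.0    # known background band
            mask_a[inside] = 1.0  # known circle pixels
    mask_b = (mask_a != 0.5).astype(np.float32)  # 1 where label is known
    return image[..., None], mask_a[..., None], mask_b[..., None], gt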

But this approach cannot be used if the objects can overlap, since then we cannot generate a sufficiently large "neighbourhood region" around each one with known "no object" labels.

NikoNyrh
  • Unfortunately they overlap like crazy, but good answer nonetheless. I will give a more detailed view of the dataset and the task once I am done with it. So far I have a decent but not superb result, and I will probably leave it as is, but I'd love to learn more even after the task due date is over. – jeff pentagon Feb 08 '22 at 22:27