As English is not my native language, I am having a hard time understanding the following passage:
Regardless of the size or aspect ratio of the candidate region, we warp all pixels in a tight bounding box around it to the required size. Prior to warping, we dilate the tight bounding box so that at the warped size there are exactly p pixels of warped image context around the original box (we use p = 16).
This is from the R-CNN paper. I have already extracted the ROIs, but the paper says the CNN input should be 227 x 227, and many of my ROIs are much smaller than that. How should I deal with this?
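
For reference, here is how I currently read the quoted passage, as a minimal sketch and not the authors' code. The function names `dilate_box` and `warp_roi` are my own, boxes are assumed to be `(x1, y1, x2, y2)` in pixel coordinates, and I use nearest-neighbour sampling and edge-pixel replication for parts of the dilated box that fall outside the image, both of which are simplifications I chose for brevity. The idea is that the original box should end up filling the inner (227 - 2p) x (227 - 2p) area of the output, so the box is enlarged by p * w / (227 - 2p) pixels on the left and right and by p * h / (227 - 2p) pixels on the top and bottom before it is stretched to 227 x 227.

```python
import numpy as np

def dilate_box(x1, y1, x2, y2, out_size=227, p=16):
    """Enlarge the box so that, once warped to out_size x out_size,
    there are p warped pixels of context on each side of the original box."""
    w, h = x2 - x1, y2 - y1
    inner = out_size - 2 * p      # size the original box occupies after warping
    pad_x = p * w / inner         # context in source pixels, horizontal
    pad_y = p * h / inner         # context in source pixels, vertical
    return x1 - pad_x, y1 - pad_y, x2 + pad_x, y2 + pad_y

def warp_roi(image, box, out_size=227, p=16):
    """Warp the dilated box to out_size x out_size, ignoring aspect ratio.
    Nearest-neighbour sampling; out-of-image coordinates are clamped,
    which replicates edge pixels (my simplification)."""
    x1, y1, x2, y2 = dilate_box(*box, out_size=out_size, p=p)
    height, width = image.shape[:2]
    # Map the centre of every output pixel back into the source image.
    xs = x1 + (np.arange(out_size) + 0.5) * (x2 - x1) / out_size
    ys = y1 + (np.arange(out_size) + 0.5) * (y2 - y1) / out_size
    xi = np.clip(np.floor(xs).astype(int), 0, width - 1)
    yi = np.clip(np.floor(ys).astype(int), 0, height - 1)
    return image[yi[:, None], xi[None, :]]

# A small 20 x 10 ROI is simply upsampled to the fixed input size.
image = np.random.rand(100, 120, 3)
patch = warp_roi(image, (40, 30, 60, 40))
print(patch.shape)
```

If this reading is right, a ROI smaller than 227 x 227 is not a special case: it is stretched up to the fixed size like any other region. In practice I would probably replace the nearest-neighbour step with a library resize using bilinear interpolation.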