OpenCV does include 2D filtering functions for custom separable (sepFilter2D) and non-separable (filter2D) kernels. The latter switches to a DFT-based algorithm for large kernels, which may or may not be faster than the direct method. Note that filter2D computes correlation rather than true convolution, so flip the kernel if the distinction matters for your filter. OpenCV also includes a dnn module with various types of layers, though as far as I know it is aimed at running networks rather than training them. In theory, you should be able to stitch these pieces together into a complete CNN. However, I have not used any of them, and I have no idea how mature the implementation is.
That said, if you are willing to implement a custom CNN from scratch, you will probably get more control over the implementation using a generic (BLAS / OpenCL / CUDA) matrix library.
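As a rough illustration of the from-scratch route, the sketch below expresses one convolution layer as a single matrix product, which is the part a BLAS-backed library accelerates. It uses NumPy as a stand-in for the matrix library, and the helper names im2col and conv_layer are hypothetical, not part of any library. It assumes stride 1, no padding, and cross-correlation (no kernel flip), followed by ReLU.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw patch of x (shape C, H, W) into one column."""
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each row holds one kernel tap seen at every output position.
                cols[row] = x[ci, i:i + oh, j:j + ow].reshape(-1)
                row += 1
    return cols, oh, ow

def conv_layer(x, weights, bias):
    """Stride-1, unpadded convolution layer with ReLU, as one matrix product."""
    f, c, kh, kw = weights.shape
    cols, oh, ow = im2col(x, kh, kw)
    # (F, C*kh*kw) @ (C*kh*kw, oh*ow): this is the call BLAS does the work for.
    out = weights.reshape(f, -1) @ cols + bias[:, None]
    return np.maximum(out, 0.0).reshape(f, oh, ow)

# Tiny usage example: one input channel, two 3x3 filters.
x = np.ones((1, 4, 4), dtype=np.float32)
weights = np.ones((2, 1, 3, 3), dtype=np.float32)
bias = np.array([0.0, -10.0], dtype=np.float32)
y = conv_layer(x, weights, bias)
```

The appeal of this layout is that the matrix product can be swapped for a GPU equivalent without touching the rest of the layer, at the cost of the extra memory the unfolded patch matrix takes.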