
This is my first question on the AI Stack Exchange. I want to ask how to measure how many FPS a CNN model can process during real-time detection. I am working on a real-time detection system using a CNN model, where a camera takes pictures of cotton flowing down a machine and the software feeds the images to the CNN to detect anomalies such as trash or plastic. If an anomaly is detected, the software sends a signal to remove it. The camera I am using is a USB camera whose frame rate ranges from 30 fps at 1080p up to 120 fps at 240p, depending on the resolution.
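
For context, this is roughly the capture-and-detect loop I have in mind. It is only a sketch: it assumes OpenCV for the camera, a 0.5 decision threshold, and send_eject_signal() is just a placeholder for the removal mechanism.

import cv2
import numpy as np

def run_realtime(model, cam_index=0):
    cap = cv2.VideoCapture(cam_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # The model expects 240x320x3 input; cv2.resize takes (width, height)
            img = cv2.resize(frame, (320, 240)).astype(np.float32) / 255.0
            prob = float(model.predict(img[None, ...], verbose=0)[0, 0])
            if prob > 0.5:               # placeholder threshold
                send_eject_signal()      # placeholder for the actual removal signal
    finally:
        cap.release()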

The CNN model is designed to take images of size 240x320 pixels so that it can reach 120 fps. It has 22 layers, consisting of Conv2D layers and a dense layer before the output layer. This is my model architecture.

import tensorflow as tf
from tensorflow.keras import layers

# Building functions
def myscovblk(out_f, inp_1, inp_2):
  # Shortcut branch: 1x1 conv then 2x2 max pooling (halves the spatial size)
  v = layers.Conv2D(out_f, (1, 1), strides=1, padding='same', activation='relu')(inp_1)
  v = layers.MaxPooling2D(pool_size=(2, 2), strides=2, padding='same')(v)
  # Main branch: strided 3x3 conv, added to the shortcut (residual-style)
  w = layers.Conv2D(out_f, kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu')(inp_2)
  w = layers.Add()([w, v])
  return w


def bblock(im, xim):
  # Three downsampling blocks with increasing filter counts
  w = myscovblk(2, im, xim)
  w = myscovblk(3, w, w)
  vw = myscovblk(4, w, w)
  return vw

def createModel():
    # Builds the model
    im_input = layers.Input(shape=[240, 320, 3])  # fixed input shape: 240x320 RGB frames
    x = layers.Conv2D(3, (1, 1), strides=1, padding='same', activation='tanh')(im_input)
    x = layers.Add()([x, im_input])  # residual connection with the raw input

    # Three parallel branches of downsampling blocks, concatenated afterwards
    end_feat = []
    for i in range(3):
        h = bblock(im_input, x)
        end_feat.append(h)
    x = layers.Concatenate()(end_feat)

    x = layers.Conv2D(32, kernel_size=(3, 3), strides=(2, 2), padding='same', activation='relu')(x)
    x = layers.Dense(16, activation='relu')(x)
    x = layers.Conv2D(1, kernel_size=(1, 1), strides=1, padding='same', activation='sigmoid')(x)
    out = layers.GlobalMaxPooling2D()(x)  # output: one anomaly probability per image

    model = tf.keras.Model(inputs=im_input, outputs=out, name="custom_hor_grey_fat3_plus")
    model.summary()

    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.0008, momentum=0.9),
                  loss=tf.keras.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    return model
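
To check that the graph builds and returns a single probability per image, I run one dummy 240x320 frame through it (random values, shape check only):

import numpy as np

model = createModel()                    # also prints the summary
dummy = np.random.rand(1, 240, 320, 3).astype("float32")
print(model(dummy).shape)                # expected: (1, 1) -> one anomaly probability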

I want to know how many FPS my CNN model can process so that I can improve it further if possible. Please help me. Thank you in advance.

Jo Sky

1 Answer


Well, it certainly depends on the device you want to deploy it on, whether it is an IoT device or a security system backed by a very powerful computer. But theoretically speaking, you just need to take a picture and do a forward pass of the model, which will take on the order of a few milliseconds. Since FPS means frames per second, you get the FPS with a simple division: $$ \mathrm{FPS} = \frac{1}{t_\text{forward pass}} $$
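
For example, with Keras you could time repeated forward passes on a dummy 240x320 frame and invert the average. This is only a sketch reusing the createModel() from the question: the first call is excluded as a warm-up because it includes graph tracing, and the numbers will of course depend on your target hardware.

import time
import numpy as np

model = createModel()
frame = np.random.rand(1, 240, 320, 3).astype("float32")

model.predict(frame, verbose=0)          # warm-up: first call includes tracing overhead

n_runs = 200
start = time.perf_counter()
for _ in range(n_runs):
    model.predict(frame, verbose=0)
elapsed = time.perf_counter() - start

print(f"avg forward pass: {1000 * elapsed / n_runs:.2f} ms")
print(f"approx. FPS: {n_runs / elapsed:.1f}")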

Alberto