Precision is the number of true positives (TP) over the number of predicted positives (PP), and recall is the number of true positives over the number of actual positives (AP). I use the initials just to keep things shorter below.
A true positive is when you predict a car in a place and there is a car in that place.
A predicted positive is every car you predict, whether the prediction is right or wrong.
An actual positive is every car that is actually in the picture.
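Written as formulas with those initials:

$$\text{precision} = \frac{TP}{PP}, \qquad \text{recall} = \frac{TP}{AP}$$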
You should calculate these counts separately for each category, and then sum them over the examples you sampled, if I am not mistaken.
So for the CAR category you have (assuming the predictions match the targets, i.e., you are not predicting a truck as a car, for example):
model 1
line 1 -> 2 TP, 2 PP, 4 AP
line 2 -> 0 TP, 0 PP, 2 AP
So in total, precision is 2/2 = 1 and recall is 2/6 ≈ 0.33.
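If it helps, here is a minimal Python sketch of that calculation, using the per-image counts listed above for model 1 (the variable names are just for illustration):

```python
# Per-image counts for model 1, CAR category: (TP, PP, AP)
car_counts_model1 = [
    (2, 2, 4),  # line 1: 2 cars predicted correctly, 2 predicted in total, 4 actual cars
    (0, 0, 2),  # line 2: no cars predicted, 2 actual cars
]

tp = sum(c[0] for c in car_counts_model1)   # 2
pp = sum(c[1] for c in car_counts_model1)   # 2
ap = sum(c[2] for c in car_counts_model1)   # 6

precision = tp / pp if pp else 0.0  # 2 / 2 = 1.0
recall = tp / ap if ap else 0.0     # 2 / 6 ≈ 0.33

print(f"precision={precision:.2f}, recall={recall:.2f}")
```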
You can then do the same for the other categories and for the other models. This way you can check whether one model predicts a certain category better than another. For example, model 1 might be better at finding cars in a picture, whilst model 3 might be better at finding buses.
The important part is knowing whether the objects the model predicted actually correspond to what is in the picture. An unlikely but illustrative example would be a picture with 1 car and 1 truck where the algorithm recognizes the car as a truck and the truck as a car. From the information in the table I cannot tell whether the 2 cars you predict are the actual cars in the picture, or in other words, whether they are really True Positives or are in fact False Positives.
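To make that last point concrete, here is a hypothetical sketch (the `matches` list and the simplification of ignoring box overlap are my own assumptions, not something taken from your table): a prediction only counts as a TP for a class when both the predicted label and the true label are that class.

```python
# Hypothetical picture with 1 car and 1 truck, where the model labels the
# car as a truck and the truck as a car.  Each pair is
# (predicted_label, true_label_of_the_object_it_landed_on).
matches = [("truck", "car"), ("car", "truck")]

for cls in ("car", "truck"):
    tp = sum(1 for pred, true in matches if pred == cls and true == cls)  # 0
    pp = sum(1 for pred, _ in matches if pred == cls)                     # 1
    ap = sum(1 for _, true in matches if true == cls)                     # 1
    precision = tp / pp if pp else 0.0
    recall = tp / ap if ap else 0.0
    print(f"{cls}: TP={tp}, PP={pp}, AP={ap}, "
          f"precision={precision}, recall={recall}")

# Both classes end up with precision 0 and recall 0, even though the raw
# counts (1 car predicted, 1 car present) look fine on their own.
```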