I am completely new to computer vision and I am working on a small hobby project. The goal is to take camera footage of a foosball table and map each image onto an already well-defined geometric model that has only a few moving parts.
To illustrate the problem, the input could be this:
I am looking to identify all the features of the image and map them to an exact model that could be rendered like so:
The rendering shows the positions of the ball and the player rods.
I would already have an exact definition of what the image contains, such as:
- Exact size of the playing field
- Exact positions of player rods relative to the length of the field
- The player sizes and positions on the bar
- Every detail of the table is already described in the model and cannot deviate from it
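To make the list above concrete, here is a minimal sketch of how such a fixed model could be written down in Python. All class names, field names and dimensions are hypothetical placeholders, not measurements of a real table. The only per-frame unknown it exposes for a rod is its slide along the bar.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rod:
    # Fixed position of the rod along the length of the field (hypothetical units: mm).
    x_mm: float
    # Player centres along the bar, relative to the bar's centre position.
    player_offsets_mm: List[float]
    # Maximum slide in either direction from the centre position.
    travel_mm: float
    # The unknown that has to be estimated from each frame.
    slide_mm: float = 0.0


@dataclass
class TableModel:
    length_mm: float
    width_mm: float
    rods: List[Rod] = field(default_factory=list)

    def player_positions(self, rod_index: int) -> List[Tuple[float, float]]:
        """Return (x, y) table coordinates of every player on one rod."""
        rod = self.rods[rod_index]
        centre_y = self.width_mm / 2 + rod.slide_mm
        return [(rod.x_mm, centre_y + off) for off in rod.player_offsets_mm]


# Example with made-up dimensions: one three-player rod, slid 50 mm off centre.
example_rod = Rod(x_mm=300.0, player_offsets_mm=[-200.0, 0.0, 200.0],
                  travel_mm=100.0, slide_mm=50.0)
example_table = TableModel(length_mm=1200.0, width_mm=680.0, rods=[example_rod])
print(example_table.player_positions(0))
```

With a structure like this, everything except `slide_mm` (and the ball position) is constant, which is what should make the detection problem smaller.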
Some limitations:
- The image would always be taken at an angle, so part of the playing field is cropped and would have to be extrapolated.
- The lighting can vary from one set-up to another.
There are only a few moving parts that need to be pinpointed: the positions of the rods (their rotation can be disregarded for now) and the position of the ball.
It sounds quite complicated, but given that I already know exactly what I am looking for, maybe that makes the task easier.
What would be the most straightforward approach to solving this problem? Any suggestions or further reading would be greatly appreciated.
My current idea is to identify a few anchor points in the image. For example, I could identify the top edge of the playing field, where the green pitch ends, and then the position of the topmost player rod, the goalie rod. Because I know the exact spatial relationship between these two features in the predefined table model, measuring the distance between the two lines in the image should let me estimate the camera's height and viewing angle. With that, I could calculate where all the other features should appear in the image. If I can narrow down the areas of the picture that need to be searched for specific features of the table, I could add further constraints on what has to be examined and simplify the task even more.
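One common way to formalise this anchor-point idea is a planar homography: since the playing field is flat, four known model points and their pixel positions are enough to fix a 3x3 matrix that maps any table coordinate to the image. This is a swapped-in generalisation of the two-line distance calculation described above, not the same computation. The sketch below uses only the standard library; the field dimensions and pixel coordinates are made-up values, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_points(model_pts, image_pts):
    """Return 3x3 H mapping model (x, y) to image (u, v) from exactly 4 pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, pt):
    """Apply homography h to a model point and return pixel coordinates."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical anchors: the four field corners (mm) and where they appear (px).
model_corners = [(0.0, 0.0), (1200.0, 0.0), (1200.0, 680.0), (0.0, 680.0)]
image_corners = [(100.0, 50.0), (540.0, 50.0), (620.0, 400.0), (20.0, 400.0)]
H = homography_from_points(model_corners, image_corners)

# Predict where a rod at x = 80 mm should appear, to restrict the search area.
rod_in_image = [project(H, (80.0, 0.0)), project(H, (80.0, 680.0))]
print(rod_in_image)
```

Any four points whose model coordinates are known would do, as long as no three are collinear; with a cropped view, corners that fall outside the frame could be replaced by intersections of visible field markings. Lens distortion is ignored here.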
Another idea I had was to use the printed table surface to identify the table's orientation. For instance, given the input picture above, I might also have a hardcoded reference picture like this:
Could this reference picture be located in the input image, even though it appears skewed and cropped there, and could the table position then be inferred by overlaying the two images?