You can find an explanation here (in the googleapis GitHub repository):
My current understanding is that a color's score combines two things:
- What is the focus of the image?
- What is the color of that focus?
For example, given the following image:

The focus is clearly the cat, so the color annotation with the highest score (0.15) for this image is RGB = (232, 183, 135), which is this beige:

The green of the grass, despite covering more pixels in the image, gets a much lower score because the algorithm detects it as background rather than as the focus of the image.
In other words, a higher score means higher confidence that the color is prominent in the central focus of the image.
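To compare "score" with raw pixel coverage in your own results, a sketch like the following may help. It assumes the `google-cloud-vision` Python client library is installed and credentials are configured; `rank_colors` and `fetch_dominant_colors` are hypothetical helper names, not part of the library. Each returned color entry carries both a `score` and a `pixel_fraction`, so sorting by each field side by side shows which colors cover much of the image without being ranked as prominent.

```python
def rank_colors(colors, key="score"):
    """Return ((r, g, b), score, pixel_fraction) tuples, highest `key` first.

    `colors` is expected to look like the entries in
    response.image_properties_annotation.dominant_colors.colors.
    `key` is either "score" or "pixel_fraction".
    """
    rows = [
        (
            (int(c.color.red), int(c.color.green), int(c.color.blue)),
            c.score,
            c.pixel_fraction,
        )
        for c in colors
    ]
    index = {"score": 1, "pixel_fraction": 2}[key]
    return sorted(rows, key=lambda row: row[index], reverse=True)


def fetch_dominant_colors(path):
    """Run the Vision API image-properties feature on a local image file."""
    # Imported here so rank_colors works without the client library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.image_properties(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.image_properties_annotation.dominant_colors.colors
```

For example, with `colors = fetch_dominant_colors("cat.jpg")` (a hypothetical file), compare `rank_colors(colors)` with `rank_colors(colors, key="pixel_fraction")`. If the behaviour described above holds, a background color such as the grass green should rank noticeably higher by `pixel_fraction` than by `score`.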
Your case is analogous. Removing the background first can help you isolate the focus of the image; you can then build a histogram of the colors of the remaining objects. An example of background removal using deep learning can be found in this post.
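The histogram step can be sketched as below. This is a plain masked color histogram, not a reproduction of Google's scoring. It assumes you already have a foreground mask from some background-removal step (for instance the deep-learning approach in the linked post); the function names, the nested-list image layout and the default `bin_size` of 32 are illustrative choices, not requirements.

```python
from collections import Counter


def foreground_color_histogram(pixels, mask, bin_size=32):
    """Count quantized RGB colors among foreground pixels only.

    pixels:   list of rows, each a list of (r, g, b) tuples with values 0-255.
    mask:     list of rows of booleans, True where the pixel is foreground
              (assumed to come from a background-removal model).
    bin_size: width of each per-channel bin; 32 gives 8 levels per channel.
    """
    counts = Counter()
    for pixel_row, mask_row in zip(pixels, mask):
        for (r, g, b), keep in zip(pixel_row, mask_row):
            if keep:
                # Snap each channel to the centre of its bin so similar shades merge.
                key = tuple(c // bin_size * bin_size + bin_size // 2 for c in (r, g, b))
                counts[key] += 1
    return counts


def dominant_foreground_color(pixels, mask, bin_size=32):
    """Return the most frequent quantized foreground color, or None if the mask is empty."""
    counts = foreground_color_histogram(pixels, mask, bin_size)
    if not counts:
        return None
    color, _ = counts.most_common(1)[0]
    return color
```

Quantizing into bins keeps slightly different shades of the same surface (fur, grass) from being counted as separate colors; a smaller `bin_size` gives finer but noisier results. With the mask set to all `True`, the function reduces to an ordinary whole-image histogram, which is where a large background would dominate.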