Note that both of your results are consistent with a "true" accuracy of 87%, and the difference you have measured between these models is not statistically significant. If an 87% accuracy applies at random, there is roughly a 14% chance of observing the two extremes of accuracy that you got purely by chance, provided samples are chosen randomly from the target population and the models are different enough to make errors effectively at random. That last assertion is usually not true, so you can relax a little - that is, unless you took different random slices for cross-validation in each case.
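If you want to sanity-check that kind of figure against your own numbers, a quick simulation is enough. The sketch below is a rough illustration, not a formal test: the 84/100 and 90/100 scores are hypothetical stand-ins for your two results, and it assumes both models share the same true accuracy and make errors independently at random.

```python
# Rough sketch: how likely is a gap like the observed one if both models
# really had 87% accuracy and made errors independently at random?
import numpy as np

rng = np.random.default_rng(0)
true_acc, n_cases, n_trials = 0.87, 100, 100_000

# Simulate both models scoring on the same number of test cases.
a = rng.binomial(n_cases, true_acc, n_trials)
b = rng.binomial(n_cases, true_acc, n_trials)

observed_gap = 90 - 84  # hypothetical counts - substitute your own results
p_chance = np.mean(np.abs(a - b) >= observed_gap)
print(f"P(gap >= {observed_gap} correct answers by chance) ~ {p_chance:.2f}")
```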
100 test cases is not really enough to discern small differences between models. I would suggest using k-fold cross-validation in order to reduce errors in your accuracy and loss estimates.
It is also critical to check that the cross-validation split was identical in both cases. If you used auto-splitting with a standard tool and did not set the appropriate RNG seed, you may have got a different split each time, and your results may just be showing you variance due to the validation split, which can completely swamp any real difference between the models.
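Here is a minimal scikit-learn sketch of doing both things at once: evaluating the two candidates with k-fold cross-validation, and reusing a single splitter with a fixed seed so both models see exactly the same folds. The dataset and the two classifiers are placeholders, assuming your models expose the usual sklearn estimator interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, random_state=42)  # placeholder data

# One splitter with a fixed seed, reused for both models so the folds match.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("forest", RandomForestClassifier(random_state=42))]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean={scores.mean():.3f} +/- {scores.std():.3f}")
```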
However, assuming the exact same dataset and split were used each time, and that it was a representative sample of your target population, then on average you should expect the model with the better metric to have the higher chance of being the better model.
What you should really do is decide which metric to base the choice on in advance of the experiment. The metric should match some business goal for the model.
Since you are now trying to choose after the fact, go back to the reason you created the model in the first place and see whether you can identify the correct metric from that. It might not be either accuracy or loss.
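For illustration only, here are a few common alternatives computed on the same set of predictions; they can rank models differently, which is exactly why the choice should come from the business goal rather than from whichever number happens to look better. The labels and predicted probabilities below are synthetic placeholders for your held-out data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, log_loss, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)                          # placeholder labels
y_prob = np.clip(rng.normal(0.35 + 0.3 * y_true, 0.2), 0.01, 0.99)  # placeholder scores
y_pred = (y_prob >= 0.5).astype(int)

print("accuracy :", accuracy_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("log loss :", log_loss(y_true, y_prob))
print("ROC AUC  :", roc_auc_score(y_true, y_prob))
```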