From what I've seen in clustering, the distance function is treated as a hyperparameter (one to be selected) when inferring the relationships/clusters between points. Examples of distances I've come across are Euclidean, taxicab, Mahalanobis, and Minkowski.
What are some examples of highly cited papers that choose the distance function this way, and is there any concrete empirical evidence supporting the practice? Which non-Euclidean distance functions do they use?
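
To make concrete what I mean by the distance being a hyperparameter, here is a minimal sketch. The function names (`minkowski`, `nearest`) and the toy points are my own, made up for illustration and not taken from any paper or library. It uses the Minkowski distance of order `p`, which reduces to taxicab at `p = 1` and Euclidean at `p = 2`, and shows that the same point can be assigned to a different center depending on `p`.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def nearest(point, centers, p):
    """Index of the center closest to `point` under the order-p distance."""
    return min(range(len(centers)), key=lambda i: minkowski(point, centers[i], p))


# Toy data (hypothetical): one point and two candidate cluster centers.
point = (0.0, 0.0)
centers = [(3.0, 3.0), (0.0, 5.0)]

# p is the "hyperparameter": changing it changes the assignment.
taxicab_choice = nearest(point, centers, p=1)    # taxicab distance
euclidean_choice = nearest(point, centers, p=2)  # Euclidean distance
print(taxicab_choice, euclidean_choice)
```

Under taxicab distance the point is closer to the second center (5 vs. 6), while under Euclidean distance it is closer to the first (about 4.24 vs. 5), so the choice of distance alone changes the cluster assignment.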