Is it common for large enterprises with many thousands of servers to use ant colony optimization or similar algorithms, together with network routing protocols like BGP, to solve the Traveling Salesman Problem (TSP) and optimize routing globally across their data centers?

It's very common for data centers to have some level of oversubscription, although I was surprised to see that Facebook designed its network topology without oversubscription. Would TSP optimization still apply when there is no oversubscription by design? I guess it would still be needed when one of the spine switches is lost and some level of oversubscription returns.
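
To make the spine-failure case concrete, here is a rough sketch of how the oversubscription ratio of a single leaf (ToR) switch could be computed. The port counts and link speeds are made-up numbers for illustration, not figures from Facebook or any other real design, and `oversubscription_ratio` is a hypothetical helper.

```python
def oversubscription_ratio(server_ports, server_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to spine-facing bandwidth on one leaf."""
    downlink_capacity = server_ports * server_gbps
    uplink_capacity = uplinks * uplink_gbps
    return downlink_capacity / uplink_capacity


# Hypothetical leaf: 16 servers at 25 Gbps, one 100 Gbps uplink to each of 4 spines.
healthy = oversubscription_ratio(server_ports=16, server_gbps=25,
                                 uplinks=4, uplink_gbps=100)

# Same leaf after losing one spine: only 3 uplinks remain usable.
degraded = oversubscription_ratio(server_ports=16, server_gbps=25,
                                  uplinks=3, uplink_gbps=100)

print(f"all spines up: {healthy:.2f}:1")
print(f"one spine lost: {degraded:.2f}:1")
```

Under these assumed numbers the leaf is non-blocking (1:1) while all spines are up and becomes oversubscribed only once an uplink is lost.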

Interesting quote from http://highscalability.com/blog/2015/8/10/how-google-invented-an-amazing-datacenter-network-only-they.html

A typical network today (not necessarily Google) may have 10K+ switches, 250K+ links, 10M+ routing rules.

It may all get even more complicated with VPCs in play, networking limitations at edge locations / the last mile, etc. Sorry if my question is a bit too broad or vague. I am looking for a research-level / high-level understanding of how these bigger companies deal with such large network optimization challenges.

Tagar

1 Answer

Your question is indeed vague, and since it will generate mostly opinions, it is probably also off topic.

I did want to address one point. The "Traveling Salesman Problem" isn't really applicable to data center topologies, whether it's just a handful of racks or a Facebook-sized organization. Data doesn't need to pass through every node. It only has to get from source to destination. Traffic flows from server to server, or from server to Internet. In either case, it's only 3 or 4 hops from one server to another or to the Internet edge. In the data center, the fabric design limits the number of hops and finding the shortest path is trivial.
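
As a rough illustration of why path-finding in a fabric is trivial, here is a minimal sketch that builds a small two-tier leaf-spine topology and finds a shortest path with a plain breadth-first search. The topology sizes, node names, and helper functions (`build_leaf_spine`, `shortest_path`) are assumptions made up for this example; real fabrics rely on their routing protocol and ECMP rather than a script like this.

```python
from collections import deque


def build_leaf_spine(num_spines, num_leaves, servers_per_leaf):
    """Return an adjacency dict for a two-tier leaf-spine fabric."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for l in range(num_leaves):
        leaf = f"leaf{l}"
        for s in range(num_spines):
            link(leaf, f"spine{s}")          # every leaf connects to every spine
        for h in range(servers_per_leaf):
            link(leaf, f"srv{l}-{h}")        # servers hang off a single leaf
    return graph


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the nodes on one shortest path, or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for a deterministic result
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


fabric = build_leaf_spine(num_spines=4, num_leaves=8, servers_per_leaf=2)
path = shortest_path(fabric, "srv0-0", "srv7-1")
print(path)                     # server, leaf, spine, leaf, server
print(len(path) - 1, "links")   # the same count for any pair on different leaves
```

Because every leaf connects to every spine, any two servers on different leaves are always the same distance apart, so there is no tour to optimize; the remaining choice is only which spine to use, which is normally handled by ECMP hashing.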

Ron Trunk
  • Thank you Ron. I understand that one-route-at-a-time optimization doesn't fall into the TSP realm. I am trying to look at this holistically, looking for global optimization rather than local optimizations: for example, given network traffic patterns, how often certain network segments / ToR switches, VLANs, edge locations and outside networks, BGP peers, etc., or even specific servers talk to each other. – Tagar Feb 13 '20 at 22:41
  • @Tagar Most large organizations use Netflow to gather the exact data you're looking for. – Ron Trunk Feb 14 '20 at 14:11
  • Also, on the Internet, routing has little to do with "optimizing" paths. Routing is determined by contracts (I agree to carry your traffic but not his), or by financial considerations. – Ron Trunk Feb 14 '20 at 14:14
  • I worked for telecom companies for many years, so I have used Netflow extensively in the past. That's not the answer I was looking for, though, but that's okay since I wasn't very specific :-) Thanks Ron – Tagar Feb 14 '20 at 17:24