3

AFAIK, OSPF ECMP is based (in one of its two options) on the destination IP address. Hence, if the OSPF-advertised route is a /32, I basically wouldn't get any benefit from ECMP. Now the question: in a BGP EVPN fabric where I use OSPF as the underlay routing protocol, is there no advantage from OSPF ECMP for the advertised loopback interfaces of the VTEPs? Normally the NVE interfaces source their address from the loopback. Am I correct?

I partly got the answer from "Regarding ECMP hashing", but I wanted to focus on the non-usefulness in the BGP EVPN scenario, when OSPF is used as the underlay routing protocol.

Alex

user2984629

1 Answer

4

OSPF does not do the (ECMP-based) forwarding. OSPF just provides information that might make it into the routing table and further into the actual forwarding table. And yes, there can be ECMP for /32 destinations.

In a BGP EVPN scenario, where a given (ingress) leaf switch has ECMP routes to a given remote (egress) leaf's loopback address, the flows between the local and remote loopback addresses will be distributed among the ECMP links as the router's forwarding feature (e.g. Cisco's CEF) sees fit, based on its hashing algorithm.

With SrcIP (local loopback IP), DstIP (remote loopback IP), protocol number (17 for UDP) and DstPort (probably udp/4789 for VXLAN) being the same for all flows, only the UDP SrcPort can provide some entropy for the hashing algorithm.
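To illustrate the point about entropy, here is a minimal Python sketch of a flow-based ECMP decision on the outer header. The hash (SHA-256 truncated to 32 bits), the loopback addresses and the function name `ecmp_link` are assumptions made up for illustration; real forwarding engines use their own, usually undocumented, hardware hash.

```python
import hashlib

def ecmp_link(src_ip, dst_ip, proto, src_port, dst_port, n_links):
    """Map an outer 5-tuple to one of n_links equal-cost links.

    SHA-256 is only a stand-in for a vendor's hardware hash function.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

# Hypothetical VTEP loopbacks (not taken from any real fabric).
LOCAL_VTEP = "10.0.0.1"
REMOTE_VTEP = "10.0.0.2"
UDP, VXLAN_PORT, UPLINKS = 17, 4789, 4

# Same SrcPort every time: the 5-tuple never changes, so one link is used.
fixed_port_links = {
    ecmp_link(LOCAL_VTEP, REMOTE_VTEP, UDP, 50000, VXLAN_PORT, UPLINKS)
    for _ in range(1000)
}

# Varying SrcPort: the only changing field spreads the flows over the links.
varying_port_links = {
    ecmp_link(LOCAL_VTEP, REMOTE_VTEP, UDP, port, VXLAN_PORT, UPLINKS)
    for port in range(49152, 50152)
}

print("links used with a fixed SrcPort:  ", len(fixed_port_links))
print("links used with varying SrcPorts:", len(varying_port_links))
```

With a constant SrcPort the hash input never changes, so every packet between the two loopbacks lands on the same link; once the SrcPort varies, the same hash starts selecting different links.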

It seems that more than one vendor decided to peek into the VXLAN-encapsulated packet to glean some randomness.

Juniper Networks say this:

https://www.juniper.net/documentation/us/en/software/junos/evpn-vxlan/topics/topic-map/sdn-vxlan.html

The source port field in the UDP header is used to enable ECMP load balancing of the VXLAN traffic in the Layer 3 network. This field is set to a hash of the inner packet fields, which results in a variable that ECMP can use to distinguish between tunnels (flows).

None of the other fields that flow-based ECMP normally uses are suitable for use with VXLANs. All tunnels between the same two VTEPs have the same outer source and destination IP addresses, and the UDP destination port is set to port 4789 by definition. Therefore, none of these fields provide a sufficient way for ECMP to differentiate flows.

Huawei for example say that in VXLAN transport, they're setting the UDP SrcPort from a hash value derived from the encapsulated Ethernet frame:

https://support.huawei.com/enterprise/en/doc/EDOC1100086966

The VXLAN header and the original Ethernet frame are used as UDP data. In the UDP header, the destination port number (VXLAN Port) is fixed at 4789, and the source port number (UDP Src. Port) is calculated using the hash algorithm based on the original Ethernet frame.

And Cisco seems to do the same in some of their products. The following quote is from a document about the Nexus 3600. Although I struggle to find the same information at the same level of detail for the Nexus 9300 series, I think it's a safe bet to assume that other product ranges do the same:

(emphasis by me)

ECMP and LACP Load Sharing with VXLANs

Encapsulated VXLAN packets are forwarded between VTEPs based on the native forwarding decisions of the transport network. Most data center transport networks are designed and deployed with multiple redundant paths that take advantage of various multipath load-sharing technologies to distribute traffic loads on all available paths.

A typical VXLAN transport network is an IP-routing network that uses the standard IP equal cost multipath (ECMP) to balance the traffic load among multiple best paths. To avoid out-of-sequence packet forwarding, flow-based ECMP is commonly deployed. An ECMP flow is defined by the source and destination IP addresses and optionally, the source and destination TCP or UDP ports in the IP packet header.

All the VXLAN packet flows between a pair of VTEPs have the same outer source and destination IP addresses, and all VTEP devices must use one identical destination UDP port that can be either the Internet Allocated Numbers Authority (IANA)-allocated UDP port 4789 or a customer-configured port. The only variable element in the ECMP flow definition that can differentiate VXLAN flows from the transport network standpoint is the source UDP port. A similar situation for Link Aggregation Control Protocol (LACP) hashing occurs if the resolved egress interface that is based on the routing and ECMP decision is an LACP port channel. LACP uses the VXLAN outer-packet header for link load-share hashing, which results in the source UDP port being the only element that can uniquely identify a VXLAN flow.

In the Cisco Nexus 3600 platform switches implementation of VXLANs, a hash of the inner frame's header is used as the VXLAN source UDP port. As a result, a VXLAN flow can be unique. The IP address and UDP port combination is in its outer header while the packet traverses the underlay transport network.

TL;DR:

ECMP for /32 routes is far from useless in a BGP EVPN scenario, as long as the forwarding engine's hashing feature is provided with a good source of entropy (in extenso: many different and varying UDP SrcPorts).

Vendors seem to derive the UDP Src Port from a hash value calculated from suitably selected header fields of the VXLAN-encapsulated inner packet/frame.

This ensures that all packets belonging to one given "inner flow" are mapped to one single "external VXLAN flow", and ECMP's load-sharing will not balance any single flow across multiple links.
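The mapping from inner flow to outer SrcPort can be sketched as follows. This is an illustration under assumptions, not any vendor's algorithm: the choice of inner fields, the SHA-256 stand-in hash, the sample addresses and the name `vxlan_src_port` are all hypothetical. The port range 49152-65535 is the dynamic/private range that RFC 7348 recommends for the VXLAN source port.

```python
import hashlib

# Dynamic/private port range recommended by RFC 7348 for the UDP source port.
PORT_MIN, PORT_MAX = 49152, 65535

def vxlan_src_port(src_mac, dst_mac, src_ip, dst_ip, proto, sport, dport):
    """Derive an outer UDP source port from inner-frame header fields.

    Which fields a real VTEP hashes, and with which function, is
    vendor-specific; this selection is an assumption for illustration.
    """
    fields = (src_mac, dst_mac, src_ip, dst_ip, proto, sport, dport)
    key = "|".join(str(f) for f in fields).encode()
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return PORT_MIN + value % (PORT_MAX - PORT_MIN + 1)

# Two hypothetical inner TCP flows between the same pair of hosts.
flow_a = ("00:00:5e:00:53:01", "00:00:5e:00:53:02",
          "192.0.2.10", "192.0.2.20", 6, 40001, 443)
flow_b = ("00:00:5e:00:53:01", "00:00:5e:00:53:02",
          "192.0.2.10", "192.0.2.20", 6, 40002, 443)

port_a_first = vxlan_src_port(*flow_a)
port_a_again = vxlan_src_port(*flow_a)  # same inner flow, later packet
port_b = vxlan_src_port(*flow_b)

print("flow A, first packet:", port_a_first)
print("flow A, later packet:", port_a_again)
print("flow B:              ", port_b)
```

Because the port is a pure function of the inner header fields, every packet of one inner flow gets the same outer SrcPort and therefore stays on one underlay path, which is what keeps packets in order.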

With the many hundreds of flows leaving a VTEP towards many (or a single) remote VTEPs, the widely varying UDP SrcPort numbers will ensure that the forwarding engine load-shares traffic quite well over the available uplinks.
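A rough simulation of that effect, under the same caveats: the hash (`h32`, a truncated SHA-256), the addresses and the field selection are illustrative assumptions and do not reproduce any real platform's behaviour. It pushes 1000 hypothetical inner flows between a single VTEP pair through both steps (inner hash to SrcPort, then outer 5-tuple hash to uplink) and counts flows per uplink.

```python
import hashlib
from collections import Counter

def h32(*fields):
    """Stand-in 32-bit hash over arbitrary header fields (not a vendor hash)."""
    key = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

UPLINKS = 4
N_FLOWS = 1000
PORT_MIN, PORT_SPAN = 49152, 16384  # 49152-65535

per_link = Counter()
src_ports = set()

for i in range(N_FLOWS):
    # Hypothetical inner flow: same two hosts, varying TCP source port.
    inner = ("192.0.2.10", "192.0.2.20", 6, 10000 + i, 443)

    # Step 1 (ingress VTEP): inner headers -> outer UDP SrcPort.
    src_port = PORT_MIN + h32(*inner) % PORT_SPAN
    src_ports.add(src_port)

    # Step 2 (underlay forwarding engine): outer 5-tuple -> uplink.
    link = h32("10.0.0.1", "10.0.0.2", 17, src_port, 4789) % UPLINKS
    per_link[link] += 1

for link in range(UPLINKS):
    print(f"uplink {link}: {per_link[link]} flows")
print("distinct UDP SrcPorts:", len(src_ports))
```

Even though the outer IP addresses and the destination port are identical for every packet, the flows spread over all uplinks, since the SrcPort alone carries enough entropy.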

Also: while the routing protocol pushing ECMP routes to the routing table is a prerequisite, the actual hashing, forwarding and load distribution among ECMP links is a matter of the router's forwarding feature (CEF on Cisco devices), not of the routing protocol.

If you're on Cisco Nexus, before resorting to big things like NetFlow, there are some features/tools to look at:

  • show routing hash <SrcIP> <DstIp> ip-proto 17 <SrcPort> <DstPort>, to see where the packets are going, as in NX-OS 9.3: ECMP polarization and "ip load-sharing ... rotate"
  • from the same SE-NE post, be sure to pick up the detail about ip load-sharing ... rotate, and see how it impacts traffic distribution when applied.
  • show interface counters table (available on some Nexus platforms) gives a nice clean list of how traffic is entering or leaving your switch. I'm pretty sure that once you start getting some load, you'll see it distributed quite evenly towards the spines.

Marc 'netztier' Luethi
  • Thanks Marc for your detailed explanation. From what you wrote, it all boils down to knowing how VXLAN flows are built. As you said, entropy in the source port may help, but based on which criteria does the VTEP change the UDP source port? I could answer this question myself, but I don't have enough time to set up NetFlow on the underlay, or take captures... – user2984629 May 22 '23 at 11:53
  • It might be difficult to find out exactly which kind of hashing algorithm will digest which parts of the encapsulated frame; I wouldn't expect vendors to document this in detail. I added a few bits to my post suggesting how to analyze before heading towards big guns like a full-blown NetFlow setup. – Marc 'netztier' Luethi May 22 '23 at 19:35
  • This "difficulty" about knowing how often the VXLAN source port changes is the key point. What I stated at the opening of the discussion was based on my assumption that it would not change over time. – user2984629 May 23 '23 at 12:04
  • 1
    Even in the situation of a single local (ingress) VTEP sending many hundreds of flows to a single remote (egress) VTEP, there won't be a **single** UDP SrcPort that could "change over time"; there will be *many*! The ingress VTEP's hashing feature will understand what to glean from the encapsulated packet and how to hash it into a SrcPort number while maintaining the concept of a "flow". Result: hundreds, possibly thousands of different UDP SrcPorts in use between the same two VTEPs. The underlying ECMP-capable forwarding engine can then nicely hash with a lot of entropy from the UDP SrcPort. – Marc 'netztier' Luethi May 23 '23 at 15:01
  • Hi Marc, I'm sure it's me who doesn't see the explanation in the webpage whose link you sent me. Would you be so kind as to point me to the key sentence of that article, please? TIA! – user2984629 May 24 '23 at 07:15
  • 1
    I updated the answer with some more sources and emphasis. – Marc 'netztier' Luethi May 25 '23 at 16:14