Redundant IP link aggregation for failover operation without route failure detection

Question

I am looking for a technology to achieve TCP connection fault tolerance with the help of two links between hosts and without time delays for route failure detection. Something like this:

                       link1   packet1copy1->
                     --------------------------
      packet1->     /                          \    packet1copy1/packet1copy2->
host1--------router1                            router2 ------------------------host2
                    \  link2   packet1copy2->  /
                     --------------------------

host1 and host2 are connected via router1 and router2, with two links between the routers. Each router duplicates every packet coming from its host and forwards one copy onto each link simultaneously. Then either the peer router or the destination host's IP stack takes care of eliminating the redundant packets.
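The replicate-then-eliminate idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Replicator` and `Eliminator`, the per-packet sequence tag, and the bounded history size are all hypothetical and do not correspond to any particular router feature. Sequence number wraparound is not handled.

```python
from collections import OrderedDict


class Replicator:
    """Sending-side router: tags each packet and emits one copy per link."""

    def __init__(self, num_links=2):
        self.num_links = num_links
        self.next_seq = 0

    def send(self, payload):
        tagged = (self.next_seq, payload)
        self.next_seq += 1
        # One identical tagged copy for every link.
        return [tagged for _ in range(self.num_links)]


class Eliminator:
    """Receiving-side router: forwards the first copy of each tag, drops repeats."""

    def __init__(self, history=1024):
        self.history = history
        self.seen = OrderedDict()  # recently forwarded tags, oldest first

    def receive(self, tagged):
        seq, payload = tagged
        if seq in self.seen:
            return None  # duplicate copy, drop it
        self.seen[seq] = True
        if len(self.seen) > self.history:
            self.seen.popitem(last=False)  # forget the oldest tag
        return payload


replicator = Replicator()
eliminator = Eliminator()
delivered = []
for payload in [b"a", b"b", b"c"]:
    copy1, copy2 = replicator.send(payload)
    link1_up = payload != b"b"  # pretend link1 loses packet "b"
    for copy in ([copy1] if link1_up else []) + [copy2]:
        out = eliminator.receive(copy)
        if out is not None:
            delivered.append(out)
print(delivered)
```

In this toy run the receiver still sees all three payloads exactly once, even though link1 dropped one of them.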

Edit: This is in fact a search for a general-purpose fault-tolerance-by-replication solution for TCP (IP) transport. The solution should be of the no-need-to-recover type, as opposed to reasonably-fast-to-recover approaches like BGP / OSPF / Cisco IP SLA, etc. Some proprietary packet redundancy solutions are already known, though they are insufficiently universal. In particular, Engage Communication offers IP Tube Protector for VoIP. Unfortunately this solution 1) is a piece of equipment rather than a standard technology and 2) is confined to the VoIP domain. It may also be worth noting Juniper's Packet Redundancy technology, though it appears to be limited to a single link rather than redundant links.

I wonder why I can't find anything similar from Cisco... Does any standard or at least general purpose technology address this?

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/117992/discussion-on-question-by-sergey-ushakov-redundant-ip-link-aggregation-for-failo). — Ron Maupin, Jan 03 '21 at 21:49

Mike Pennington · Answer 1 · 2013-07-23T02:03:21.060

10

I am looking for a technology to achieve TCP connection fault tolerance with the help of two links between hosts and without time delays for route failure detection. Something like this:

                       link1   packet1copy1->
                     --------------------------
      packet1->     /                          \    packet1copy1/packet1copy2->
host1--------router1                            router2 ------------------------host2
                    \  link2   packet1copy2->  /
                     --------------------------

There are a few things working against your proposal...

You are going to make host1 and host2 work very hard to untangle your intentional packet duplication scheme for no good reason.
You are burning horsepower on your IPsec encryption points for no good reason.
TCP has been refined for over three decades to automatically recover from infrastructure flaws and failures; "helping" TCP in such a way fixes the wrong problem. You need to make your infrastructure react to mitigate problems; you should not duct-tape TCP to survive problematic infrastructure.

I am going to answer with the same comment I made, since your failure detection requirements are twenty seconds...

Build 2 IPSec tunnels with ISP diversity as required. Run a routing protocol through your IPSec tunnels, and tune the protocol timers to fail around sustained infrastructure packet loss. If you have Cisco end to end, EIGRP has long had very fast convergence around failures, although link-state protocols are getting the same these days with the IETF Loop-Free Alternate implementations.

Optionally use IP SLA on both sides to take down a tunnel that does not meet any jitter / delay / packet loss requirements.
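The decision logic behind "take down a tunnel that does not meet its loss requirement" can be sketched as a sliding window over probe results. This is a minimal illustration, not how IP SLA is implemented: the names `TunnelMonitor` and `pick_tunnel`, the window of 10 probes, and the 20% loss threshold are all assumptions chosen for the example.

```python
from collections import deque


class TunnelMonitor:
    """Tracks the outcome of the most recent probes sent through one tunnel."""

    def __init__(self, window=20, max_loss=0.2):
        self.results = deque(maxlen=window)  # True = probe answered
        self.max_loss = max_loss

    def record(self, probe_ok):
        self.results.append(bool(probe_ok))

    def loss_ratio(self):
        if not self.results:
            return 0.0
        return 1 - sum(self.results) / len(self.results)

    def is_healthy(self):
        return self.loss_ratio() <= self.max_loss


def pick_tunnel(monitors, preferred_order):
    """Return the first healthy tunnel, or the most preferred one if none is."""
    for name in preferred_order:
        if monitors[name].is_healthy():
            return name
    return preferred_order[0]


monitors = {"tunnel1": TunnelMonitor(window=10), "tunnel2": TunnelMonitor(window=10)}
for ok in [True] * 7 + [False] * 3:  # tunnel1 loses 3 of its last 10 probes
    monitors["tunnel1"].record(ok)
for _ in range(10):                  # tunnel2 answers every probe
    monitors["tunnel2"].record(True)
active = pick_tunnel(monitors, ["tunnel1", "tunnel2"])
print(active)
```

A window rather than a single lost probe is what makes the switch react to sustained loss instead of to every isolated drop; the price is that detection takes at least a few probe intervals.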

edited Jul 23 '13 at 02:03

answered Jul 22 '13 at 12:00

Mike Pennington


Mike, with all due respect I can't accept your criticism for the following reasons: 1) my question seeks a _fault tolerance by replication_ type of solution, while your solutions are of the _fault tolerance by redundancy_ type; both approaches are normally deemed valid, yet they tend to yield different quality of service levels, and I seek a better level of service; 2) fault tolerance by replication tends to be more expensive, but I would not take the word "expensive" too seriously here :) that said, please accept my "thank you" and upvote for a good overview, yet I refrain from accepting your answer – Sergey Ushakov Jul 23 '13 at 05:02
@s-n-ushakov, like I said... if you want fault tolerance by replication, you are using the wrong protocol. TCP was made for fault tolerance by redundancy. If you want fault tolerance by replication, may I introduce you to our friend known as [UDP](http://tools.ietf.org/html/rfc768). UDP is much better-suited for what you want; however, that means you're about to rewrite your primary business application just because you are in love with a strange network design (with no known hardware to implement this bidirectional packet replication, I might add) – Mike Pennington Jul 23 '13 at 11:06
well, sometimes the application-level protocol is not our choice... and knowledge of your peer's infrastructure may be limited in the business world... and it might be cool to have, say, HTTP designed and implemented over UDP :) and speaking seriously, thank you for pointing to link state protocols, they may be a relief, though not the final solution; BTW TCP itself already makes provision for at least part of the solution being sought: _The TCP must recover from data that is ... duplicated ..._ -- RFC 793, section 1.5, subsection "Reliability" – Sergey Ushakov Jul 24 '13 at 01:37
Feel free to quote RFC 793, Section 1.5... in response, I will quote [RFC 1925, Section (3)](http://tools.ietf.org/html/rfc1925): `With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea.` – Mike Pennington Jul 24 '13 at 02:05
:)) thank you for pointing this out, never heard of it... but my upvote for your comment is for the quote itself, not for the rationale behind it :) can't see why TCP can't be improved by replication... folks out there [mention possibility of redundant packet forwarding](http://stackoverflow.com/a/12871776/972463), but I am unfortunately not much of a practicing networking expert and don't know how to achieve this... – Sergey Ushakov Jul 24 '13 at 03:16
@s-n-ushakov, let's suppose we're interested in UFOs. If someone mentions the possibility of UFOs, that doesn't suddenly make a UFO exist anywhere other than in his mind. I have been in networking for 20 years; however, I have never seen a router that forwards IP packets down more than one next-hop link at once. – Mike Pennington Jul 25 '13 at 10:09
Well, it's never too late to learn something new ;) You are now welcome to a UFO observation at [Engage Communication](http://www.engageinc.com/downloads/IPTube_Protector_White.pdf) and to have a look at some more rationale in my latest edit of this question... – Sergey Ushakov Jul 26 '13 at 09:50
well, and what is your opinion on the Engage Communication solution? – Sergey Ushakov Jul 26 '13 at 12:15
Engage Communications is selling a TDM over IP solution. You are asking for a TCP over IP solution... you could overlay IP over TDM over IP, but again... this is really crazy. You should hire a real network engineer – Mike Pennington Jul 26 '13 at 12:24
:) Well, I did not mean to ask whether the product from Engage Communications is good for use as a general purpose IP router :) It's fair enough it is not. But I am curious to know your opinion if their solution is sufficient proof of the concept of fault tolerance by IP packet replication over redundant routes. – Sergey Ushakov Jul 26 '13 at 13:04
@s-n-ushakov, I already suggested that IP redundancy is a possible solution with a UDP service. You have a business application written on TCP... your original question is asking about a real solution to an infrastructure problem involving TCP timeouts. – Mike Pennington Jul 26 '13 at 13:35

jwbensley · Answer 2 · 2013-07-28T11:49:34.713

OK, from the top;

Downvote on your question from me; your question isn't clear enough, judging by your responses in comments to other people's answers. You have assumed the solution is network engineering related, but you don't seem to know, and you give the impression that you hope someone is going to hand you the answer you need.
You have the following problem requirement;

host1 and host2 are connected via router1 and router2, with two links between the routers. Each router duplicates every packet coming from its host and forwards one copy onto each link simultaneously. Then either the peer router or the destination host's IP stack takes care of eliminating the redundant packets.

Unless your end hosts' connection to their local router is double the speed of the traffic going over a single link between router1 and router2, which you haven't mentioned, your hosts will need two connections to their local router. There is NO native software or product ANYWHERE that can run on an end host and take two TCP streams down the same NIC or two separate ones and pull from an alternate stream the packets missing from the first stream. How do I know this? Because that is not how networking works; IP and TCP simply weren't designed to work like that. There may be products for duplicating packets, but these are niche, not widespread, because it's the wrong answer to the question.

Why is this such a bonkers request;

You seem to be trying to put a round peg into a square hole. My understanding of your problem requirement is that you want redundancy for your application's data travelling between two remote hosts. Data is sent twice end to end in case of a link failure. That is all you are protecting against here with dual TCP flows though: a physical, layer 1 failure. If there is a pause in sending a packet from one host to the other, it will arrive late down both router-to-router links. If a transient problem such as congestion occurs on one link but not the other, the router at the far end of the links would need to track both TCP streams simultaneously, so that when a packet arrives on link2 with the next sequence number in its header and nothing has arrived on link1, it knows the packet on link1 is late and, if that packet does turn up, drops it.

What if you find yourself in a situation where there is congestion on link1 but no traffic is dropped, thanks to a good QoS scheme; it is merely queued, so packets down link1 are now always behind those on link2? What if link2 now fails and the router passes the packets from link1 to the end host? The host is going to receive duplicate packets, stop, retransmit, etc., and that causes a delay. Nothing was achieved here.
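The lagging-link scenario above can be made concrete with a toy model. The sketch below assumes a merging router that forwards whichever copy of a packet arrives first; the function name `delivery_times` and all timing values (10 ms send interval, 80 ms on the congested link1, 20 ms on link2) are made up for illustration. It does not model TCP itself, so whether the stall would trigger a retransmission depends on the sender's retransmission timeout, which is outside this sketch.

```python
def delivery_times(send_times, delay1, delay2, link2_fails_at):
    """Return the time at which the merging router forwards each packet.

    Each packet is copied onto both links and the first copy to arrive wins.
    Copies sent on link2 at or after link2_fails_at are lost.
    """
    times = []
    for t in send_times:
        arrivals = [t + delay1]          # link1 always delivers, but late
        if t < link2_fails_at:
            arrivals.append(t + delay2)  # link2 delivers until it fails
        times.append(min(arrivals))
    return times


send_times = [0, 10, 20, 30, 40, 50]  # ms, one packet every 10 ms
out = delivery_times(send_times, delay1=80, delay2=20, link2_fails_at=30)
gaps = [b - a for a, b in zip(out, out[1:])]
print(out)   # forwarding time of each packet, in ms
print(gaps)  # spacing between consecutive forwarded packets, in ms
```

With these numbers the receiver sees a steady 10 ms spacing until link2 fails, then a single 70 ms gap (the normal interval plus the 60 ms by which link1 lags), after which the spacing returns to 10 ms. In this model no packet is lost, but the failover is visible as a stall equal to the skew between the two links.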

Moving on to a solution;

A better idea in my opinion would be to have dual layer 2 links between the two end hosts, extending their broadcast domains to include each other's NIC. You can do this via direct layer 2 interconnects, MPLS/VPLS extension, or a carrier layer 2 service; take your pick, that isn't strictly relevant here. Extending the layer 2 network between hosts means you don't need to mess with TCP or do any crazy black magic or band-aid type fixes. TCP will be completely agnostic of the underlying technology and you will still have your layer 1 / physical link redundancy.
If you use an MPLS-based solution you can use features like traffic engineering (MPLS-TE) to monitor the latency across the links and always use the link with the lowest latency. You can use BFD with MPLS FRR, which can get you a failover time of roughly 50 ms between links. I know you said you don't want a redundancy failover solution, but 50 ms is pretty fast in my opinion. If your application can't handle a 50 ms loss of connectivity, then you need to go back to the application drawing board. No system is up 100% of the time; you must plan for failures, planned maintenance, and outages through malicious intent or security incidents, all of which will occur at some point. You must be realistic.

In one comment you said the following;

well, IP SLA is the technology being used at least at one end so far... :) still it takes quite a time for both ends to detect link failure, and the application gets out of sync sometimes... and the links may be twinkling sometimes... that's why we are looking for something delay-free

No such thing; Time must pass for possible events to become actualities. You need to re-think this to an "acceptable" level of delay.

Also in another comment you said;

BGP it takes quite a time to discover that the route deemed operational is now down; finally the routers realize this and switch active routes, but it takes time, and application-level protocol may suffer

BGP has a hello timer, which detects the presence of its immediate neighbour. The default is 30 seconds; I suspect this is what you are referring to. If both routers in your topology speak BGP with the ISP at each site, or even directly to each other, then over those peerings build IP-in-IP, GRE, or L2TP(v3) tunnels between the two routers, and over those tunnels run BFD or IP SLA. Now you can detect end-to-end connectivity loss in 1 or 2 seconds and reroute to the other tunnel using tracking objects.
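The detection times being compared here follow from simple arithmetic: a peer is declared down after a number of consecutive intervals pass with nothing received. The helper below is a rough sketch; the function name `detection_time_ms` is hypothetical, and the interval and multiplier values are illustrative assumptions, including the assumption that the BGP hold time is three times the 30-second timer mentioned above.

```python
def detection_time_ms(interval_ms, multiplier):
    """Time to declare a peer down: `multiplier` consecutive silent intervals."""
    return interval_ms * multiplier


# BFD over the tunnel, assuming a 500 ms interval and a multiplier of 3
bfd_ms = detection_time_ms(500, 3)

# BGP alone, assuming the hold time is three 30-second intervals
bgp_ms = detection_time_ms(30_000, 3)

print(bfd_ms, bgp_ms)
```

Under these assumptions BFD notices the failure in 1.5 seconds, while waiting for BGP alone would take 90 seconds, which is why the tunnel-plus-BFD arrangement is suggested rather than relying on BGP timers.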

All in all, you seem to be mixing up different layers of technology. BGP isn't supposed to provide fast re-routing, TCP isn't supposed to be duplicated, and so on. You're looking at the wrong levels of abstraction to tackle this problem. I hope this has helped.

You're missing one important piece... he doesn't have leased lines, he has two IPSec tunnels with ISP diversity — Mike Pennington, Jul 26 '13 at 14:27
He doesn't need them; he can run MPLS over GRE, for example, or MPLS over IPSEC. He could invest in L2 links possibly? Who knows or cares what his budget is, not me; I'm not saying my ideas are the best, I'm simply trying to provide solutions to the problem that are sane and reliable, irrespective of cost or availability, and explain further the issues he faces and reasons for making one choice over another. It's a purely technical answer. — jwbensley, Jul 26 '13 at 14:50
Javano, thank you and upvote from me for pointing at MPLS-TE. Sub-second failover time sounds good. I'll need to discuss it with our networking team... And still it is curious to observe the action of the funny law that states that everything that can be misunderstood will be misunderstood :) 1) I did not mean that the links between routers need to operate at full throughput, hence no need for doubling router/host links; 2) I never meant two TCP streams at host level; the TCP link should remain one, and it's the routers' task to take care of redundancy and zero-delay failover... — Sergey Ushakov, Jul 29 '13 at 05:20
... and still I wonder why the [TDM over IP solution by Engage Communications](http://www.engageinc.com/downloads/IPTube_Protector_White.pdf) does not count as proof of concept. Yes, their payload is TDM and not TCP. But is there any important difference? What might prevent them (besides business) from making a general purpose redundant zero time failover IP router of their box? — Sergey Ushakov, Jul 29 '13 at 05:31
Why not? And what is failover time for their solution in your opinion? — Sergey Ushakov, Jul 29 '13 at 09:16
It doesn't say in that document; to repeat myself though, `Time must pass for possible events to become actualities` - There is no such thing as zero time. The box has to check for losses, delays, drops etc.; that takes time. It may be milliseconds or microseconds, but it takes some period of time. Just like BFD, for example: if you set the hello time to 50ms, with a default hold time of 3x hello, you have to wait 150ms for fail-over to occur. Now please stop comparing a TDM backup solution to your scenario. By its very nature it is possible to offer a TDM service with the kind of redundancy you require for TCP — jwbensley, Jul 29 '13 at 09:32
...because you know exactly when a TDM packet should arrive. If you don't fully understand how E1s/T1s work, I suggest you read about that first. Then you will understand that one reason for having TDM links is reliability, such as guaranteed latency. They run at a fixed speed and frame rate per second. IP/TCP is all over the scale. TDM is much more predictable, and it runs at a lower layer than TCP; it would be like duplicating Ethernet frames over two links. The fact that these boxes are running TDM over IP adds in some potential for change and skewing of the two TDM streams, that is why... — jwbensley, Jul 29 '13 at 09:35
...those boxes have skew timers and out-of-order frame detectors (reading sequence numbers). — jwbensley, Jul 29 '13 at 09:35
Well javano, thank you for the TDM clarification. I now see the difference. And I am grateful for MPLS-TE, really. And still I can't understand why zero failover time is not possible. The 'merger' router might hold a log of the packets passed through and check every incoming packet against that log. Every packet that does not match gets through. Every packet that already has a match is dropped. Quite similar to the TCP logic at the receiver end. Why not? — Sergey Ushakov, Jul 29 '13 at 10:05
Last comment, we shouldn't be having a discussion in the comments section; `The 'merger' router might hold a log of the packets passed through and check every incoming packet against that log` - It takes some amount of time to make that check. Everything takes time. So you have to work out how long that takes, and whether that is fast enough / acceptable for you. — jwbensley, Jul 29 '13 at 12:32

score 1 · Answer 3 · answered Jul 24 '13 at 01:37

This is an application layer problem and not a network level problem. This is because one of the core principles of IP is to prevent duplicates, especially when TCP retransmission is invoked.
In highly critical environments, the approach will be to have 2 NICs on the end hosts and get the application to generate 2 unique packets. With this approach you can use existing technologies and network principles using variable paths and metrics.

tdops


sorry, but can't agree that this is an application layer problem; the application has a right to just expect a TCP link of sufficient quality; TCP itself has provisions for recovery after minor network failures, and there are numerous solutions providing network fault tolerance by alternate routing; unfortunately all of them I know of are of the _recover-fast-after-failure_ kind rather than the _no-need-to-recover_ one; I perceive this task as just a redundant network engineering one; after all, if we can have a RAID, why can't we have a RAIN? :) – Sergey Ushakov Jul 24 '13 at 04:18
Two NICs with two tcp sessions means the OP must decide which TCP session is more reliable. – radio-free-europe Jul 26 '13 at 01:18
Just to avoid misunderstanding: I never meant two TCP sessions. TCP session should be one. That's the routers' task to take care of redundancy and zero delay TCP traffic failover. – Sergey Ushakov Jul 29 '13 at 05:23

score 0 · Answer 4 · answered Jul 23 '13 at 08:02

I'm not aware of tricks or protocols that can perform this type of forward replication on the network devices in question - for this type of application I would recommend redundancy and fast failure detection using BGP fast-failover, BFD and other tools. However, I came across this open source project called 'Tunnel Splitter' http://coderrr.wordpress.com/2010/01/10/tunnel-splitter-accelerating-a-single-tcp-connection-over-multiple-isps/ that seems to fit what you're looking for. In short, the TS boxes installed at each site proxy the TCP connections between host1 and host2, and then split the traffic between them over tunnels. As each tunnel has a unique source address, PBR (policy-based routing) can be used at the routers to direct traffic for tunnel1 over link1 and tunnel2 over link2. The TS boxes terminate the tunnels, and each has a single TCP connection to its local host (host1 or host2). Of course, you would need to really, really test this, but it seems to work on the whiteboard!

sounds promising and fitting the bill (though not industrial-grade), but unfortunately GitHub responds with 404 for this project already... do you know what happened to this project afterwards? — Sergey Ushakov, Jul 24 '13 at 02:11
unfortunately I do not. You may have to contact the authors directly. — smoothbSE, Jul 24 '13 at 05:07

score 0 · Accepted Answer · answered Feb 26 '17 at 16:12

With MikroTik routers, you can use bonding in broadcast mode (see the bonding documentation). I ran some tests across a 4G link connection: going from 1 to 2 links does reduce packet loss, and I benefit from TCP speed improvements. Packet loss is not completely eliminated, but going to 3 links does not improve it further. Next I would investigate network-coded TCP.
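As a rough way to reason about what broadcast-mode bonding can buy, the snippet below computes the chance that every copy of a packet is lost. It is a back-of-the-envelope sketch, not a measurement: the function name `combined_loss` and the 5% per-link loss figure are assumptions, and the model assumes each link drops packets independently with the same probability.

```python
def combined_loss(per_link_loss, links):
    """Probability that all copies of a packet are lost.

    Assumes every link drops packets independently with the same probability.
    """
    return per_link_loss ** links


for n in (1, 2, 3):
    print(n, combined_loss(0.05, n))
```

The independence assumption is the weak point. The observation above that a third link brought no further improvement would be consistent with losses that are correlated across links, which this model does not capture.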

Netflow


Product or resource recommendations are explicitly off-topic here, as are consumer-grade devices, e.g. MikroTik. – Ron Maupin Feb 26 '17 at 17:38
@Netflow Thank you for noting bonding in broadcast mode, regardless of Mikrotik :) Not sure if I'll be able to give it a try in some near future, but still it's good to know there seems to be a standards-based approach... – Sergey Ushakov Feb 27 '17 at 01:39

score 0 · Answer 6 · answered Apr 01 '21 at 13:16

I know this is an old thread, but there is now a solution for the video-over-IP world: "ST 2022-7:2019 - SMPTE Standard - Seamless Protection Switching of RTP Datagrams," pp. 1-11, 13 May 2019, doi: 10.5594/SMPTE.ST2022-7.2019. https://ieeexplore.ieee.org/document/8716822

It basically sends two copies of every packet over two paths, then combines them and buffers a certain number of packets to allow for dropped ones. It automatically and seamlessly selects the packets to deliver from each stream.
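The general shape of such a receiver can be sketched as a small reorder buffer keyed on sequence numbers. This is an illustration of the idea only and is not taken from the SMPTE standard: the class name `SeamlessMerger`, the buffer depth of 4, and the starting sequence number of 0 are assumptions, and sequence number wraparound is ignored.

```python
import heapq


class SeamlessMerger:
    """Merges two copies of a sequence-numbered stream into one ordered stream."""

    def __init__(self, depth=4):
        self.depth = depth     # packets held back while waiting for a gap to fill
        self.heap = []         # buffered (seq, payload), smallest seq first
        self.buffered = set()  # sequence numbers currently in the heap
        self.next_seq = 0      # next sequence number owed to the receiver

    def _ready(self):
        # Release when the head is the expected packet, or the buffer is full.
        if not self.heap:
            return False
        return self.heap[0][0] == self.next_seq or len(self.heap) > self.depth

    def push(self, seq, payload):
        """Accept a packet from either path; return packets ready for delivery."""
        if seq < self.next_seq or seq in self.buffered:
            return []          # already delivered or already buffered
        heapq.heappush(self.heap, (seq, payload))
        self.buffered.add(seq)
        out = []
        while self._ready():
            s, p = heapq.heappop(self.heap)
            self.buffered.discard(s)
            self.next_seq = s + 1  # skips the gap if the buffer overflowed
            out.append((s, p))
        return out


merger = SeamlessMerger()
# Path A loses packet 2; path B carries everything but lags behind path A.
arrivals = [(0, "A"), (1, "A"), (0, "B"), (3, "A"), (1, "B"),
            (4, "A"), (2, "B"), (3, "B"), (4, "B")]
delivered = []
for seq, path in arrivals:
    delivered.extend(s for s, _ in merger.push(seq, path))
print(delivered)
```

In this run the packet lost on path A is filled in from path B, and the receiver gets every sequence number once and in order. The buffer depth bounds how much skew between the paths can be absorbed, at the cost of added latency while a gap is outstanding.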

A standards-based solution for TCP/IP using this as a base would be very useful.

Michael McCarthy


True - for UDP-based RTP that can definitely work. The OP was specifically asking about TCP though where it can't *ever* work the way he wants. – Zac67 Apr 01 '21 at 15:51
Michael McCarthy: Thank you for noting a solution for a related problem, though a bit specific indeed. @Zac67 Fortunately solutions for the original problem do exist, with TCP not excluded :) One is mentioned in the accepted answer, and Linux IP Bonding in broadcast mode is another technology to mention, maybe a less proprietary one... – Sergey Ushakov Apr 03 '21 at 06:50
