Below is a special case of the Karn algorithm I noted with that may be useful for getting useful round-trip-time (RTT) estimations on out-of-order retransmissions across multiple paths as the path length and RTT change. Since I'm not doing anything more with the idea at present, I've archived discussion and present the idea as is - if only for my future self.
As far as I can see, echoing timestamps would make this sleight-of-hand observation redundant - but not all TCP receivers echo timestamps a la RFC1323, and this expired draft (mirror) commenting on 1323 points out that Karn is still needed for communication with older stacks.
Sections 3.3 and 3.4 of Karn's 1987 Sigcomm paper discuss how RTT measurements are ignored for retransmitted packets, and indicate that a change in path delay causing retransmissions results in a lack of meaningful RTT measurements in the short term, which is where this comes in.
So, if you're communicating across multiple paths (a wireless mesh or something else) with a legacy TCP stack on a receiver in the fixed network, this idea may be of use for your TCP stack.
Or it may not - I haven't looked into it that deeply.
People in the discussion:
Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m107llG-0002oCa; Tue, 2 Feb 99 19:45 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma003254; Tue, 2 Feb 99 19:44:58 GMT Received: from assateague.lerc.nasa.gov (actually host assateague-fi.lerc.nasa.gov) by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Tue, 2 Feb 1999 19:44:25 +0000 Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main) id OAA08036; Tue, 2 Feb 1999 14:35:33 -0500 (EST) X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-pilc@lerc.nasa.gov using -f Received: from owl.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main) id OAA05923; Tue, 2 Feb 1999 14:31:05 -0500 (EST) Received: (from floyd@localhost) by owl.ee.lbl.gov (8.9.2/8.9.2) id LAA06533; Tue, 2 Feb 1999 11:27:29 -0800 (PST) Message-Id: <199902021927.LAA06533@owl.ee.lbl.gov> To: karn@qualcomm.com cc: pilc@lerc.nasa.gov, Tom Henderson <tomh@cs.berkeley.edu> Subject: Re: TCP and out-of-order delivery Date: Tue, 02 Feb 1999 11:27:28 PST From: Sally Floyd <floyd@ee.lbl.gov> Precedence: bulk Status: RO X-Status: Phil - >The fast recovery mechanism in TCP has always bothered me a little >precisely because its reaction to out-of-order delivery. The Internet >architecture has always permitted this, even if it doesn't happen all >that often in practice (yet). Even without radio links, the increased >use of load splitting will make this a problem worth solving at the >TCP level. My understanding is that the TCP-level solution to TCP's problems with out-of-order delivery would go as follows: (1) Add a recommendation to SACK so that, when the receiver receives a packet that is already covered by the cumulative acknowledgement, the receiver not only returns a dup ack (as it does now), but uses the first SACK block to indicate the sequence numbers of the packet that resulted in the dup ack. If the receiver also holds some non-contiguous blocks of data, they could be reported in the second and third SACK blocks. With this added recommendation (which would be optional, but which does not violate anything in the SACK RFC 2018, I believe), the sender would be able to reconstruct the exact sequence of data received by the receiver. (2) Consider a case where the sender receives three dup acks, infers that packet N has been lost, enters Fast Recovery, and retransmits packet N. If packet N in fact was not lost, but was simply reordered, then the sender will be able to detect this later on (assuming that acknowledgements aren't lost on the return path). Thus the sender can determine that it did not need to retransmit packet N, and it did not need to reduce its window in half. If the sender determines after-the-fact that it did not need to retransmit packet N, there is not much that it can do about it, except perhaps to modify its dup ack threshold. The bandwidth wasted by the unnecessary retransmission of that packet has already been wasted. But the sender can do something when it determines that it unnecessarily reduced its congestion window. That is, the sender can put ssthresh back to its old value (before it cut ssthresh in half when it initiated Fast Recovery), and safely slow-start back up to its old congestion window. I think that this would be a simple and clean change to make to SACK TCP, and would take care of some of the performance problems of TCP with load splitting and other forms of strong out-of-order delivery. (Though of couse it would need some exploration first.) (There are also not necessarily original ideas, though I don't know off-hand of any citations. I have talked with a number of people about this since the SACK RFC was first started, most recently with Tom Henderson, who pointed out to me the problem with this proposal in the absence of (1) above.) - Sally Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m107lvX-0002qha; Tue, 2 Feb 99 19:56 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma003529; Tue, 2 Feb 99 19:55:49 GMT Received: from assateague.lerc.nasa.gov (actually host assateague-fi.lerc.nasa.gov) by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Tue, 2 Feb 1999 19:54:57 +0000 Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main) id OAA14860; Tue, 2 Feb 1999 14:50:31 -0500 (EST) X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-pilc@lerc.nasa.gov using -f Received: from fw-es06.hac.com (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with ESMTP (NASA LeRC 8.7.4.1/2.01-main) id OAA13007; Tue, 2 Feb 1999 14:46:44 -0500 (EST) From: Aaron.Falk@hscs066.es.hac.com Received: from ises01.ES.HAC.COM ([147.16.5.2]) by fw-es06.hac.com (8.8.4/8.8.4) with SMTP id LAA25690 for <pilc@lerc.nasa.gov>; Tue, 2 Feb 1999 11:46:46 -0800 (PST) Received: by ises01.ES.HAC.COM; id AA08773; Tue, 2 Feb 1999 11:46:44 -0800 Received: by HSCS066.es.hac.com(Lotus SMTP MTA v4.6.2 (693.3 8-11-1998)) id 8825670C.006CA522 ; Tue, 2 Feb 1999 11:46:41 -0800 X-Lotus-Fromdomain: HSC-RAD To: Sally Floyd <floyd@ee.lbl.gov> Cc: karn@qualcomm.com, pilc@lerc.nasa.gov, Tom Henderson <tomh@cs.berkeley.edu> Message-Id: <8825670C.006CA3CA.00@HSCS066.es.hac.com> Date: Tue, 2 Feb 1999 11:46:37 -0800 Subject: Re: TCP and out-of-order delivery Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Precedence: bulk Status: RO X-Status: Sally- Pardon the ignorant question, but how does the sender distinguish between the ACK of the original packet versus the retransmission? --aaron If packet N in fact was not lost, but was simply reordered, then the sender will be able to detect this later on (assuming that acknowledgements aren't lost on the return path). Thus the sender can determine that it did not need to retransmit packet N, and it did not need to reduce its window in half. Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m107pch-0002rGa; Tue, 2 Feb 99 23:52 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma008892; Tue, 2 Feb 99 23:52:46 GMT Received: from assateague.lerc.nasa.gov (actually host assateague-fi.lerc.nasa.gov) by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Tue, 2 Feb 1999 23:52:16 +0000 Received: (listserv@localhost) by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main) id SAA01614; Tue, 2 Feb 1999 18:45:52 -0500 (EST) X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to owner-pilc@lerc.nasa.gov using -f Received: from ainur.ee.surrey.ac.uk (fw01.lerc.nasa.gov [139.88.145.14]) by assateague-fi.lerc.nasa.gov with SMTP (NASA LeRC 8.7.4.1/2.01-main) id SAA29515; Tue, 2 Feb 1999 18:40:27 -0500 (EST) Received: from petra.ee.surrey.ac.uk by ainur.ee.surrey.ac.uk with smtp (Smail3.1.29.1-ident) id m107pPq-0002qhC; Tue, 2 Feb 99 23:39 GMT Date: Tue, 2 Feb 1999 23:39:33 +0000 (GMT) From: Lloyd Wood <L.Wood@surrey.ac.uk> X-Sender: eep1lw@petra.ee.surrey.ac.uk Reply-To: end2end-interest@ISI.EDU To: Sally Floyd <floyd@ee.lbl.gov> cc: pilc@lerc.nasa.gov, end2end-interest@ISI.EDU, Aaron.Falk@hscs066.es.hac.com, karn@qualcomm.com, tomh@cs.berkeley.edu Subject: Re: TCP and out-of-order delivery In-Reply-To: <199902022009.MAA06568@owl.ee.lbl.gov> Message-ID: <Pine.SOL.3.95.990202202022.4494U-100000@petra.ee.surrey.ac.uk> Organization: speaking for none X-URL: http://www.ee.surrey.ac.uk/Personal/L.Wood/ X-no-archive: yes MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Status: RO X-Status: On Tue, 2 Feb 1999, Sally Floyd wrote: [Probably veering off-topic for PILC, so I've cc'd to and set followups to end2end as likely to be more appropriate.] An out-of-order-delivery caused by a delay en route leads to a resend and the receipt of two acknowledgements - one for the original and one for the resend. On Tue, 2 Feb 1999, Aaron Falk wrote to Sally Floyd: > >but how does the sender distinguish between > >the ACK of the original packet versus the retransmission? and Sally Floyd replied: > Well, the sender doesn't know which ACK goes with which of the two > transmissions of the packet (so the sender has to take the normal care > that it would take in terms of using ACKs for estimating RTTs). I think there is an opportunity for some form of special-case handling here to give an idea of what's causing the out-of-order delivery - and perhaps even an RTT estimate for it - which could be useful if there's a different path involved, due to load splitting, say. This is even though you don't know which ACK is which; below I demonstrate that the ACK ordering simply doesn't matter, and I go on to speculate wildly on the potential usefulness of this observed information. (with diagrams.) Sally adds: > But the sender does know that it sent the packet only twice, and that > the receiver received two copies of that data. So the > sender can reasonably infer that the first transmission of that > data was not lost in the network, but only reordered. > > Yes? Or have I forgotten something? The resend must occur before the sender receives the first acknowledgment (otherwise that resend never takes place). The second acknowledgement may or may not arrive before the first acknowledgement. These simple conditions lead us to two possible scenarios I'll proceed to illustrate badly in monospace: Sender Receiver Sender Receiver 1| | 1| | - | - | |- | | - | | - | a | - | | - | 2| - | 2| - | - | * - | |* + | * -| | * - | | - *| | - * | b | *- | | - * | * - | |- * | ACK2| - | ACK1| * | - | - | |* | c ACK1| | ACK2| | - let a be time between original send and retransmit. b be time between retransmit and receipt of first acknowledgement c be time between receipt of first and second acknowledgements. - all obtained from simple observation at the sender. elapsed overall _total_ RTT time for sum of both RTTs: 1 + 2 1 + 2 = (a+b+c) + b = (a+b) + (b+c) = a + 2*b + c = a + 2*b + c ... is independent of ACK order. So total RTT time for both round trips is fixed no matter what the acknowledgement order. In retrospect, this is probably obvious, but I had to draw it then check other cases... (and total area of triangles is fixed. This holds for any n packets[*] sent and n acknowledgments received, but I'm not drawing e.g. all six cases for n=3, where you end up with a + 2*b + 3*c + 2*d + e using the same interval labelling as above, and I'll stick to talking about n=2 below and generalising to the rest of the series based on this weird variant on Pascal's triangle later.) We normally neglect RTTs on ACKs for resent packets, since you can't be sure whether the original or the resent packet is being acknowledged - this is Karn's algorithm. There, with a lost packet/lost ACK, the overall time is unbounded, and we don't want to introduce that uncertainty into our RTT calculations, so we effectively discard the RTT for the ACK'd packet because we can't separate it out easily. But here, since both original and all resent packets are successfully ACK'd, the overall sum of the two RTTs is certain. We can use that, since we know the total time holds no matter what the order of the ACKs; we have a valid summed sample time for the group of transmissions and RTTs that may be useful for other purposes, and we can build up an estimate of total-time-for-n-total-resends where n is 2 (or greater) from subsequent samples. (If there are lots of subsequent samples, there may be other problems, but that's beyond the scope of this email.) If the results of this observation are pretty much constant and you see it occurring regularly, I suppose you may be seeing the effects of occasional use of a second path with a different overall delay - one possible explanation. We could subtract the current 'normal' RTT estimate from the typical computed total-RTT-for-an-extra-send (n=2) seen to get an estimate of the RTT for the 'other case', be it a different route or something else - assuming that the second occurrence is sporadic and is not affecting 'normal' RTT (much). This could give a sample 'other delay' of the other path, if there is another path. Again, this is just one possible cause; I haven't thought the range of causes through. Following the arithmetic series, in extreme (rarer?) cases where you have three sends (a send and two retransmits) followed by three unordered acknowledgements, you could then take the current RTT and RTT-for-an-extra-send (n=2) estimates, add them together, and subtract them from the (estimated seen total derived from sum RTT for n=3) to get a rough idea of the delay there, and so on. This assumes that normal RTT is tracking the most popular path/delay, etc, which is probably assuming way too much. [I'll note that if this is a long series and the differences in RTT are clearly delineated, either you've got a very odd multiple path situation or there must be something wrong somewhere else, and it looks like a case of diminishing and error-prone RTT returns for increasing number of sends and ACK returns to me - at least as far as network conditions I'm familiar with should go. Arithmetic series only go so far in implementation.] You will base RTO on _one_ of these computed RTTs to minimise occurrence of multiple ACKs. If you're seeing a _lot_ of out-of-order multiple-ACKs, it could be time to switch right to the another RTO computed from another RTT group and see what happens; I can imagine some sort of Markov-like state transitions, but there are probably practical pitfalls I've overlooked in this abstract arithmetic. Is there anything at all useful here? thanks, L. I've no idea - I'm supposed to be working on multicast. And I need to get out more. [*] probably should be 'segments' since it's TCP, but since everyone else is saying 'packets' in this discussion I'll follow suit. <L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/> Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m107rqb-0002fha; Wed, 3 Feb 99 02:15 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma012101; Wed, 3 Feb 99 02:15:06 GMT Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Wed, 3 Feb 1999 02:14:21 +0000 Received: (from majordom@localhost) by zephyr.isi.edu (8.8.7/8.8.6) id QAA06976 for end2end-interest-outgoing; Tue, 2 Feb 1999 16:27:38 -0800 (PST) Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by zephyr.isi.edu (8.8.7/8.8.6) with ESMTP id QAA06971 for <end2end-interest@zephyr.isi.edu>; Tue, 2 Feb 1999 16:27:36 -0800 (PST) Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50]) by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id QAA07989 for <end2end-interest@ISI.EDU>; Tue, 2 Feb 1999 16:27:35 -0800 (PST) Received: (from floyd@localhost) by owl.ee.lbl.gov (8.9.2/8.9.2) id QAA06775; Tue, 2 Feb 1999 16:27:35 -0800 (PST) Message-Id: <199902030027.QAA06775@owl.ee.lbl.gov> To: end2end-interest@ISI.EDU Subject: Re: TCP and out-of-order delivery Date: Tue, 02 Feb 1999 16:27:35 PST From: Sally Floyd <floyd@ee.lbl.gov> Precedence: bulk Status: RO X-Status: This is background for a thread that started out on the pilc mailing list, and that now is migrating to end2end-interest instead. Followup should be on end2end-interest, I would say. - Sally ------- Forwarded Message Date: Tue, 02 Feb 1999 11:27:28 -0800 From: Sally Floyd <floyd> To: karn@qualcomm.com cc: pilc@lerc.nasa.gov, Tom Henderson <tomh@cs.berkeley.edu> Subject: Re: TCP and out-of-order delivery Phil - >The fast recovery mechanism in TCP has always bothered me a little >precisely because its reaction to out-of-order delivery. The Internet >architecture has always permitted this, even if it doesn't happen all >that often in practice (yet). Even without radio links, the increased >use of load splitting will make this a problem worth solving at the >TCP level. My understanding is that the TCP-level solution to TCP's problems with out-of-order delivery would go as follows: (1) Add a recommendation to SACK so that, when the receiver receives a packet that is already covered by the cumulative acknowledgement, the receiver not only returns a dup ack (as it does now), but uses the first SACK block to indicate the sequence numbers of the packet that resulted in the dup ack. If the receiver also holds some non-contiguous blocks of data, they could be reported in the second and third SACK blocks. With this added recommendation (which would be optional, but which does not violate anything in the SACK RFC 2018, I believe), the sender would be able to reconstruct the exact sequence of data received by the receiver. (2) Consider a case where the sender receives three dup acks, infers that packet N has been lost, enters Fast Recovery, and retransmits packet N. If packet N in fact was not lost, but was simply reordered, then the sender will be able to detect this later on (assuming that acknowledgements aren't lost on the return path). Thus the sender can determine that it did not need to retransmit packet N, and it did not need to reduce its window in half. If the sender determines after-the-fact that it did not need to retransmit packet N, there is not much that it can do about it, except perhaps to modify its dup ack threshold. The bandwidth wasted by the unnecessary retransmission of that packet has already been wasted. But the sender can do something when it determines that it unnecessarily reduced its congestion window. That is, the sender can put ssthresh back to its old value (before it cut ssthresh in half when it initiated Fast Recovery), and safely slow-start back up to its old congestion window. I think that this would be a simple and clean change to make to SACK TCP, and would take care of some of the performance problems of TCP with load splitting and other forms of strong out-of-order delivery. (Though of couse it would need some exploration first.) (There are also not necessarily original ideas, though I don't know off-hand of any citations. I have talked with a number of people about this since the SACK RFC was first started, most recently with Tom Henderson, who pointed out to me the problem with this proposal in the absence of (1) above.) - Sally ------- End of Forwarded Message Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m107s4l-0002pNa; Wed, 3 Feb 99 02:29 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma012390; Wed, 3 Feb 99 02:29:33 GMT Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Wed, 3 Feb 1999 02:29:00 +0000 Received: (from majordom@localhost) by zephyr.isi.edu (8.8.7/8.8.6) id QAA07430 for end2end-interest-outgoing; Tue, 2 Feb 1999 16:39:41 -0800 (PST) Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by zephyr.isi.edu (8.8.7/8.8.6) with ESMTP id QAA07425 for <end2end-interest@zephyr.isi.edu>; Tue, 2 Feb 1999 16:39:40 -0800 (PST) Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50]) by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id QAA09185 for <end2end-interest@ISI.EDU>; Tue, 2 Feb 1999 16:39:39 -0800 (PST) Received: (from floyd@localhost) by owl.ee.lbl.gov (8.9.2/8.9.2) id QAA06792; Tue, 2 Feb 1999 16:39:36 -0800 (PST) Message-Id: <199902030039.QAA06792@owl.ee.lbl.gov> To: end2end-interest@ISI.EDU Subject: Re: TCP and out-of-order delivery Date: Tue, 02 Feb 1999 16:39:36 PST From: Sally Floyd <floyd@ee.lbl.gov> Precedence: bulk Status: RO X-Status: ... >I think there is an opportunity for some form of special-case handling >here to give an idea of what's causing the out-of-order delivery - and >perhaps even an RTT estimate for it - which could be useful if there's >a different path involved, due to load splitting, say. Yep, I agree. ... >But here, since both original and all resent packets are successfully >ACK'd, the overall sum of the two RTTs is certain. We can use that, >since we know the total time holds no matter what the order of >the ACKs; we have a valid summed sample time for the group of >transmissions and RTTs that may be useful for other purposes, and we >can build up an estimate of total-time-for-n-total-resends where n is >2 (or greater) from subsequent samples. (If there are lots of >subsequent samples, there may be other problems, but that's beyond the >scope of this email.) ... >Is there anything at all useful here? I think so. And at the very least, you can use the n-transmissions-and-n-acks to calculate an average roundtrip time for those n transmissions. But unnecessarily-retransmitted packets will hopefully not be a very frequent occurrence. (If they are, then the sender didn't do very well figuring out how to increase the dup ack threshold...) - Sally Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m107txd-0002pNa; Wed, 3 Feb 99 04:30 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xmaa14388; Wed, 3 Feb 99 04:30:36 GMT Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Wed, 3 Feb 1999 04:30:10 +0000 Received: (from majordom@localhost) by zephyr.isi.edu (8.8.7/8.8.6) id SAA11010 for end2end-interest-outgoing; Tue, 2 Feb 1999 18:24:08 -0800 (PST) Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by zephyr.isi.edu (8.8.7/8.8.6) with ESMTP id SAA11005 for <end2end-interest@zephyr.isi.edu>; Tue, 2 Feb 1999 18:24:07 -0800 (PST) Received: from cs.tamu.edu (0@clavin.cs.tamu.edu [128.194.130.106]) by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id SAA17794 for <end2end-interest@isi.edu>; Tue, 2 Feb 1999 18:24:06 -0800 (PST) Received: from sun.cs.tamu.edu (sun [128.194.135.14]) by cs.tamu.edu (8.9.1/8.9.1) with ESMTP id UAA02030; Tue, 2 Feb 1999 20:22:51 -0600 (CST) From: Nitin H Vaidya <vaidya@cs.tamu.edu> Received: (from vaidya@localhost) by sun.cs.tamu.edu (8.9.1/8.9.1) id UAA14094; Tue, 2 Feb 1999 20:16:35 -0600 (CST) Date: Tue, 2 Feb 1999 20:16:35 -0600 (CST) Message-Id: <199902030216.UAA14094@sun.cs.tamu.edu> To: end2end-interest@ISI.EDU, floyd@ee.lbl.gov Subject: Re: TCP and out-of-order delivery Cc: vaidya@cs.tamu.edu Precedence: bulk Status: RO X-Status: >> From eep1lw@surrey.ac.uk Tue Feb 2 17:49:25 1999 >> >> On Tue, 2 Feb 1999, Sally Floyd wrote: >> [Probably veering off-topic for PILC, so I've cc'd to and set >> followups to end2end as likely to be more appropriate.] >> >> An out-of-order-delivery caused by a delay en route leads to a resend >> and the receipt of two acknowledgements - one for the original and one >> for the resend. For this sort of analysis to be valid, you would need to assume that the first n acks you get correspond to first n transmits of a packet. This may not be true, since n packets sent by TCP sender may generate more than n acks, when a packet gets duplicated on some hop on TCP route. May be I am splitting hair ... - nitin Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m107txc-0002fha; Wed, 3 Feb 99 04:30 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma014388; Wed, 3 Feb 99 04:30:36 GMT Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Wed, 3 Feb 1999 04:30:01 +0000 Received: (from majordom@localhost) by zephyr.isi.edu (8.8.7/8.8.6) id SAA11726 for end2end-interest-outgoing; Tue, 2 Feb 1999 18:43:30 -0800 (PST) Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by zephyr.isi.edu (8.8.7/8.8.6) with ESMTP id SAA11721 for <end2end-interest@zephyr.isi.edu>; Tue, 2 Feb 1999 18:43:28 -0800 (PST) Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50]) by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id SAA19078 for <end2end-interest@isi.edu>; Tue, 2 Feb 1999 18:43:28 -0800 (PST) Received: (from floyd@localhost) by owl.ee.lbl.gov (8.9.2/8.9.2) id SAA06913; Tue, 2 Feb 1999 18:43:25 -0800 (PST) Message-Id: <199902030243.SAA06913@owl.ee.lbl.gov> To: Nitin H Vaidya <vaidya@cs.tamu.edu> cc: end2end-interest@ISI.EDU Subject: Re: TCP and out-of-order delivery Date: Tue, 02 Feb 1999 18:43:25 PST From: Sally Floyd <floyd@ee.lbl.gov> Precedence: bulk Status: RO X-Status: >For this sort of analysis to be valid, you would need to assume that >the first n acks you get correspond to first n transmits of a >packet. This may not be true, since n packets sent by TCP >sender may generate more than n acks, when a packet gets >duplicated on some hop on TCP route. Yep. You don't know *for sure* that the n acks aren't due to duplicated packets. And (unicast) packets do sometimes get duplicated in the network, as Vern shows in his measurement studies. If it was the original packet that got delayed and also duplicated, then that is fine - it is still true that your assumption that the original packet was dropped was an incorrect assumption. (Even if, for some reason, you haven't actually received a valid ack for the retransmitted packet.) If the original packet was not duplicated, but instead the original packet was itself lost, and it was the retransmission that was duplicated, then either you have been seeing lots of dup acks all along, and you know to be cautious; or else you have been fooled by a somewhat unlikely combination of events. Because, if packets aren't being duplicated rather frequently, I can't see any reason why the network, when it choose to duplicate a packet, would duplicate one of the few packets that was a retransmission of an earlier packet. Barring adversaries in the network, which I assume that we can ignore for the time being... Convincing? Or not? - Sally Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m1083XV-0002u0a; Wed, 3 Feb 99 14:44 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma003501; Wed, 3 Feb 99 14:44:24 GMT Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Wed, 3 Feb 1999 14:43:56 +0000 Received: (from majordom@localhost) by zephyr.isi.edu (8.8.7/8.8.6) id EAA04665 for end2end-interest-outgoing; Wed, 3 Feb 1999 04:25:15 -0800 (PST) Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by zephyr.isi.edu (8.8.7/8.8.6) with ESMTP id EAA04660 for <end2end-interest@zephyr.isi.edu>; Wed, 3 Feb 1999 04:25:13 -0800 (PST) Received: from ainur.ee.surrey.ac.uk (root@ainur.ee.surrey.ac.uk [131.227.50.25]) by tnt.isi.edu (8.8.7/8.8.6) with SMTP id EAA16367 for <end2end-interest@ISI.EDU>; Wed, 3 Feb 1999 04:25:11 -0800 (PST) Received: from petra.ee.surrey.ac.uk by ainur.ee.surrey.ac.uk with smtp (Smail3.1.29.1-ident) id m1081M7-0002ghC; Wed, 3 Feb 99 12:24 GMT Date: Wed, 3 Feb 1999 12:24:31 +0000 (GMT) From: Lloyd Wood <L.Wood@surrey.ac.uk> X-Sender: eep1lw@petra.ee.surrey.ac.uk Reply-To: Lloyd Wood <L.Wood@surrey.ac.uk> To: Sally Floyd <floyd@ee.lbl.gov> cc: Nitin H Vaidya <vaidya@cs.tamu.edu>, end2end-interest@ISI.EDU Subject: Re: TCP and out-of-order delivery In-Reply-To: <199902030243.SAA06913@owl.ee.lbl.gov> Message-ID: <Pine.SOL.3.95.990203114933.6640D-100000@petra.ee.surrey.ac.uk> Organization: speaking for none X-URL: http://www.ee.surrey.ac.uk/Personal/L.Wood/ X-no-archive: yes MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk Status: RO X-Status: On Tue, 2 Feb 1999, Sally Floyd wrote: > Nitin Vaidya wrote: > >For this sort of analysis to be valid, you would need to assume that > >the first n acks you get correspond to first n transmits of a > >packet. This may not be true, since n packets sent by TCP > >sender may generate more than n acks, when a packet gets > >duplicated on some hop on TCP route. Well, this is after-the-fact analysis, so you can check for more than n ACKs received for the n packet transmissions before treating the sample as valid. You will stop counting transmissions of the packet when the first acknowledgement comes in and you no longer need to resend the packet. That gives you a value of n for the n packet transmissions. You can have a safe timeout on counting acknowledgements. Once you've reached n acknowledgements, you won't stop counting (although you can do the math and get other RTT estimates at that time); you will wait for the duration of that timeout for any duplicate ACKs to appear before treating this sample and computed estimates as valid. If any more ACKs do appear, then you now have n sends and n+1 ACKs, so the arithmetic to get other possible RTT estimates doesn't apply, as you're back to the uncertain case of the Karn algorithm (and you won't care about checking for the arrival of any more dup ACKs for the packet, since that's still a Karn case and to be ignored.) To get the same value of n for _both_ transmissions and ACKs where ACKs are duplicated and the analysis I came up with previously is invalidated, you need the total number of duplicated ACKs to match the total number of lost packets/lost ACKs: for every duplicated ACK received over and above the first ACK for that particular packet transmission, there must be an equivalent lost packet or lost ACK so that the total number of ACKs received by the sender remains at n - or all the extra ACKs are arriving after that safe timeout. I agree with Sally that this scenario is extremely unlikely without other duplications being visible at other times. So, now we have a useful sum-of-n-RTTs value for our unordered multiple ACKs, what can we reliably conclude from it about multipath RTTs/other causes? thanks, L. > Yep. You don't know *for sure* that the n acks aren't due to > duplicated packets. And (unicast) packets do sometimes get duplicated > in the network, as Vern shows in his measurement studies. > > If it was the original packet that got delayed and also duplicated, > then that is fine - it is still true that your assumption that the > original packet was dropped was an incorrect assumption. (Even if, for > some reason, you haven't actually received a valid ack for the > retransmitted packet.) > > If the original packet was not duplicated, but instead the original > packet was itself lost, and it was the retransmission that was > duplicated, then either you have been seeing lots of dup acks all > along, and you know to be cautious; or else you have been fooled by a > somewhat unlikely combination of events. Because, if packets aren't > being duplicated rather frequently, I can't see any reason why the > network, when it choose to duplicate a packet, would duplicate one of > the few packets that was a retransmission of an earlier packet. > Barring adversaries in the network, which I assume that we can ignore > for the time being... > > Convincing? Or not? > > - Sally <L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/> Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m1086X8-0002swa; Wed, 3 Feb 99 17:56 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma011040; Wed, 3 Feb 99 17:56:08 GMT Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Wed, 3 Feb 1999 17:55:44 +0000 Received: (from majordom@localhost) by zephyr.isi.edu (8.8.7/8.8.6) id HAA10702 for end2end-interest-outgoing; Wed, 3 Feb 1999 07:52:41 -0800 (PST) Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by zephyr.isi.edu (8.8.7/8.8.6) with ESMTP id HAA10697 for <end2end-interest@zephyr.isi.edu>; Wed, 3 Feb 1999 07:52:41 -0800 (PST) Received: from cs.tamu.edu (0@clavin.cs.tamu.edu [128.194.130.106]) by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id HAA23110 for <end2end-interest@isi.edu>; Wed, 3 Feb 1999 07:52:40 -0800 (PST) Received: from sun.cs.tamu.edu (sun [128.194.135.14]) by cs.tamu.edu (8.9.1/8.9.1) with ESMTP id JAA24473; Wed, 3 Feb 1999 09:51:58 -0600 (CST) From: Nitin H Vaidya <vaidya@cs.tamu.edu> Received: (from vaidya@localhost) by sun.cs.tamu.edu (8.9.1/8.9.1) id JAA14807; Wed, 3 Feb 1999 09:45:36 -0600 (CST) Date: Wed, 3 Feb 1999 09:45:36 -0600 (CST) Message-Id: <199902031545.JAA14807@sun.cs.tamu.edu> To: floyd@ee.lbl.gov Subject: Re: TCP and out-of-order delivery Cc: end2end-interest@ISI.EDU Precedence: bulk Status: RO X-Status: >> From floyd@ee.lbl.gov Tue Feb 2 21:40:34 1999 >> >> If the original packet was not duplicated, but instead the original >> packet was itself lost, and it was the retransmission that was >> duplicated, then either you have been seeing lots of dup acks all >> along, and you know to be cautious; or else you have been fooled by a >> somewhat unlikely combination of events. Because, if packets aren't >> being duplicated rather frequently, I can't see any reason why the >> network, when it choose to duplicate a packet, would duplicate one of >> the few packets that was a retransmission of an earlier packet. >> Barring adversaries in the network, which I assume that we can ignore >> for the time being... >> >> Convincing? Or not? It is convincing ... as I said earlier, I might have been nitpicking. I do find the suggestion by Lloyd Wood quite interesting, though somewhat complicated. I think that it is useful to know under what assumptions the idea would work as intended. Other situations exist where reordering can occur due to reasons other than multipath (e.g., out-of-order delivery by a link layer). This, and packet duplication, may be rare events, hopefully rare enough compared to multipath that they can be ignored. Regards. - nitin Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8) id m1086Y8-0002tTa; Wed, 3 Feb 99 17:57 GMT Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0) id xma011078; Wed, 3 Feb 99 17:56:55 GMT Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP; Wed, 3 Feb 1999 17:56:30 +0000 Received: (from majordom@localhost) by zephyr.isi.edu (8.8.7/8.8.6) id IAA11080 for end2end-interest-outgoing; Wed, 3 Feb 1999 08:01:18 -0800 (PST) Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128]) by zephyr.isi.edu (8.8.7/8.8.6) with ESMTP id IAA11075 for <end2end-interest@zephyr.isi.edu>; Wed, 3 Feb 1999 08:01:16 -0800 (PST) Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50]) by tnt.isi.edu (8.8.7/8.8.6) with ESMTP id IAA23664 for <end2end-interest@isi.edu>; Wed, 3 Feb 1999 08:01:16 -0800 (PST) Received: (from floyd@localhost) by owl.ee.lbl.gov (8.9.2/8.9.2) id IAA07343; Wed, 3 Feb 1999 08:01:12 -0800 (PST) Message-Id: <199902031601.IAA07343@owl.ee.lbl.gov> To: Nitin H Vaidya <vaidya@cs.tamu.edu> cc: end2end-interest@ISI.EDU Subject: Re: TCP and out-of-order delivery Date: Wed, 03 Feb 1999 08:01:12 PST From: Sally Floyd <floyd@ee.lbl.gov> Precedence: bulk Status: RO X-Status: >It is convincing ... as I said earlier, I might have been nitpicking. >I do find the suggestion by Lloyd Wood quite interesting, >though somewhat complicated. Yes. In my email, I was really only addressing the issue of how, even though packets can get duplicated in the network, you can still reasonably infer that 2-acks-for-2-transmissions means that you retransmitted the packet unnecessarily. (I wasn't addressing the complications of using the acks for RTT measurements...) - Sally Date: Sat, 6 Feb 1999 01:04:21 +0000 (GMT) From: Lloyd Wood <eep1lw@surrey.ac.uk> X-Sender: eep1lw@petra.ee.surrey.ac.uk Reply-To: Lloyd Wood <L.Wood@surrey.ac.uk> To: Sally Floyd <floyd@ee.lbl.gov> cc: Nitin H Vaidya <vaidya@cs.tamu.edu>, end2end-interest@ISI.EDU Subject: Re: TCP and out-of-order delivery In-Reply-To: <199902031601.IAA07343@owl.ee.lbl.gov> Message-ID: <Pine.SOL.3.95.990206000001.13118T-100000@petra.ee.surrey.ac.uk> Full-Name: Lloyd Wood Organization: speaking for none X-URL: http://www.ee.surrey.ac.uk/Personal/L.Wood/ X-no-archive: yes MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Status: O X-Status: On Wed, 3 Feb 1999, Sally Floyd wrote: > Nitin Vaidya wrote: > >It is convincing ... as I said earlier, I might have been nitpicking. > >I do find the suggestion by Lloyd Wood quite interesting, > >though somewhat complicated. > > Yes. In my email, I was really only addressing the issue of how, even > though packets can get duplicated in the network, you can still > reasonably infer that 2-acks-for-2-transmissions means that you > retransmitted the packet unnecessarily. (I wasn't addressing the > complications of using the acks for RTT measurements...) Speaking of such complications... Using RTTM via timestamps as described in RFC1323 should make both Karn's algorithm and any marginal special-case improvements to it from implementing my observation completely redundant, since you should always know which ACK belongs to which packet from looking at the echoed timestamps. However, if your receiver isn't RFC1323-compliant and doesn't do timestamps, you must fall back on Karn's algorithm for RTT estimation. In that case, this observation should be worth considering as offering a marginal measurement improvement, since it can give you more RTT samples and some idea of how RTT is changing. (I imagine there are lots of those legacy hosts around, so if you're communicating with one via a multipath mesh mobile network where the route and path delay keep switching...) I got quite excited when I read section 3.3 of Karn and Partridge's 1987 SIGCOMM paper 'Improving Round-Trip Time Estimates in Reliable Transport Protocols' (there's a copy available from the bottom of http://people.qualcomm.com/karn/papers/index.html ). That section appears to describe a situation that this observation can, I believe, improve by giving better immediate estimates of RTT and slightly finer input to RTO to help it converge faster (not that you want RTO to be too finely controlled...) Bob Braden's expired 'TCP Extensions for High Performance: An Update' draft was very helpful in clarifying this and the more obscure parts of RFC1323 for me. (Stevens has a copy usefully archived at: http://www.kohala.com/~rstevens/tcplw-extensions.txt ) I'm vaguely hoping someone might find this useful and run with it with simulations etc., even though my suggestion would have been more immediately useful, oh, twelve years back. Cheers, L. <L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/>