Tweaking the Karn algorithm

or the only decent TCP idea I'll ever have. twelve years too late.

Below is a special case of the Karn algorithm I noted with that may be useful for getting useful round-trip-time (RTT) estimations on out-of-order retransmissions across multiple paths as the path length and RTT change. Since I'm not doing anything more with the idea at present, I've archived discussion and present the idea as is - if only for my future self.

As far as I can see, echoing timestamps would make this sleight-of-hand observation redundant - but not all TCP receivers echo timestamps a la RFC1323, and this expired draft (mirror) commenting on 1323 points out that Karn is still needed for communication with older stacks.

Sections 3.3 and 3.4 of Karn's 1987 Sigcomm paper discuss how RTT measurements are ignored for retransmitted packets, and indicate that a change in path delay causing retransmissions results in a lack of meaningful RTT measurements in the short term, which is where this comes in.

So, if you're communicating across multiple paths (a wireless mesh or something else) with a legacy TCP stack on a receiver in the fixed network, this idea may be of use for your TCP stack.

Or it may not - I haven't looked into it that deeply.

People in the discussion:

Discussion began on PILC and moved to end2end-interest, which doesn't have a decent threaded webarchive as far as I know.
Lloyd Wood (L.Wood@society.surrey.ac.uk)
6 February 1999

Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m107llG-0002oCa; Tue, 2 Feb 99 19:45 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma003254; Tue, 2 Feb 99 19:44:58 GMT
Received: from assateague.lerc.nasa.gov (actually host assateague-fi.lerc.nasa.gov) 
          by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP;
          Tue, 2 Feb 1999 19:44:25 +0000
Received: (listserv@localhost) 
          by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)        
          id OAA08036; Tue, 2 Feb 1999 14:35:33 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to 
                          owner-pilc@lerc.nasa.gov using -f
Received: from owl.ee.lbl.gov (fw01.lerc.nasa.gov [139.88.145.14]) 
          by assateague-fi.lerc.nasa.gov 
          with ESMTP (NASA LeRC 8.7.4.1/2.01-main)        id OAA05923;
          Tue, 2 Feb 1999 14:31:05 -0500 (EST)
Received: (from floyd@localhost)	by owl.ee.lbl.gov (8.9.2/8.9.2) id LAA06533;
          Tue, 2 Feb 1999 11:27:29 -0800 (PST)
Message-Id: <199902021927.LAA06533@owl.ee.lbl.gov>
To: karn@qualcomm.com
cc: pilc@lerc.nasa.gov, Tom Henderson <tomh@cs.berkeley.edu>
Subject: Re: TCP and out-of-order delivery
Date: Tue, 02 Feb 1999 11:27:28 PST
From: Sally Floyd <floyd@ee.lbl.gov>
Precedence: bulk
Status: RO
X-Status: 

Phil -

>The fast recovery mechanism in TCP has always bothered me a little
>precisely because its reaction to out-of-order delivery.  The Internet
>architecture has always permitted this, even if it doesn't happen all
>that often in practice (yet). Even without radio links, the increased
>use of load splitting will make this a problem worth solving at the
>TCP level.

My understanding is that the TCP-level solution to TCP's problems with
out-of-order delivery would go as follows:

(1) Add a recommendation to SACK so that, when the receiver receives a
packet that is already covered by the cumulative acknowledgement, the
receiver not only returns a dup ack (as it does now), but uses the
first SACK block to indicate the sequence numbers of the packet that
resulted in the dup ack.  If the receiver also holds some
non-contiguous blocks of data, they could be reported in the second and
third SACK blocks.

With this added recommendation (which would be optional, but which does
not violate anything in the SACK RFC 2018, I believe), the sender would
be able to reconstruct the exact sequence of data received by the
receiver.

(2) Consider a case where the sender receives three dup acks, infers
that packet N has been lost, enters Fast Recovery, and retransmits
packet N.  If packet N in fact was not lost, but was simply reordered,
then the sender will be able to detect this later on (assuming that
acknowledgements aren't lost on the return path).  Thus the sender can
determine that it did not need to retransmit packet N, and it did not need
to reduce its window in half.

If the sender determines after-the-fact that it did not need to
retransmit packet N, there is not much that it can do about it, except
perhaps to modify its dup ack threshold.  The bandwidth wasted by the
unnecessary retransmission of that packet has already been wasted.  But
the sender can do something when it determines that it unnecessarily
reduced its congestion window.  That is, the sender can put ssthresh
back to its old value (before it cut ssthresh in half when it initiated
Fast Recovery), and safely slow-start back up to its old congestion
window.


I think that this would be a simple and clean change to make to SACK
TCP, and would take care of some of the performance problems of TCP
with load splitting and other forms of strong out-of-order delivery.
(Though of couse it would need some exploration first.)

(There are also not necessarily original ideas, though I don't know
off-hand of any citations.  I have talked with a number of people about
this since the SACK RFC was first started, most recently with Tom
Henderson, who pointed out to me the problem with this proposal in the
absence of (1) above.)

- Sally

Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m107lvX-0002qha; Tue, 2 Feb 99 19:56 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma003529; Tue, 2 Feb 99 19:55:49 GMT
Received: from assateague.lerc.nasa.gov (actually host assateague-fi.lerc.nasa.gov) 
          by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP;
          Tue, 2 Feb 1999 19:54:57 +0000
Received: (listserv@localhost) 
          by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)        
          id OAA14860; Tue, 2 Feb 1999 14:50:31 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to 
                          owner-pilc@lerc.nasa.gov using -f
Received: from fw-es06.hac.com (fw01.lerc.nasa.gov [139.88.145.14]) 
          by assateague-fi.lerc.nasa.gov 
          with ESMTP (NASA LeRC 8.7.4.1/2.01-main)        id OAA13007;
          Tue, 2 Feb 1999 14:46:44 -0500 (EST)
From: Aaron.Falk@hscs066.es.hac.com
Received: from ises01.ES.HAC.COM ([147.16.5.2])          
          by fw-es06.hac.com (8.8.4/8.8.4) with SMTP	  id LAA25690 
          for <pilc@lerc.nasa.gov>; Tue, 2 Feb 1999 11:46:46 -0800 (PST)
Received: by ises01.ES.HAC.COM; id AA08773; Tue, 2 Feb 1999 11:46:44 -0800
Received: by HSCS066.es.hac.com(Lotus SMTP MTA v4.6.2  (693.3 8-11-1998))  
          id 8825670C.006CA522 ; Tue, 2 Feb 1999 11:46:41 -0800
X-Lotus-Fromdomain: HSC-RAD
To: Sally Floyd <floyd@ee.lbl.gov>
Cc: karn@qualcomm.com, pilc@lerc.nasa.gov, Tom Henderson <tomh@cs.berkeley.edu>
Message-Id: <8825670C.006CA3CA.00@HSCS066.es.hac.com>
Date: Tue, 2 Feb 1999 11:46:37 -0800
Subject: Re: TCP and out-of-order delivery
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Precedence: bulk 
Status: RO
X-Status: 

Sally-

Pardon the ignorant question, but how does the sender distinguish between
the ACK of the original packet versus the retransmission?

--aaron



If packet N in fact was not lost, but was simply reordered,
then the sender will be able to detect this later on (assuming that
acknowledgements aren't lost on the return path).  Thus the sender can
determine that it did not need to retransmit packet N, and it did not need
to reduce its window in half.



Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m107pch-0002rGa; Tue, 2 Feb 99 23:52 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma008892; Tue, 2 Feb 99 23:52:46 GMT
Received: from assateague.lerc.nasa.gov (actually host assateague-fi.lerc.nasa.gov) 
          by maile.surrey.ac.uk with SMTP-external (PP) with ESMTP;
          Tue, 2 Feb 1999 23:52:16 +0000
Received: (listserv@localhost) 
          by assateague.lerc.nasa.gov (NASA LeRC 8.7.4.1/2.01-main)        
          id SAA01614; Tue, 2 Feb 1999 18:45:52 -0500 (EST)
X-Authentication-Warning: assateague-fi.lerc.nasa.gov: listserv set sender to 
                          owner-pilc@lerc.nasa.gov using -f
Received: from ainur.ee.surrey.ac.uk (fw01.lerc.nasa.gov [139.88.145.14]) 
          by assateague-fi.lerc.nasa.gov 
          with SMTP (NASA LeRC 8.7.4.1/2.01-main)        id SAA29515;
          Tue, 2 Feb 1999 18:40:27 -0500 (EST)
Received: from petra.ee.surrey.ac.uk by ainur.ee.surrey.ac.uk 
          with smtp	(Smail3.1.29.1-ident) id m107pPq-0002qhC;
          Tue, 2 Feb 99 23:39 GMT
Date: Tue, 2 Feb 1999 23:39:33 +0000 (GMT)
From: Lloyd Wood <L.Wood@surrey.ac.uk>
X-Sender: eep1lw@petra.ee.surrey.ac.uk
Reply-To: end2end-interest@ISI.EDU
To: Sally Floyd <floyd@ee.lbl.gov>
cc: pilc@lerc.nasa.gov, end2end-interest@ISI.EDU, Aaron.Falk@hscs066.es.hac.com, 
    karn@qualcomm.com, tomh@cs.berkeley.edu
Subject: Re: TCP and out-of-order delivery
In-Reply-To: <199902022009.MAA06568@owl.ee.lbl.gov>
Message-ID: <Pine.SOL.3.95.990202202022.4494U-100000@petra.ee.surrey.ac.uk>
Organization: speaking for none
X-URL: http://www.ee.surrey.ac.uk/Personal/L.Wood/
X-no-archive: yes
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Status: RO
X-Status: 

On Tue, 2 Feb 1999, Sally Floyd wrote:
[Probably veering off-topic for PILC, so I've cc'd to and set
 followups to end2end as likely to be more appropriate.]

An out-of-order-delivery caused by a delay en route leads to a resend
and the receipt of two acknowledgements - one for the original and one
for the resend. 

On Tue, 2 Feb 1999, Aaron Falk wrote to Sally Floyd:
> >but how does the sender distinguish between
> >the ACK of the original packet versus the retransmission?

and Sally Floyd replied:
> Well, the sender doesn't know which ACK goes with which of the two
> transmissions of the packet (so the sender has to take the normal care
> that it would take in terms of using ACKs for estimating RTTs).  

I think there is an opportunity for some form of special-case handling
here to give an idea of what's causing the out-of-order delivery - and
perhaps even an RTT estimate for it - which could be useful if there's
a different path involved, due to load splitting, say.

This is even though you don't know which ACK is which; below I
demonstrate that the ACK ordering simply doesn't matter, and I go on
to speculate wildly on the potential usefulness of this observed
information.  (with diagrams.) 


Sally adds:
> But the sender does know that it sent the packet only twice, and that
> the receiver received two copies of that data.  So the
> sender can reasonably infer that the first transmission of that
> data was not lost in the network, but only reordered.
>
> Yes?  Or have I forgotten something?

The resend must occur before the sender receives the first
acknowledgment (otherwise that resend never takes place). The second
acknowledgement may or may not arrive before the first acknowledgement.

These simple conditions lead us to two possible scenarios I'll proceed
to illustrate badly in monospace:

    Sender       Receiver            Sender       Receiver
   1|            |                  1|            |            -
    | -          |                   |-           |
    |   -        |                   |   -        |            a
    |     -      |                   |      -     |
   2|       -    |                  2|         -  |            -
    | *       -  |                   |*           +
    |      *    -|                   |    *    -  |
    |         - *|                   |      - *   |            b
    |      *-    |                   |   -        *
    | *   -      |                   |-       *   |
ACK2|   -        |               ACK1|    *       |            -
    | -          |                   |*           |            c
ACK1|            |               ACK2|            |            -


let a be time between original send and retransmit.
    b be time between retransmit and receipt of first acknowledgement
    c be time between receipt of first and second acknowledgements.
         - all obtained from simple observation at the sender.

elapsed overall _total_ RTT time for sum of both RTTs:
      1 + 2                          1 + 2
   = (a+b+c) + b                   = (a+b) + (b+c)
   = a + 2*b + c                   = a + 2*b + c
      ... is independent of ACK order.

So total RTT time for both round trips is fixed no matter what
the acknowledgement order. In retrospect, this is probably obvious,
but I had to draw it then check other cases... 

(and total area of triangles is fixed. This holds for any
 n packets[*] sent and n acknowledgments received, but I'm
 not drawing e.g. all six cases for n=3, where you end up with
 a + 2*b + 3*c + 2*d + e using the same interval labelling as above,
 and I'll stick to talking about n=2 below and generalising to the
 rest of the series based on this weird variant on Pascal's triangle
 later.)


We normally neglect RTTs on ACKs for resent packets, since you can't
be sure whether the original or the resent packet is being
acknowledged - this is Karn's algorithm. There, with a lost
packet/lost ACK, the overall time is unbounded, and we don't want to
introduce that uncertainty into our RTT calculations, so we
effectively discard the RTT for the ACK'd packet because we can't
separate it out easily.

But here, since both original and all resent packets are successfully
ACK'd, the overall sum of the two RTTs is certain. We can use that,
since we know the total time holds no matter what the order of
the ACKs; we have a valid summed sample time for the group of
transmissions and RTTs that may be useful for other purposes, and we
can build up an estimate of total-time-for-n-total-resends where n is
2 (or greater)  from subsequent samples. (If there are lots of
subsequent samples, there may be other problems, but that's beyond the
scope of this email.)

If the results of this observation are pretty much constant and you
see it occurring regularly, I suppose you may be seeing the effects of
occasional use of a second path with a different overall delay - one
possible explanation.

We could subtract the current 'normal' RTT estimate from the typical
computed total-RTT-for-an-extra-send (n=2) seen to get an estimate of
the RTT for the 'other case', be it a different route or something
else - assuming that the second occurrence is sporadic and is not
affecting 'normal' RTT (much). This could give a sample 'other delay'
of the other path, if there is another path.  Again, this is just one
possible cause; I haven't thought the range of causes through. 

Following the arithmetic series, in extreme (rarer?) cases where you
have three sends (a send and two retransmits) followed by three
unordered acknowledgements, you could then take the current RTT and
RTT-for-an-extra-send (n=2) estimates, add them together, and subtract
them from the (estimated seen total derived from sum RTT for n=3) to
get a rough idea of the delay there, and so on. This assumes that
normal RTT is tracking the most popular path/delay, etc, which is
probably assuming way too much.

[I'll note that if this is a long series and the differences in RTT 
 are clearly delineated, either you've got a very odd multiple path
 situation or there must be something wrong somewhere else, and
 it looks like a case of diminishing and error-prone RTT returns for
 increasing number of sends and ACK returns to me - at least as far
 as network conditions I'm familiar with should go. Arithmetic
 series only go so far in implementation.]

You will base RTO on _one_ of these computed RTTs to minimise
occurrence of multiple ACKs. If you're seeing a _lot_ of out-of-order
multiple-ACKs, it could be time to switch right to the another RTO
computed from another RTT group and see what happens; I can imagine
some sort of Markov-like state transitions, but there are probably
practical pitfalls I've overlooked in this abstract arithmetic.

Is there anything at all useful here?

thanks,

L.

I've no idea - I'm supposed to be working on multicast.
And I need to get out more.

[*] probably should be 'segments' since it's TCP, but since everyone
    else is saying 'packets' in this discussion I'll follow suit.

<L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/>


Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m107rqb-0002fha; Wed, 3 Feb 99 02:15 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma012101; Wed, 3 Feb 99 02:15:06 GMT
Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) 
          with ESMTP; Wed, 3 Feb 1999 02:14:21 +0000
Received: (from majordom@localhost)	by zephyr.isi.edu (8.8.7/8.8.6) 
          id QAA06976	for end2end-interest-outgoing;
          Tue, 2 Feb 1999 16:27:38 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128])	by zephyr.isi.edu (8.8.7/8.8.6) 
          with ESMTP id QAA06971	for <end2end-interest@zephyr.isi.edu>;
          Tue, 2 Feb 1999 16:27:36 -0800 (PST)
Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50])	by tnt.isi.edu (8.8.7/8.8.6) 
          with ESMTP id QAA07989	for <end2end-interest@ISI.EDU>;
          Tue, 2 Feb 1999 16:27:35 -0800 (PST)
Received: (from floyd@localhost)	by owl.ee.lbl.gov (8.9.2/8.9.2) id QAA06775;
          Tue, 2 Feb 1999 16:27:35 -0800 (PST)
Message-Id: <199902030027.QAA06775@owl.ee.lbl.gov>
To: end2end-interest@ISI.EDU
Subject: Re: TCP and out-of-order delivery
Date: Tue, 02 Feb 1999 16:27:35 PST
From: Sally Floyd <floyd@ee.lbl.gov>
Precedence: bulk
Status: RO
X-Status: 

This is background for a thread that started out on
the pilc mailing list, and that now is migrating to
end2end-interest instead.  Followup should be on
end2end-interest, I would say.

- Sally


------- Forwarded Message

Date:    Tue, 02 Feb 1999 11:27:28 -0800
From:    Sally Floyd <floyd>
To:      karn@qualcomm.com
cc:      pilc@lerc.nasa.gov, Tom Henderson <tomh@cs.berkeley.edu>
Subject: Re: TCP and out-of-order delivery

Phil -

>The fast recovery mechanism in TCP has always bothered me a little
>precisely because its reaction to out-of-order delivery.  The Internet
>architecture has always permitted this, even if it doesn't happen all
>that often in practice (yet). Even without radio links, the increased
>use of load splitting will make this a problem worth solving at the
>TCP level.

My understanding is that the TCP-level solution to TCP's problems with
out-of-order delivery would go as follows:

(1) Add a recommendation to SACK so that, when the receiver receives a
packet that is already covered by the cumulative acknowledgement, the
receiver not only returns a dup ack (as it does now), but uses the
first SACK block to indicate the sequence numbers of the packet that
resulted in the dup ack.  If the receiver also holds some
non-contiguous blocks of data, they could be reported in the second and
third SACK blocks.

With this added recommendation (which would be optional, but which does
not violate anything in the SACK RFC 2018, I believe), the sender would
be able to reconstruct the exact sequence of data received by the
receiver.

(2) Consider a case where the sender receives three dup acks, infers
that packet N has been lost, enters Fast Recovery, and retransmits
packet N.  If packet N in fact was not lost, but was simply reordered,
then the sender will be able to detect this later on (assuming that
acknowledgements aren't lost on the return path).  Thus the sender can
determine that it did not need to retransmit packet N, and it did not need
to reduce its window in half.

If the sender determines after-the-fact that it did not need to
retransmit packet N, there is not much that it can do about it, except
perhaps to modify its dup ack threshold.  The bandwidth wasted by the
unnecessary retransmission of that packet has already been wasted.  But
the sender can do something when it determines that it unnecessarily
reduced its congestion window.  That is, the sender can put ssthresh
back to its old value (before it cut ssthresh in half when it initiated
Fast Recovery), and safely slow-start back up to its old congestion
window.


I think that this would be a simple and clean change to make to SACK
TCP, and would take care of some of the performance problems of TCP
with load splitting and other forms of strong out-of-order delivery.
(Though of couse it would need some exploration first.)

(There are also not necessarily original ideas, though I don't know
off-hand of any citations.  I have talked with a number of people about
this since the SACK RFC was first started, most recently with Tom
Henderson, who pointed out to me the problem with this proposal in the
absence of (1) above.)

- Sally

------- End of Forwarded Message



Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m107s4l-0002pNa; Wed, 3 Feb 99 02:29 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma012390; Wed, 3 Feb 99 02:29:33 GMT
Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) 
          with ESMTP; Wed, 3 Feb 1999 02:29:00 +0000
Received: (from majordom@localhost)	by zephyr.isi.edu (8.8.7/8.8.6) 
          id QAA07430	for end2end-interest-outgoing;
          Tue, 2 Feb 1999 16:39:41 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128])	by zephyr.isi.edu (8.8.7/8.8.6) 
          with ESMTP id QAA07425	for <end2end-interest@zephyr.isi.edu>;
          Tue, 2 Feb 1999 16:39:40 -0800 (PST)
Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50])	by tnt.isi.edu (8.8.7/8.8.6) 
          with ESMTP id QAA09185	for <end2end-interest@ISI.EDU>;
          Tue, 2 Feb 1999 16:39:39 -0800 (PST)
Received: (from floyd@localhost)	by owl.ee.lbl.gov (8.9.2/8.9.2) id QAA06792;
          Tue, 2 Feb 1999 16:39:36 -0800 (PST)
Message-Id: <199902030039.QAA06792@owl.ee.lbl.gov>
To: end2end-interest@ISI.EDU
Subject: Re: TCP and out-of-order delivery
Date: Tue, 02 Feb 1999 16:39:36 PST
From: Sally Floyd <floyd@ee.lbl.gov>
Precedence: bulk
Status: RO
X-Status: 

...
>I think there is an opportunity for some form of special-case handling
>here to give an idea of what's causing the out-of-order delivery - and
>perhaps even an RTT estimate for it - which could be useful if there's
>a different path involved, due to load splitting, say.

Yep, I agree.

...
>But here, since both original and all resent packets are successfully
>ACK'd, the overall sum of the two RTTs is certain. We can use that,
>since we know the total time holds no matter what the order of
>the ACKs; we have a valid summed sample time for the group of
>transmissions and RTTs that may be useful for other purposes, and we
>can build up an estimate of total-time-for-n-total-resends where n is
>2 (or greater)  from subsequent samples. (If there are lots of
>subsequent samples, there may be other problems, but that's beyond the
>scope of this email.)

...
>Is there anything at all useful here?

I think so.  And at the very least, you can use the
n-transmissions-and-n-acks to calculate an average roundtrip time for
those n transmissions.  But unnecessarily-retransmitted packets will
hopefully not be a very frequent occurrence.  (If they are,
then the sender didn't do very well figuring out how to increase the
dup ack threshold...)

- Sally



Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m107txd-0002pNa; Wed, 3 Feb 99 04:30 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xmaa14388; Wed, 3 Feb 99 04:30:36 GMT
Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) 
          with ESMTP; Wed, 3 Feb 1999 04:30:10 +0000
Received: (from majordom@localhost)	by zephyr.isi.edu (8.8.7/8.8.6) 
          id SAA11010	for end2end-interest-outgoing;
          Tue, 2 Feb 1999 18:24:08 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128])	by zephyr.isi.edu (8.8.7/8.8.6) 
          with ESMTP id SAA11005	for <end2end-interest@zephyr.isi.edu>;
          Tue, 2 Feb 1999 18:24:07 -0800 (PST)
Received: from cs.tamu.edu (0@clavin.cs.tamu.edu [128.194.130.106])	by tnt.isi.edu (8.8.7/8.8.6) 
          with ESMTP id SAA17794	for <end2end-interest@isi.edu>;
          Tue, 2 Feb 1999 18:24:06 -0800 (PST)
Received: from sun.cs.tamu.edu (sun [128.194.135.14])	by cs.tamu.edu (8.9.1/8.9.1) 
          with ESMTP id UAA02030;	Tue, 2 Feb 1999 20:22:51 -0600 (CST)
From: Nitin H Vaidya <vaidya@cs.tamu.edu>
Received: (from vaidya@localhost)	by sun.cs.tamu.edu (8.9.1/8.9.1) id UAA14094;
          Tue, 2 Feb 1999 20:16:35 -0600 (CST)
Date: Tue, 2 Feb 1999 20:16:35 -0600 (CST)
Message-Id: <199902030216.UAA14094@sun.cs.tamu.edu>
To: end2end-interest@ISI.EDU, floyd@ee.lbl.gov
Subject: Re: TCP and out-of-order delivery
Cc: vaidya@cs.tamu.edu
Precedence: bulk
Status: RO
X-Status: 

  >> From eep1lw@surrey.ac.uk Tue Feb  2 17:49:25 1999
  >> 
  >> On Tue, 2 Feb 1999, Sally Floyd wrote:
  >> [Probably veering off-topic for PILC, so I've cc'd to and set
  >>  followups to end2end as likely to be more appropriate.]
  >> 
  >> An out-of-order-delivery caused by a delay en route leads to a resend
  >> and the receipt of two acknowledgements - one for the original and one
  >> for the resend. 

For this sort of analysis to be valid, you would need to assume that
the first n acks you get correspond to first n transmits of a
packet. This may not be true, since n packets sent by TCP
sender may generate more than n acks, when a packet gets
duplicated on some hop on TCP route.

May be I am splitting hair ...

- nitin



Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m107txc-0002fha; Wed, 3 Feb 99 04:30 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma014388; Wed, 3 Feb 99 04:30:36 GMT
Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) 
          with ESMTP; Wed, 3 Feb 1999 04:30:01 +0000
Received: (from majordom@localhost)	by zephyr.isi.edu (8.8.7/8.8.6) 
          id SAA11726	for end2end-interest-outgoing;
          Tue, 2 Feb 1999 18:43:30 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128])	by zephyr.isi.edu (8.8.7/8.8.6) 
          with ESMTP id SAA11721	for <end2end-interest@zephyr.isi.edu>;
          Tue, 2 Feb 1999 18:43:28 -0800 (PST)
Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50])	by tnt.isi.edu (8.8.7/8.8.6) 
          with ESMTP id SAA19078	for <end2end-interest@isi.edu>;
          Tue, 2 Feb 1999 18:43:28 -0800 (PST)
Received: (from floyd@localhost)	by owl.ee.lbl.gov (8.9.2/8.9.2) id SAA06913;
          Tue, 2 Feb 1999 18:43:25 -0800 (PST)
Message-Id: <199902030243.SAA06913@owl.ee.lbl.gov>
To: Nitin H Vaidya <vaidya@cs.tamu.edu>
cc: end2end-interest@ISI.EDU
Subject: Re: TCP and out-of-order delivery
Date: Tue, 02 Feb 1999 18:43:25 PST
From: Sally Floyd <floyd@ee.lbl.gov>
Precedence: bulk
Status: RO
X-Status: 

>For this sort of analysis to be valid, you would need to assume that
>the first n acks you get correspond to first n transmits of a
>packet. This may not be true, since n packets sent by TCP
>sender may generate more than n acks, when a packet gets
>duplicated on some hop on TCP route.

Yep.  You don't know *for sure* that the n acks aren't due to
duplicated packets.  And (unicast) packets do sometimes get duplicated
in the network, as Vern shows in his measurement studies.

If it was the original packet that got delayed and also duplicated,
then that is fine - it is still true that your assumption that the
original packet was dropped was an incorrect assumption.  (Even if, for
some reason, you haven't actually received a valid ack for the
retransmitted packet.)

If the original packet was not duplicated, but instead the original
packet was itself lost, and it was the retransmission that was
duplicated, then either you have been seeing lots of dup acks all
along, and you know to be cautious; or else you have been fooled by a
somewhat unlikely combination of events.  Because, if packets aren't
being duplicated rather frequently, I can't see any reason why the
network, when it choose to duplicate a packet, would duplicate one of
the few packets that was a retransmission of an earlier packet.
Barring adversaries in the network, which I assume that we can ignore
for the time being...

Convincing?  Or not?

- Sally





Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m1083XV-0002u0a; Wed, 3 Feb 99 14:44 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma003501; Wed, 3 Feb 99 14:44:24 GMT
Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) 
          with ESMTP; Wed, 3 Feb 1999 14:43:56 +0000
Received: (from majordom@localhost)	by zephyr.isi.edu (8.8.7/8.8.6) 
          id EAA04665	for end2end-interest-outgoing;
          Wed, 3 Feb 1999 04:25:15 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128])	by zephyr.isi.edu (8.8.7/8.8.6) 
          with ESMTP id EAA04660	for <end2end-interest@zephyr.isi.edu>;
          Wed, 3 Feb 1999 04:25:13 -0800 (PST)
Received: from ainur.ee.surrey.ac.uk (root@ainur.ee.surrey.ac.uk [131.227.50.25])	by tnt.isi.edu (8.8.7/8.8.6) 
          with SMTP id EAA16367	for <end2end-interest@ISI.EDU>;
          Wed, 3 Feb 1999 04:25:11 -0800 (PST)
Received: from petra.ee.surrey.ac.uk by ainur.ee.surrey.ac.uk 
          with smtp	(Smail3.1.29.1-ident) id m1081M7-0002ghC;
          Wed, 3 Feb 99 12:24 GMT
Date: Wed, 3 Feb 1999 12:24:31 +0000 (GMT)
From: Lloyd Wood <L.Wood@surrey.ac.uk>
X-Sender: eep1lw@petra.ee.surrey.ac.uk
Reply-To: Lloyd Wood <L.Wood@surrey.ac.uk>
To: Sally Floyd <floyd@ee.lbl.gov>
cc: Nitin H Vaidya <vaidya@cs.tamu.edu>, end2end-interest@ISI.EDU
Subject: Re: TCP and out-of-order delivery
In-Reply-To: <199902030243.SAA06913@owl.ee.lbl.gov>
Message-ID: <Pine.SOL.3.95.990203114933.6640D-100000@petra.ee.surrey.ac.uk>
Organization: speaking for none
X-URL: http://www.ee.surrey.ac.uk/Personal/L.Wood/
X-no-archive: yes
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Precedence: bulk
Status: RO
X-Status: 

On Tue, 2 Feb 1999, Sally Floyd wrote:
> Nitin Vaidya wrote:
> >For this sort of analysis to be valid, you would need to assume that
> >the first n acks you get correspond to first n transmits of a
> >packet. This may not be true, since n packets sent by TCP
> >sender may generate more than n acks, when a packet gets
> >duplicated on some hop on TCP route.

Well, this is after-the-fact analysis, so you can check for more than
n ACKs received for the n packet transmissions before treating the
sample as valid.

You will stop counting transmissions of the packet when the first
acknowledgement comes in and you no longer need to resend the packet.
That gives you a value of n for the n packet transmissions. 

You can have a safe timeout on counting acknowledgements. Once you've
reached n acknowledgements, you won't stop counting (although you can
do the math and get other RTT estimates at that time); you will wait
for the duration of that timeout for any duplicate ACKs to appear
before treating this sample and computed estimates as valid. 

If any more ACKs do appear, then you now have n sends and n+1 ACKs, so
the arithmetic to get other possible RTT estimates doesn't apply, as
you're back to the uncertain case of the Karn algorithm (and you won't
care about checking for the arrival of any more dup ACKs for the
packet, since that's still a Karn case and to be ignored.) 

To get the same value of n for _both_ transmissions and ACKs where
ACKs are duplicated and the analysis I came up with previously is
invalidated, you need the total number of duplicated ACKs to match the
total number of lost packets/lost ACKs:  for every duplicated ACK
received over and above the first ACK for that particular packet
transmission, there must be an equivalent lost packet or lost ACK so
that the total number of ACKs received by the sender remains at n - or
all the extra ACKs are arriving after that safe timeout.

I agree with Sally that this scenario is extremely unlikely without
other duplications being visible at other times. 

So, now we have a useful sum-of-n-RTTs value for our unordered
multiple ACKs, what can we reliably conclude from it about multipath
RTTs/other causes? 

thanks,

L.

> Yep.  You don't know *for sure* that the n acks aren't due to
> duplicated packets.  And (unicast) packets do sometimes get duplicated
> in the network, as Vern shows in his measurement studies.
> 
> If it was the original packet that got delayed and also duplicated,
> then that is fine - it is still true that your assumption that the
> original packet was dropped was an incorrect assumption.  (Even if, for
> some reason, you haven't actually received a valid ack for the
> retransmitted packet.)
> 
> If the original packet was not duplicated, but instead the original
> packet was itself lost, and it was the retransmission that was
> duplicated, then either you have been seeing lots of dup acks all
> along, and you know to be cautious; or else you have been fooled by a
> somewhat unlikely combination of events.  Because, if packets aren't
> being duplicated rather frequently, I can't see any reason why the
> network, when it choose to duplicate a packet, would duplicate one of
> the few packets that was a retransmission of an earlier packet.
> Barring adversaries in the network, which I assume that we can ignore
> for the time being...
> 
> Convincing?  Or not?
> 
> - Sally

<L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/>





Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m1086X8-0002swa; Wed, 3 Feb 99 17:56 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma011040; Wed, 3 Feb 99 17:56:08 GMT
Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) 
          with ESMTP; Wed, 3 Feb 1999 17:55:44 +0000
Received: (from majordom@localhost)	by zephyr.isi.edu (8.8.7/8.8.6) 
          id HAA10702	for end2end-interest-outgoing;
          Wed, 3 Feb 1999 07:52:41 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128])	by zephyr.isi.edu (8.8.7/8.8.6) 
          with ESMTP id HAA10697	for <end2end-interest@zephyr.isi.edu>;
          Wed, 3 Feb 1999 07:52:41 -0800 (PST)
Received: from cs.tamu.edu (0@clavin.cs.tamu.edu [128.194.130.106])	by tnt.isi.edu (8.8.7/8.8.6) 
          with ESMTP id HAA23110	for <end2end-interest@isi.edu>;
          Wed, 3 Feb 1999 07:52:40 -0800 (PST)
Received: from sun.cs.tamu.edu (sun [128.194.135.14])	by cs.tamu.edu (8.9.1/8.9.1) 
          with ESMTP id JAA24473;	Wed, 3 Feb 1999 09:51:58 -0600 (CST)
From: Nitin H Vaidya <vaidya@cs.tamu.edu>
Received: (from vaidya@localhost)	by sun.cs.tamu.edu (8.9.1/8.9.1) id JAA14807;
          Wed, 3 Feb 1999 09:45:36 -0600 (CST)
Date: Wed, 3 Feb 1999 09:45:36 -0600 (CST)
Message-Id: <199902031545.JAA14807@sun.cs.tamu.edu>
To: floyd@ee.lbl.gov
Subject: Re: TCP and out-of-order delivery
Cc: end2end-interest@ISI.EDU
Precedence: bulk
Status: RO
X-Status: 


  >> From floyd@ee.lbl.gov Tue Feb  2 21:40:34 1999
  >> 
  >> If the original packet was not duplicated, but instead the original
  >> packet was itself lost, and it was the retransmission that was
  >> duplicated, then either you have been seeing lots of dup acks all
  >> along, and you know to be cautious; or else you have been fooled by a
  >> somewhat unlikely combination of events.  Because, if packets aren't
  >> being duplicated rather frequently, I can't see any reason why the
  >> network, when it choose to duplicate a packet, would duplicate one of
  >> the few packets that was a retransmission of an earlier packet.
  >> Barring adversaries in the network, which I assume that we can ignore
  >> for the time being...
  >> 
  >> Convincing?  Or not?


It is convincing ... as I said earlier, I might have been nitpicking.
I do find the suggestion by Lloyd Wood quite interesting,
though somewhat complicated. I think that it is useful to know
under what assumptions the idea would work as intended.
Other situations exist where reordering can occur due to
reasons other than multipath (e.g., out-of-order delivery by
a link layer). This, and packet duplication, may be rare events,
hopefully rare enough compared to multipath that they can be
ignored.

Regards.

- nitin



 
Received: by ainur.ee.surrey.ac.uk (Smail3.1.29.1 #8)
	id m1086Y8-0002tTa; Wed, 3 Feb 99 17:57 GMT
Received: from maile.surrey.ac.uk(131.227.102.10) by ainur.ee.surrey.ac.uk via smap (V2.0)
	id xma011078; Wed, 3 Feb 99 17:56:55 GMT
Received: from zephyr.isi.edu by maile.surrey.ac.uk with SMTP-external (PP) 
          with ESMTP; Wed, 3 Feb 1999 17:56:30 +0000
Received: (from majordom@localhost)	by zephyr.isi.edu (8.8.7/8.8.6) 
          id IAA11080	for end2end-interest-outgoing;
          Wed, 3 Feb 1999 08:01:18 -0800 (PST)
Received: from tnt.isi.edu (tnt.isi.edu [128.9.128.128])	by zephyr.isi.edu (8.8.7/8.8.6) 
          with ESMTP id IAA11075	for <end2end-interest@zephyr.isi.edu>;
          Wed, 3 Feb 1999 08:01:16 -0800 (PST)
Received: from owl.ee.lbl.gov (owl.ee.lbl.gov [131.243.1.50])	by tnt.isi.edu (8.8.7/8.8.6) 
          with ESMTP id IAA23664	for <end2end-interest@isi.edu>;
          Wed, 3 Feb 1999 08:01:16 -0800 (PST)
Received: (from floyd@localhost)	by owl.ee.lbl.gov (8.9.2/8.9.2) id IAA07343;
          Wed, 3 Feb 1999 08:01:12 -0800 (PST)
Message-Id: <199902031601.IAA07343@owl.ee.lbl.gov>
To: Nitin H Vaidya <vaidya@cs.tamu.edu>
cc: end2end-interest@ISI.EDU
Subject: Re: TCP and out-of-order delivery
Date: Wed, 03 Feb 1999 08:01:12 PST
From: Sally Floyd <floyd@ee.lbl.gov>
Precedence: bulk
Status: RO
X-Status: 

>It is convincing ... as I said earlier, I might have been nitpicking.
>I do find the suggestion by Lloyd Wood quite interesting,
>though somewhat complicated. 

Yes.  In my email, I was really only addressing the issue of how, even
though packets can get duplicated in the network, you can still
reasonably infer that 2-acks-for-2-transmissions means that you
retransmitted the packet unnecessarily.  (I wasn't addressing the
complications of using the acks for RTT measurements...)

- Sally

 
Date: Sat, 6 Feb 1999 01:04:21 +0000 (GMT)
From: Lloyd Wood <eep1lw@surrey.ac.uk>
X-Sender: eep1lw@petra.ee.surrey.ac.uk
Reply-To: Lloyd Wood <L.Wood@surrey.ac.uk>
To: Sally Floyd <floyd@ee.lbl.gov>
cc: Nitin H Vaidya <vaidya@cs.tamu.edu>, end2end-interest@ISI.EDU
Subject: Re: TCP and out-of-order delivery
In-Reply-To: <199902031601.IAA07343@owl.ee.lbl.gov>
Message-ID: <Pine.SOL.3.95.990206000001.13118T-100000@petra.ee.surrey.ac.uk>
Full-Name: Lloyd Wood
Organization: speaking for none
X-URL: http://www.ee.surrey.ac.uk/Personal/L.Wood/
X-no-archive: yes
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: O
X-Status: 

On Wed, 3 Feb 1999, Sally Floyd wrote:
> Nitin Vaidya wrote:
> >It is convincing ... as I said earlier, I might have been nitpicking.
> >I do find the suggestion by Lloyd Wood quite interesting,
> >though somewhat complicated. 
> 
> Yes.  In my email, I was really only addressing the issue of how, even
> though packets can get duplicated in the network, you can still
> reasonably infer that 2-acks-for-2-transmissions means that you
> retransmitted the packet unnecessarily.  (I wasn't addressing the
> complications of using the acks for RTT measurements...)

Speaking of such complications...

Using RTTM via timestamps as described in RFC1323 should make both
Karn's algorithm and any marginal special-case improvements to it from
implementing my observation completely redundant, since you should
always know which ACK belongs to which packet from looking at the
echoed timestamps.

However, if your receiver isn't RFC1323-compliant and doesn't do
timestamps, you must fall back on Karn's algorithm for RTT estimation. 
In that case, this observation should be worth considering as offering
a marginal measurement improvement, since it can give you more RTT
samples and some idea of how RTT is changing. 

(I imagine there are lots of those legacy hosts around, so if
 you're communicating with one via a multipath mesh mobile
 network where the route and path delay keep switching...)

I got quite excited when I read section 3.3 of Karn and Partridge's
1987 SIGCOMM paper 'Improving Round-Trip Time Estimates in Reliable
Transport Protocols' (there's a copy available from the bottom of 
http://people.qualcomm.com/karn/papers/index.html ).

That section appears to describe a situation that this observation
can, I believe, improve by giving better immediate estimates of RTT
and slightly finer input to RTO to help it converge faster (not that
you want RTO to be too finely controlled...) 

Bob Braden's expired 'TCP Extensions for High Performance: An Update'
draft was very helpful in clarifying this and the more obscure parts
of RFC1323 for me. (Stevens has a copy usefully archived at: 
 http://www.kohala.com/~rstevens/tcplw-extensions.txt )

I'm vaguely hoping someone might find this useful and run with it with
simulations etc., even though my suggestion would have been more
immediately useful, oh, twelve years back. 

Cheers,

L.

<L.Wood@surrey.ac.uk>PGP<http://www.ee.surrey.ac.uk/Personal/L.Wood/>