A Discussion of MPE TCP Timers

As of: July 11, 2011

In NMMGR on the NETXPORT.GPROT.TCP screen, one finds the following default (as delivered from HP) values:

  [N]       Checksum Enabled (Y For Yes, N For No)
  [1024 ]   Maximum Number of Connections
  [4]       Retransmission Interval Lower Bound (Secs)
  [180  ]   Maximum Time to Wait For Remote Response (Sec)
  [5  ]     Initial Retransmission Interval (Secs)
  [4  ]     Maximum Retransmissions per Packet

These values control how a 3000 reacts in the event it needs to re-send (retransmit) a packet (“chunk”) of data over a TCP/IP network. These values were established at least as far back as the MPE V days (and possibly before that) – back when only big, important computers were trying to talk to each other. (Unlike today, when even your refrigerator thinks it needs to “yack it up” over the internet!) The important thing to understand about these values is that they are perfectly fine and do not need changing, because they are never (or rarely) used on an optimally-performing network. However, given that…

  1. These days, networks rarely perform optimally, and
  2. HP Network Engineers have described the above values as “way out of whack”

…then you should change your TCP values to:

  [Y]       Checksum Enabled (Y For Yes, N For No)
  [4096 ]   Maximum Number of Connections
  [1  ]     Retransmission Interval Lower Bound (Secs)
  [360  ]   Maximum Time to Wait For Remote Response (Sec)
  [2  ]     Initial Retransmission Interval (Secs)
  [8  ]     Maximum Retransmissions per Packet

…and go solve more important problems!

However, for those needing a more detailed explanation….

The core concept of Transmission Control Protocol is – control. TCP makes every attempt to guarantee, via its controls, that the stuff “I” send to “you” over the network actually arrives. When you receive something from me, you politely send a receipt acknowledgement – and one of the control mechanisms is satisfied. Another of TCP’s assurance features is the ability to resend data. This means that, in the event you didn’t get some of what I sent, you simply need to say so and I’ll gladly send (retransmit) it again.

TCP transmission (“I am sending stuff to you”) timers come into play on a congested network. They establish the bounds (or limits) for a given system. Consider the following from Douglas Comer’s authoritative text on TCP/IP:

Congestion is a condition of severe delay caused by an overload of datagrams at one or more switching points (e.g., at routers). When congestion occurs, delays increase and the router begins to enqueue datagrams until it can route them. We must remember that each router has finite storage capacity and that datagrams compete for that storage (i.e., in a datagram based internet, there is no pre-allocation of resources to individual TCP connections). In the worst case, the total number of datagrams arriving at the congested router grows until the router reaches capacity and starts to drop datagrams.

Mr. Comer’s reference to “switching points” can actually be expanded to the system sending data, the system receiving data and any network gear (e.g., hubs, routers, switches) sitting in-between. Doesn’t that create a mental picture of mounds of dropped data (grams) lying all over the computer room floor?

Going back to the default MPE TCP timers, consider the following scenario – an MPE system is starting a network connection to another system but receives no ack(nowledgment) from the other system:

Original packet transmission... - no ACK...
Wait   5.0 seconds
       (per the Initial Retransmission Interval value)
Time:  5.0 s:  retry# 1
       (of 4 per the Maximum Retransmissions per Packet value)
Wait   10.0 seconds
       (because the wait time doubles per TCP’s rules)
Time:  15.0 s:  retry# 2
Wait   20.0 seconds
Time:  35.0 s:  retry# 3
Wait   40.0 seconds
Time:  75.0 s:  retry# 4
Wait   80.0 seconds

The last TCP retry packet is sent 75.0 seconds after the original packet! The time to connection-failure reporting is 155.0 seconds. In other words, it would take over two and a half minutes before the remote command returns with an error! That is a loooooonnnnnnngggg time to wait, and likely beyond the patience limit of most interactive users.
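
For those who like to see the arithmetic spelled out, here is a minimal Python sketch of the doubling-backoff schedule described above. It is only a model of this walkthrough, not MPE’s actual TCP code (real TCP also factors in measured round-trip times, and MPE enforces the Maximum Time to Wait For Remote Response), and the retry_schedule helper name is our own invention:

  # Simplified model of the retry arithmetic above (a sketch, not MPE's
  # actual TCP implementation): the first retry fires after the Initial
  # Retransmission Interval, and every subsequent wait doubles.
  def retry_schedule(initial_interval, max_retransmissions):
      schedule = []
      wait = float(initial_interval)
      elapsed = 0.0
      for retry in range(1, max_retransmissions + 1):
          elapsed += wait                     # wait out the current interval
          schedule.append((retry, elapsed))   # (retry number, time it is sent)
          wait *= 2                           # double the wait for the next retry
      time_to_failure = elapsed + wait        # one last wait, then failure is reported
      return schedule, time_to_failure

  # HP defaults: Initial Retransmission Interval = 5, Max Retransmissions per Packet = 4
  retries, failure = retry_schedule(5, 4)
  for number, sent_at in retries:
      print("retry #%d sent at %5.1f s" % (number, sent_at))
  print("connection failure reported at %.1f s" % failure)    # 155.0 s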

With the updated TCP timers, a “Retransmission Interval Lower Bound” of [1] gives the best performance in cases where single retransmissions are necessary, and a “Maximum Retransmissions per Packet” of [8] gives enough retransmissions to keep a TCP connection going over a network whose performance or quality is inconsistent. So, using the same scenario as above, our updated timers would result in the following:

Original packet transmission... - no ACK...
Wait    2.0 seconds
Time:   2.0 s:  retry# 1
Wait    4.0 seconds
Time:   6.0 s:  retry# 2
Wait    8.0 seconds
Time:  14.0 s:  retry# 3
Wait   16.0 seconds
Time:  30.0 s:  retry# 4
Wait   32.0 seconds
Time:  62.0 s:  retry# 5
Wait   64.0 seconds
Time: 126.0 s:  retry# 6
Wait  128.0 seconds
Time: 254.0 s:  retry# 7
Wait  256.0 seconds
Time: 510.0 s:  retry# 8
Wait  512.0 seconds

The time until the last retry is sent increases to over 8 minutes (510 seconds). However, it’s far more likely that we’ll never get anywhere near the 8th retry, because we’ve requested retries as frequently as MPE allows, giving a struggling connection its best chance to recover early.
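
Running the same retry_schedule sketch from above with the updated values reproduces these numbers:

  # Updated values: Initial Retransmission Interval = 2,
  # Maximum Retransmissions per Packet = 8 (reuses retry_schedule from above)
  retries, failure = retry_schedule(2, 8)
  print("last retry sent at %.1f s" % retries[-1][1])   # 510.0 s, roughly 8.5 minutes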

Additional Notes

The disabled Checksum field [N] is another carry-over from olden times. Enabling the checksum function adds a tiny amount of overhead, but it produces TCP headers that are RFC-compliant and consistent with the behavior of other systems.

Most systems these days benefit from increasing the number of connections to some value greater than 1024. (We recommend 4096.) If you are encountering VT Error 39s, that is a sign this value needs to be increased. Do not, however, arbitrarily increase this number to a very large value (e.g., 20000) without direction: the value consumes real memory, and setting it too high can leave the system unable to start networking.

Summary

Between high-speed servers on the same network, these timers rarely come into play. Where these timers – and their tuning – matter is with interactive users on PCs, or with servers spread out over a WAN or even the Internet. The very nature of these connections – with jerky starts/stops or heavy traffic – demands more robust network performance, which the updated timers provide.

Acknowledgements

Allegro wishes to gratefully recognize the following people and publications for their invaluable contributions to this paper:

  • Eero Laurila and James Hofmeister – Hewlett-Packard MPE Network Engineers.
  • Douglas Comer’s “Internetworking with TCP/IP Volume 1: Principles, Protocols, and Architecture (4th Edition)”.
  • Charles Kozierok’s “The TCP/IP Guide (Version 3)”.