Tech Note 0036

MTP Performance Statistics

Understanding MTP Statistics Reports

MTP collects a variety of statistics about each transaction which can be useful for diagnosing network, storage, or CPU problems.  When diagnostic logging is enabled, the receiving end of an MTP transaction will report these statistics in the log output at the end of the transaction.  Here is an example:

MTP1: 4277D235 Statistics
MTP1: Sequence: 524288000 Received: 524288000 Speed: 880644125/ 920589052 5%
MTP1: Wsize: 8249472/9767296 Transit: 1408 Datagram: 1408/1408
MTP1: RTT: 65/63/63/76/101 Wtime: 0.075s RDh: 15 Rnh: 261888 Rpt: 152 Gap: 63
MTP1: Req: 2292 Resp: 372432 Del: 45 Rptd: 45 Dup: 0
MTP1: Wait: 0 Stall: 0 Pause: 0 Slow: 0 Over: 0 Loss: 0.020/0.000
MTP1: BWDP: 52 Flow: 121981 RTNR: 0 Gaps: 1 Lower: 45 Raise: 372431

Each MTP update may make changes to the format and meaning of this diagnostic output.  This Tech Note provides an overview for MTP version 4.3.2.  Contact support for an analysis of your particular logs.
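For scripted analysis, the report can be split into named fields.  The following Python sketch is not part of MTP or any DEI tool; the function name parse_mtp_stats and the regular expression are illustrative assumptions based only on the 4.3.2 example shown here, and other MTP versions may need a different pattern.

```python
import re

# Matches "Name: value" pairs.  A value is one or more numbers separated by
# "/", optionally with a space after the slash (as in the Speed field).
# The "MTP1:" prefix is skipped because its name ends in a digit.
FIELD_RE = re.compile(r"([A-Za-z]+):\s*([0-9.]+(?:/\s*[0-9.]+)*)")


def parse_mtp_stats(text):
    """Return a dict mapping each statistic name to a list of numbers."""
    stats = {}
    for name, raw in FIELD_RE.findall(text):
        parts = [p.strip() for p in raw.split("/")]
        stats[name] = [float(p) if "." in p else int(p) for p in parts]
    return stats


report = """\
MTP1: 4277D235 Statistics
MTP1: Sequence: 524288000 Received: 524288000 Speed: 880644125/ 920589052 5%
MTP1: RTT: 65/63/63/76/101 Wtime: 0.075s RDh: 15 Rnh: 261888 Rpt: 152 Gap: 63
MTP1: Wait: 0 Stall: 0 Pause: 0 Slow: 0 Over: 0 Loss: 0.020/0.000
"""

stats = parse_mtp_stats(report)
print(stats["RTT"])    # [65, 63, 63, 76, 101]
print(stats["Speed"])  # [880644125, 920589052]
```

Units such as the trailing "s" on Wtime and the "5%" after Speed are deliberately dropped by this sketch.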

Transaction ID

MTP1: 4277D235 Statistics

The first line provides the MTP transaction ID.  MTP transactions are the communication primitives used to build more complex interactions like uploads and downloads.  For example, a typical ExpeDat file transfer consists of two or three MTP transactions, some on the sender and some on the receiver.  The transaction of interest will be the one on the receiving side, typically moving a significant amount of data.

For most applications, you can map the MTP transaction ID to a DOC transaction ID by looking for a line like the following at the start:

DOC1: 37298166 Initiated Upload 4277D235 (type 3) to 107.170.213.228:43212 @ 4766738093
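When a log contains many transactions, this mapping can be collected with a short script.  The Python sketch below assumes that every such line follows the layout of the single example above (DOC ID, the word Initiated, an operation name, then the MTP ID) and that both IDs are hexadecimal strings; the function name map_mtp_to_doc is hypothetical.

```python
import re

# Assumed layout, inferred from one example line:
#   DOC1: <DOC ID> Initiated <operation> <MTP ID> ...
DOC_RE = re.compile(
    r"DOC\d+:\s+([0-9A-Fa-f]+)\s+Initiated\s+\w+\s+([0-9A-Fa-f]+)\b"
)


def map_mtp_to_doc(log_lines):
    """Return a dict mapping MTP transaction IDs to DOC transaction IDs."""
    mapping = {}
    for line in log_lines:
        match = DOC_RE.search(line)
        if match:
            doc_id, mtp_id = match.groups()
            mapping[mtp_id] = doc_id
    return mapping


lines = [
    "DOC1: 37298166 Initiated Upload 4277D235 (type 3) "
    "to 107.170.213.228:43212 @ 4766738093",
    "MTP1: 4277D235 Statistics",
]
print(map_mtp_to_doc(lines))  # {'4277D235': '37298166'}
```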

Most MTP applications use the DOC ID to identify their own transactions.  For example, ExpeDat identifies the completion of the above transaction like this:

F 20210317 20:38:26.081 37298166 SEND 107.170.213.228:43212 dei - "/tmp/test500mb.dat" "" 524288000 (0,0) "8.76 seconds (479 megabits/sec)"

Remember that MTP Statistics are only reported on the receiving side, so uploads will appear on the server and downloads will appear on the client.

Data Transfer

The second line shows how much data was transferred and how fast.

MTP1: Sequence: 524288000 Received: 524288000 Speed: 880644125/ 920589052 5%

Sequence counts the number of bytes successfully delivered to the parent application.  Received counts the total number of bytes received, including out-of-sequence data which may be buffered but not yet utilized.  For most application purposes, Sequence is the important number.

Speed is the data transfer rate in bits per second averaged over about 20 milliseconds.  The first value was measured at the end of the transfer.  The second value is the peak.  For diagnostic purposes, the peak value offers insight into the capabilities of the infrastructure.  Both values have an error margin of ± 5%.

Pipeline

MTP1: Wsize: 8249472/9767296 Transit: 1408 Datagram: 1408/1408

Wsize describes the ending and peak numbers of bytes which should be pipelined on the network path at any given time.  A large difference between the two values indicates throughput variation over the life of the transfer.

Transit is the number of bytes pending at the end of the transaction.  This is typically only significant when a transfer ends with an abrupt error.

Datagram describes the current and maximum number of payload bytes in each UDP datagram.  The full IP datagram size is typically 56 bytes larger than this value.  Values above 1408 indicate attempts to utilize jumbo datagrams, which requires a compatible data path.
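As a worked example of the Datagram arithmetic, the Python sketch below applies the typical 56 byte overhead and the 1408 byte threshold described above to the values from the example report.  The constant and function names are illustrative, and the overhead is only the typical figure, so actual sizes on a given path may differ.

```python
TYPICAL_IP_OVERHEAD = 56      # bytes added to the payload, typical case only
STANDARD_MAX_PAYLOAD = 1408   # payloads above this suggest jumbo datagrams


def describe_datagram(current, maximum):
    """Summarize the two Datagram values (payload bytes per UDP datagram)."""
    largest = max(current, maximum)
    return {
        "ip_datagram_bytes": largest + TYPICAL_IP_OVERHEAD,
        "jumbo_attempted": largest > STANDARD_MAX_PAYLOAD,
    }


print(describe_datagram(1408, 1408))
# {'ip_datagram_bytes': 1464, 'jumbo_attempted': False}
```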

Path Latency

MTP1: RTT: 65/63/63/76/101 Wtime: 0.075s RDh: 15 Rnh: 261888 Rpt: 152 Gap: 63

Latency is measured in a variety of ways which reflect changing conditions on the network path:

On saturated paths without QoS management, the RTT highs are often significantly higher than the lows.  RTT values which show little variation indicate that the network path is not saturated and other factors are limiting flow.

Wtime is simply the ratio of Wsize to speed and should correlate to the average RTT.
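The relationship can be checked against the example report.  This Python sketch assumes that the ending values of Wsize and Speed are used and that Wsize is converted from bytes to bits before dividing, which reproduces the reported 0.075s; the function name is illustrative.

```python
def window_time_seconds(wsize_bytes, speed_bits_per_sec):
    """Time needed to send one window at the given speed, in seconds."""
    return wsize_bytes * 8 / speed_bits_per_sec


# Ending Wsize and ending Speed from the example report.
wtime = window_time_seconds(8249472, 880644125)
print(f"{wtime:.3f}s")  # 0.075s
```

The result of about 75 milliseconds falls within the 63 to 101 millisecond range of the RTT values in the same report.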

RDh and Rnh are measures of MTP's maximum buffer depth, in milliseconds and bytes.  Large values here indicate a limitation of CPU capacity.

Rpt is the baseline number of milliseconds MTP will wait for a datagram before declaring it lost.  The actual timeout varies depending on circumstances and the baseline may be affected by various MTP performance options.

Gap measures the longest interval without receiving any data.  This should be close to the minimum RTT.  Larger values indicate times when no data is getting through the network.

Error Recovery

MTP1: Req: 2292 Resp: 372432 Del: 45 Rptd: 45 Dup: 0

Req counts the number of request datagrams sent from the receiver to the sender and Resp counts the number of payload datagrams received.

Del counts how many requests did not receive a complete response on time and Rptd counts how many of those were ultimately considered lost and had to be repeated.  It is possible for a datagram to be lost and repeated multiple times.  Dup counts datagrams which arrive more than once.  Ideally, these values should be very low or zero.

Flow Control

MTP1: Wait: 0 Stall: 0 Pause: 0 Slow: 0 Over: 0 Loss: 0.020/0.000

Wait, Stall, and Pause count underflow and overflow events related to dynamic or streaming transactions such as ExpeDat's Streaming Folders or when downloading from Object Storage with CloudDat.

Wait counts the number of times the flow of data was stopped by the sending application due to a lack of data.  A value larger than zero or one indicates that the data source could not keep up with the network.

Stall is similar to Wait, counting the number of times the server had some data available, but not as much as the network could handle.

Pause is the opposite of Wait: it counts how many times the receiving application stopped the flow of data because it could not be delivered fast enough.

Slow and Over track the condition of MTP's internal datagram buffering.  Slow counts the number of times the sender requested a slowdown due to overflowing buffers.  A non-zero value indicates that the sender is overloaded.  Over counts the number of times this receiver's buffers became too large.  This can happen if MTP is targeting filesystem storage that is unable to keep up with the network.

Loss is actually related to the error recovery statistics on the previous line.  The first value is the ratio of Rptd to Req and is often an upper bound for packet loss.  The second value is the ratio of bytes lost to bytes received and is often a lower bound for packet loss.  Ideally these numbers should be close to zero.  A value above 0.02 for the second number indicates particularly high packet loss on the network.
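The first Loss value can be reproduced from the Error Recovery line of the example report.  The Python sketch below uses an illustrative function name; the second Loss value cannot be recomputed this way because the number of bytes lost is not printed in the report.

```python
def request_loss_ratio(rptd, req):
    """Ratio of repeated requests to total requests (first Loss value)."""
    return rptd / req if req else 0.0


# Rptd: 45 and Req: 2292 from the example report.
print(f"{request_loss_ratio(45, 2292):.3f}")  # 0.020
```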

More Flow Control

MTP1: BWDP: 52 Flow: 121981 RTNR: 0 Gaps: 1 Lower: 45 Raise: 372431

BWDP, Flow, and RTNR count the number of times MTP declined to increase the transfer rate due to various internal statistics.

Gaps counts the number of times no data arrived for more than half an RTT.  Values much larger than one indicate an unstable network.  When accompanied by high Loss and Gap, it may indicate intermittent loss of connectivity.

Lower and Raise count the number of times MTP considered lowering or raising the rate of flow.  They are often close to the Req and Resp values.

Analysis

These statistics can be used to diagnose a variety of network conditions and identify the causes of performance issues.  For example, high Loss values accompanied by high Gaps suggest connectivity problems.  High Loss accompanied by flat (very similar) RTT values suggests QoS throttling by a router.  Many Wait, Stall, Pause, Slow, or Over events indicate storage problems at the source or destination.
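The rules of thumb above can be expressed as a rough first-pass screen.  The Python sketch below is not a DEI tool and is no substitute for analysis by support.  Only the 0.02 limit on the second Loss value comes from this Tech Note; the limits for Gaps, for "flat" RTT, and for event counts are hypothetical values chosen for illustration, as are the function and label names.

```python
def diagnose(stats, loss_limit=0.02, gaps_limit=5,
             flat_rtt_spread=0.10, event_limit=1):
    """Return a list of suspected problem areas.

    stats holds "Loss" (two values), "RTT" (list of milliseconds), and
    integer counts for "Gaps", "Wait", "Stall", "Pause", "Slow", "Over".
    All limits except loss_limit are assumed, not documented, thresholds.
    """
    findings = []
    high_loss = stats["Loss"][1] > loss_limit

    # Treat RTT as flat when highest and lowest differ by at most 10%.
    rtt = stats["RTT"]
    low, high = min(rtt), max(rtt)
    flat_rtt = low > 0 and (high - low) / low <= flat_rtt_spread

    if high_loss and stats["Gaps"] > gaps_limit:
        findings.append("connectivity")
    if high_loss and flat_rtt:
        findings.append("qos_throttling")
    if any(stats[name] > event_limit
           for name in ("Wait", "Stall", "Pause", "Slow", "Over")):
        findings.append("storage")
    return findings


example = {
    "Loss": (0.020, 0.000), "RTT": [65, 63, 63, 76, 101], "Gaps": 1,
    "Wait": 0, "Stall": 0, "Pause": 0, "Slow": 0, "Over": 0,
}
print(diagnose(example))  # []
```

An empty list for the example report means none of these three patterns was matched; it does not by itself show that the transfer was free of bottlenecks.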

DEI engineers can use these statistics, along with other diagnostic logs, to help isolate bottlenecks in the network, storage, or CPU.  See Tech Note 0033 for instructions on collecting and submitting logs for various DEI applications, then contact support for analysis.

Tech Note History

Mar 18 2021    MTP 4.3.2