PTP Guide

PTP Profiles Guide

Overview

The Ouster sensor’s PTP profile must match your master clock’s profile for time synchronization to work. This guide explains how to set the profile.

PTP Profiles

Several PTP profiles are in common use. The Ouster sensor supports the following profiles:

  • default — IEEE 1588 Default profile. Supported by most PTP-capable devices.
  • gptp — IEEE 802.1AS-2011 (gPTP). Simplifies PTP for improved interoperability. Use with gPTP-compatible hardware such as AVB devices, e.g., the MOTU AVB.
  • automotive-slave — Common in automotive applications. BMCA is disabled, the slave suppresses announce messages, and the convergence controller is ~8× faster than the Default profile.

PTP HTTP API

Change the PTP profile with an HTTP PUT request using any HTTP client (HTTPie, curl, etc.). The full API is in http-v1-get-ptp-profile.

The request URL is: http://<sensor_hostname>/api/v1/time/ptp/profile/

Valid values (the double quotes are part of the value):

  • "default"
  • "gptp"
  • "automotive-slave"

Note: Changing the PTP profile takes effect immediately on a valid PUT response. No reinitialization or config file write is required.
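
The request described above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name build_profile_request and the hostname os-sensor.local are assumptions, while the endpoint path and the valid values come from this section. The request is built but not sent, so the sketch can be inspected without a sensor attached.

```python
import json
import urllib.request

# Profile names listed in this guide.
VALID_PROFILES = ("default", "gptp", "automotive-slave")


def build_profile_request(sensor_hostname: str, profile: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that selects a PTP profile."""
    if profile not in VALID_PROFILES:
        raise ValueError(f"unsupported PTP profile: {profile!r}")
    url = f"http://{sensor_hostname}/api/v1/time/ptp/profile/"
    # The body is a JSON string, so the double quotes are part of the payload.
    body = json.dumps(profile).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # "os-sensor.local" is a placeholder hostname (assumption).
    req = build_profile_request("os-sensor.local", "gptp")
    print(req.get_method(), req.full_url, req.data)
    # To send it for real: urllib.request.urlopen(req, timeout=5)
```

Validating the profile name before building the request keeps a typo from reaching the sensor at all.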

Enabling the PTP profiles

Below are some examples using popular command-line tools.

Example using cURL

Set the PTP profile to "gptp" using cURL:

Command:

$ curl -X PUT -H "Content-Type: application/json" -d '"gptp"' http://<sensor_hostname>/api/v1/time/ptp/profile/

Response:

"gptp"

Example using HTTPie

Set the PTP profile to "default" using HTTPie:

Command:

$ http PUT http://<sensor_hostname>/api/v1/time/ptp/profile <<< '"default"'

Response:

"default"

Sync Verification

Please see the sensor-ptp-sync-verify section for details on how to verify the sensor is synchronized.

PTP Quickstart Guide

This guide covers basic PTP network configuration using Ubuntu 18.04. It includes settings for a commercial PTP grandmaster clock and instructions for configuring a Linux machine as a PTP grandmaster.

The linuxptp project provides a suite of PTP tools that can be used to serve as a PTP master clock for a local network of sensors.

Assumptions

  • Linux command-line knowledge (e.g., package management).
  • Ethernet interfaces that support hardware timestamping.
  • Ubuntu 18.04 is assumed for this tutorial, but any modern distribution should suffice.
  • Knowledge of systemd service configuration and management.
  • Familiarity with Linux permissions.

Physical Network Setup

Connect the sensor to the PTP grandmaster with at most one network switch. A direct connection is ideal. A single layer-2 gigabit Ethernet switch is acceptable. Multiple switches add jitter and are not recommended.

Third Party Grandmaster Clock

Use a dedicated grandmaster clock with a GPS receiver for highest absolute accuracy.

Configure it with the following parameters, which match the linuxptp client defaults:

  • Transport: UDP IPv4
  • Delay Mechanism: E2E
  • Sync Mode: Two-Step
  • Announce Interval: 1 - sent every 2 seconds
  • Sync Interval: 0 - sent every 1 second
  • Delay Request Interval: 0 - sent every 1 second
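
The interval values in the list above are base-2 logarithms of the period in seconds, which is why 1 means every 2 seconds and 0 means every 1 second. A small sketch of the conversion (the function name is illustrative):

```python
def log_interval_to_seconds(log_interval: int) -> float:
    """Convert a PTP log2 message interval to a period in seconds."""
    return 2.0 ** log_interval


if __name__ == "__main__":
    for name, value in (("Announce", 1), ("Sync", 0), ("Delay Request", 0)):
        period = log_interval_to_seconds(value)
        print(f"{name} interval {value}: sent every {period:g} s")
```

Negative values work the same way, so an interval of -3 corresponds to eight messages per second.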

For more settings, review the port_data_set field returned from the sensor’s HTTP /time/ptp interface.

Linux PTP Grandmaster Clock

When the application can tolerate lower absolute accuracy, run a local Linux PTP master clock instead of an external grandmaster. This is commonly done on vehicle computers that interface directly with lidar sensors.

This section outlines how to configure a master clock.

Example Network Setup

This section assumes the following network setup, which includes a local master clock and an optional upstream PTP time source.

+-------------------------------------+
| Ubuntu 18.04 System                 |
| * 2x Intel i210 Ethernet Interfaces |
| * Linux PTP service                 |
|                                     |
|      eno1                  eno2     |
+-------+---------------------+-------+
        |                     |
+-------+-------+    +--------+-------+
| Trimble GM100 |    |                |
| GPS -> PTP    |    |   Ouster OS1   |
| grandmaster   |    |                |
| (optional)    |    |                |
+---------------+    +----------------+

The focus is configuring the Linux PTP service to distribute a common clock to all downstream Ouster OS1 sensors, using the Ubuntu host system time.

Optionally, add a grandmaster clock to discipline the Linux system time.

Installing Necessary Packages

Install the following packages for PTP functionality:

  • linuxptp - Linux PTP package with the following components:

    • ptp4l daemon to manage hardware and participate as a PTP node
    • phc2sys to synchronize the Ethernet controller’s hardware clock to the Linux system clock or shared memory region
    • pmc to query the PTP nodes on the network.
  • chrony - An NTP and PTP time synchronization daemon. It can listen to both NTP time sources on the Internet and a PTP master clock, such as one provided by a GPS receiver with PTP support. Comparing multiple time sources helps validate that the time configuration makes sense.

  • ethtool - A tool to query the hardware and driver capabilities of a given Ethernet interface.

$ sudo apt update
...
Reading package lists... Done
Building dependency tree
Reading state information... Done

$ sudo apt install linuxptp chrony ethtool
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  chrony ethtool linuxptp
0 upgraded, 3 newly installed, 0 to remove and 29 not upgraded.
Need to get 430 kB of archives.
After this operation, 1,319 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu bionic/main amd64 ethtool amd64 1:4.15-0ubuntu1 [114 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu bionic/universe amd64 linuxptp amd64 1.8-1 [112 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu bionic-updates/main amd64 chrony amd64 3.2-4ubuntu4.2 [203 kB]
Fetched 430 kB in 1s (495 kB/s)
Selecting previously unselected package ethtool.
(Reading database ... 117835 files and directories currently installed.)
Preparing to unpack .../ethtool_1%3a4.15-0ubuntu1_amd64.deb ...
Unpacking ethtool (1:4.15-0ubuntu1) ...
Selecting previously unselected package linuxptp.
Preparing to unpack .../linuxptp_1.8-1_amd64.deb ...
Unpacking linuxptp (1.8-1) ...
Selecting previously unselected package chrony.
Preparing to unpack .../chrony_3.2-4ubuntu4.2_amd64.deb ...
Unpacking chrony (3.2-4ubuntu4.2) ...
Setting up linuxptp (1.8-1) ...
Processing triggers for ureadahead (0.100.0-20) ...
ureadahead will be reprofiled on next reboot
Setting up chrony (3.2-4ubuntu4.2) ...
Processing triggers for systemd (237-3ubuntu10.13) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up ethtool (1:4.15-0ubuntu1) ...

Ethernet Hardware Timestamp Verification

Identify the Ethernet interface (e.g., eno1) and run ethtool to confirm hardware timestamping support:

Output of ethtool -T for a functioning Intel i210 Ethernet interface:

$ sudo ethtool -T eno1
Time stamping parameters for eno1:
Capabilities:
    hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
    software-transmit (SOF_TIMESTAMPING_TX_SOFTWARE)
    hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
    software-receive (SOF_TIMESTAMPING_RX_SOFTWARE)
    software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
    hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
    off (HWTSTAMP_TX_OFF)
    on (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
    none (HWTSTAMP_FILTER_NONE)
    all (HWTSTAMP_FILTER_ALL)
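
The check above can also be scripted. The sketch below scans captured ethtool -T output for the three hardware capability flags shown in the example. Treating exactly these three flags as the requirement is an assumption made for illustration, and the function name is hypothetical.

```python
# Hardware capability flags to look for, spelled as ethtool -T prints them.
# Assumption: these three are what hardware timestamping needs.
REQUIRED_CAPS = (
    "SOF_TIMESTAMPING_TX_HARDWARE",
    "SOF_TIMESTAMPING_RX_HARDWARE",
    "SOF_TIMESTAMPING_RAW_HARDWARE",
)


def missing_hw_timestamp_caps(ethtool_output: str) -> list:
    """Return the required capability flags that do not appear in the output."""
    return [cap for cap in REQUIRED_CAPS if cap not in ethtool_output]


if __name__ == "__main__":
    sample = """Capabilities:
        hardware-transmit (SOF_TIMESTAMPING_TX_HARDWARE)
        hardware-receive (SOF_TIMESTAMPING_RX_HARDWARE)
        hardware-raw-clock (SOF_TIMESTAMPING_RAW_HARDWARE)
    """
    missing = missing_hw_timestamp_caps(sample)
    print("missing capabilities:", missing or "none")
```

In practice the text would come from running ethtool -T on the interface, for example through subprocess.run with capture_output=True and text=True.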

Configurations

Configuring ptp4l for Multiple Ports

On a system with multiple Ethernet ports (e.g., Intel i210), configure /etc/linuxptp/ptp4l.conf to include all relevant interfaces:

boundary_clock_jbod 1
[eno1]
[eno2]

Note: Add the above required modification at the end of the existing file. Deleting or editing the default settings section of the ptp4l.conf file will result in an error.

The default Ubuntu 18.04 systemd service hardcodes eth0. Override it to use the configuration file instead:

Create a systemd drop-in directory to override the system service file:

$ sudo mkdir -p /etc/systemd/system/ptp4l.service.d

Create /etc/systemd/system/ptp4l.service.d/override.conf with the following contents:

[Service]
ExecStart=
ExecStart=/usr/sbin/ptp4l -f /etc/linuxptp/ptp4l.conf

Restart the ptp4l service so the change takes effect:

$ sudo systemctl daemon-reload
$ sudo systemctl restart ptp4l
$ sudo systemctl status ptp4l
* ptp4l.service - Precision Time Protocol (PTP) service
   Loaded: loaded (/lib/systemd/system/ptp4l.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/ptp4l.service.d
           └─override.conf
   Active: active (running) since Wed 2019-03-13 14:38:57 PDT; 3s ago
     Docs: man:ptp4l
 Main PID: 25783 (ptp4l)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/ptp4l.service
           └─25783 /usr/sbin/ptp4l -f /etc/linuxptp/ptp4l.conf

Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.756] port 1: INITIALIZING to LISTENING on INITIALIZE
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.756] driver changed our HWTSTAMP options
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.756] tx_type 1 not 1
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.756] rx_filter 1 not 12
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.756] port 2: INITIALIZING to LISTENING on INITIALIZE
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.757] port 0: INITIALIZING to LISTENING on INITIALIZE
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.757] port 1: link up
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.757] port 2: link down
Mar 13 14:38:57 ubuntu-host ptp4l[25783]: [590188.757] port 2: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)
Mar 13 14:38:58 ubuntu-host ptp4l[25783]: [590189.360] port 1: new foreign master 001747.fffe.700038-1

The systemctl status ptp4l output above shows that systemd read the override file created earlier and that the service was running a few seconds after the restart command.

The log output shows that a grandmaster clock has been discovered on port 1 (eno1) and that port 2 (eno2) is currently disconnected and in the faulty state, as expected. In the test network, a Trimble Thunderbolt PTP GM100 Grandmaster Clock is attached on eno1.

Follow the logs with journalctl:

$ journalctl -f -u ptp4l
-- Logs begin at Fri 2018-11-30 06:40:50 PST. --
Mar 13 14:51:37 ubuntu-host ptp4l[25783]: [590948.224] master offset -17 s2 freq -25963 path delay 14183
Mar 13 14:51:38 ubuntu-host ptp4l[25783]: [590949.224] master offset -13 s2 freq -25964 path delay 14183
Mar 13 14:51:39 ubuntu-host ptp4l[25783]: [590950.225] master offset 35 s2 freq -25920 path delay 14192
Mar 13 14:51:40 ubuntu-host ptp4l[25783]: [590951.225] master offset -59 s2 freq -26003 path delay 14201
Mar 13 14:51:41 ubuntu-host ptp4l[25783]: [590952.225] master offset -24 s2 freq -25986 path delay 14201
Mar 13 14:51:42 ubuntu-host ptp4l[25783]: [590953.225] master offset -39 s2 freq -26008 path delay 14201
Mar 13 14:51:43 ubuntu-host ptp4l[25783]: [590954.225] master offset 53 s2 freq -25928 path delay 14201
Mar 13 14:51:44 ubuntu-host ptp4l[25783]: [590955.226] master offset -85 s2 freq -26050 path delay 14207
Mar 13 14:51:45 ubuntu-host ptp4l[25783]: [590956.226] master offset 127 s2 freq -25863 path delay 14207
Mar 13 14:51:46 ubuntu-host ptp4l[25783]: [590957.226] master offset 9 s2 freq -25943 path delay 14208
Mar 13 14:51:47 ubuntu-host ptp4l[25783]: [590958.226] master offset -23 s2 freq -25973 path delay 14208
Mar 13 14:51:48 ubuntu-host ptp4l[25783]: [590959.226] master offset -61 s2 freq -26018 path delay 14190
Mar 13 14:51:49 ubuntu-host ptp4l[25783]: [590960.226] master offset 69 s2 freq -25906 path delay 14190
Mar 13 14:51:50 ubuntu-host ptp4l[25783]: [590961.226] master offset -73 s2 freq -26027 path delay 14202
Mar 13 14:51:51 ubuntu-host ptp4l[25783]: [590962.226] master offset 19 s2 freq -25957 path delay 14202
Mar 13 14:51:52 ubuntu-host ptp4l[25783]: [590963.226] master offset 147 s2 freq -25823 path delay 14202
...

Configuring ptp4l as a Local Master Clock

The IEEE-1588 Best Master Clock Algorithm (BMCA) selects the grandmaster from all available masters. In most networks only one master should exist. Configure the Ubuntu machine with a lower clockClass value so it wins the BMCA.

Edit /etc/linuxptp/ptp4l.conf, comment out the default clockClass value, and insert a line setting it to 128. A lower clock class means a higher priority.

#clockClass 248
clockClass 128

Restart ptp4l so the configuration change takes effect.

$ sudo systemctl restart ptp4l

This will configure ptp4l to advertise a master clock on eno2 as a clock that will win the BMCA for an Ouster OS1 sensor.
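
To see why a lower clockClass wins, the sketch below compares two hypothetical clocks in the order the BMCA compares dataset fields: priority1, clockClass, clockAccuracy, offsetScaledLogVariance, priority2, then clock identity as the tie-breaker. It is a simplified illustration that ignores the topology-dependent parts of the algorithm, and the identities and attribute values are made up for the example.

```python
from typing import NamedTuple


class ClockDataset(NamedTuple):
    """Fields in BMCA comparison order; lower values win at each step."""
    priority1: int
    clock_class: int
    clock_accuracy: int
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: str


def best_master(candidates):
    """Pick the clock that wins a simplified dataset comparison."""
    # Tuples compare field by field, which mirrors the comparison order.
    return min(candidates)


if __name__ == "__main__":
    # Hypothetical clocks: identical except for clockClass.
    other = ClockDataset(128, 248, 0xFE, 0xFFFF, 128, "aaaaaa.fffe.000001")
    linux_host = ClockDataset(128, 128, 0xFE, 0xFFFF, 128, "bbbbbb.fffe.000002")
    print("winner:", best_master([other, linux_host]).clock_identity)
```

Because priority1 is equal for both clocks, the comparison falls through to clockClass, where 128 beats 248.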

However, the ptp4l service is only advertising the Ethernet controller’s PTP hardware clock, not the Linux system time as is often expected.

Configuring phc2sys to Synchronize the System Time to the PTP Clock

Run phc2sys to synchronize the PTP hardware clock to the Linux system time. The following configuration uses CLOCK_REALTIME as the source and writes it to the PTP hardware clock on eno2, which connects to the Ouster OS1 sensors.

Create a systemd drop-in directory to override the system service file:

$ sudo mkdir -p /etc/systemd/system/phc2sys.service.d

Create /etc/systemd/system/phc2sys.service.d/override.conf with the following contents:

[Service]
ExecStart=
ExecStart=/usr/sbin/phc2sys -w -s CLOCK_REALTIME -c eno2

Note: If multiple interfaces need to be synchronized from CLOCK_REALTIME then multiple instances of the phc2sys service need to be run as it only accepts a single slave (i.e. -c) argument.

Restart the phc2sys service so the change takes effect:

$ sudo systemctl daemon-reload
$ sudo systemctl restart phc2sys
$ sudo systemctl status phc2sys

Configuring Chrony to Set System Clock Using PTP

To set the system time from an upstream GPS-disciplined PTP grandmaster, use Chrony. Chrony reads from both NTP and PTP and selects the most accurate source. When the PTP grandmaster is functioning correctly, it will be selected over NTP.

The following phc2shm service will synchronize the time from eno1 (where the external grandmaster is attached) to the system clock.

Create /etc/systemd/system/phc2shm.service with the following contents:

# /etc/systemd/system/phc2shm.service
[Unit]
Description=Synchronize PTP hardware clock (PHC) to NTP SHM
Documentation=man:phc2sys
After=ntpdate.service
Requires=ptp4l.service
After=ptp4l.service

[Service]
Type=simple
ExecStart=/usr/sbin/phc2sys -s eno1 -E ntpshm -w

[Install]
WantedBy=multi-user.target

Start the service and verify it started:

$ sudo systemctl start phc2shm
$ sudo systemctl status phc2shm

Add the PTP time source to the chrony configuration. Append to /etc/chrony/chrony.conf:

refclock SHM 0 poll 1 refid ptp

Restart chrony so the updated configuration takes effect:

$ sudo systemctl restart chrony

After waiting a minute for the clock to synchronize, review the chrony client timing accuracy:

$ chronyc tracking
Reference ID    : 70747000 (ptp)
Stratum         : 1
Ref time (UTC)  : Thu Mar 14 02:22:58 2019
System time     : 0.000000298 seconds slow of NTP time
Last offset     : -0.000000579 seconds
RMS offset      : 0.001319735 seconds
Frequency       : 0.502 ppm slow
Residual freq   : -0.028 ppm
Skew            : 0.577 ppm
Root delay      : 0.000000001 seconds
Root dispersion : 0.000003448 seconds
Update interval : 2.0 seconds
Leap status     : Normal

$ chronyc sources -v
210 Number of sources = 9

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample

#* ptp                           0   1   377     1    +27ns[  +34ns] +/- 932ns
^- chilipepper.canonical.com     2   6   377    61   -482us[ -482us] +/-  99ms
^- pugot.canonical.com           2   6   377    62   -498us[ -498us] +/- 112ms
^- golem.canonical.com           2   6   337    59   -467us[ -468us] +/-  95ms
^- alphyn.canonical.com          2   6   377    58   +957us[ +957us] +/-  95ms
^- 0.pool.ntp.org                3   6   377    62    -10ms[  -10ms] +/- 178ms
^- 1.pool.ntp.org                2   6   377   128   +429us[ +514us] +/-  42ms
^- 2.pool.ntp.org                2   6   377    59   +441us[ +441us] +/-  58ms
^- 3.pool.ntp.org                3   6   377    58  +1364us[+1364us] +/-  99ms

Note that the Reference ID matches the ptp reference ID from the chrony.conf file and that the sources output shows the ptp reference ID as selected (signified by the * state in the second column). Additionally, the NTP time sources show a small relative error to the high accuracy PTP time source.

In this case the PTP grandmaster is properly functioning.

If this error is large, chrony will select the NTP time sources and mark the PTP time source as invalid. This typically signifies that something is mis-configured with the PTP grandmaster upstream of this device or the linuxptp configuration.
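
The selection check described above can be automated by scanning the chronyc sources text for the row whose state column is *. This is a rough sketch that assumes the two-character mode and state prefix shown in the sample output; the function name is illustrative.

```python
def selected_source(sources_output: str):
    """Return the name of the source marked '*' (currently synced), or None."""
    for line in sources_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        marker = fields[0]
        # First character is the source mode, second is the source state.
        if len(marker) == 2 and marker[0] in "^=#" and marker[1] == "*":
            return fields[1]
    return None


if __name__ == "__main__":
    sample = (
        "MS Name/IP address Stratum Poll Reach LastRx Last sample\n"
        "#* ptp 0 1 377 1 +27ns[ +34ns] +/- 932ns\n"
        "^- 0.pool.ntp.org 3 6 377 62 -10ms[ -10ms] +/- 178ms\n"
    )
    print("selected source:", selected_source(sample))
```

If the function returns an NTP server name instead of ptp, chrony has rejected the PTP source and the upstream configuration deserves a closer look.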

Verifying Operation

After setting up a new PTP grandmaster, power-cycle the sensor. This causes it to jump directly to the correct time rather than slowly slewing from an incorrect initial value.

Sensor PTP Sync Verification

Query the sensor’s local PTP state via the http-v1-get-system-time-ptp request.

JSON response fields to check:

  • parent_data_set.grandmaster_identity should list the identity of the local grandmaster
  • port_data_set.port_state should be SLAVE
  • time_status_np.gm_present should be true
  • time_status_np.master_offset is given in nanoseconds and should be less than 250000 (250 microseconds)
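
The four checks above can be applied to a parsed JSON response with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the response is assumed to be already decoded into a dict, and the offset is compared by magnitude because it can be negative.

```python
def ptp_sync_problems(status: dict, expected_gm: str, max_offset_ns: int = 250_000) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    gm = status["parent_data_set"]["grandmaster_identity"]
    if gm != expected_gm:
        problems.append(f"unexpected grandmaster {gm}")
    port_state = status["port_data_set"]["port_state"]
    if port_state != "SLAVE":
        problems.append(f"port_state is {port_state}, expected SLAVE")
    time_status = status["time_status_np"]
    if not time_status["gm_present"]:
        problems.append("gm_present is false")
    # Assumption: the offset may be negative, so its magnitude is compared.
    offset = time_status["master_offset"]
    if abs(offset) >= max_offset_ns:
        problems.append(f"master_offset {offset} ns exceeds {max_offset_ns} ns")
    return problems


if __name__ == "__main__":
    # Trimmed to the fields the checks need.
    sample = {
        "parent_data_set": {"grandmaster_identity": "001747.fffe.700038"},
        "port_data_set": {"port_state": "SLAVE"},
        "time_status_np": {"gm_present": True, "master_offset": 61355},
    }
    print(ptp_sync_problems(sample, "001747.fffe.700038"))
```

Returning a list of messages rather than a single boolean makes it easier to log exactly which condition failed.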

PTP Example JSON Response

{
  "profile": "default",
  "parent_data_set":
  {
    "grandmaster_identity": "001747.fffe.700038",
    "parent_port_identity": "ac1f6b.fffe.1db84e-2",
    "parent_stats": 0,
    "gm_clock_class": 6,
    "observed_parent_clock_phase_change_rate": 2147483647,
    "gm_clock_accuracy": 33,
    "gm_offset_scaled_log_variance": 65535,
    "grandmaster_priority1": 128,
    "grandmaster_priority2": 128,
    "observed_parent_offset_scaled_log_variance": 65535
  },
  "current_data_set":
  {
    "steps_removed": 1,
    "offset_from_master": 61355,
    "mean_path_delay": 117977.0
  },
  "port_data_set":
  {
    "port_state": "SLAVE",
    "peer_mean_path_delay": 0,
    "log_min_delay_req_interval": 0,
    "port_identity": "bc0fa7.fffe.c48254-1",
    "log_sync_interval": 0,
    "log_announce_interval": 1,
    "delay_mechanism": 1,
    "log_min_pdelay_req_interval": 0,
    "announce_receipt_timeout": 3,
    "version_number": 2
  },
  "time_status_np":
  {
    "gm_time_base_indicator": 0,
    "gm_identity": "001747.fffe.700038",
    "cumulative_scaled_rate_offset": 0,
    "scaled_last_gm_phase_change": 0,
    "ingress_time": 0,
    "master_offset": 61355,
    "last_gm_phase_change": "0x0000'0000000000000000.0000",
    "gm_present": true
  },
  "time_properties_data_set":
  {
    "frequency_traceable": 0,
    "leap61": 0,
    "time_traceable": 0,
    "current_utc_offset": 37,
    "leap59": 0,
    "current_utc_offset_valid": 0,
    "time_source": 160,
    "ptp_timescale": 1
  }
}

LinuxPTP PMC Tool

The sensor responds to PTP management messages. Use the linuxptp pmc utility (man pmc) to query all PTP devices on the local network.

On the Linux host, run:

$ sudo pmc 'get PARENT_DATA_SET' 'get CURRENT_DATA_SET' 'get PORT_DATA_SET' 'get TIME_STATUS_NP' -i eno2
sending: GET PARENT_DATA_SET
sending: GET CURRENT_DATA_SET
sending: GET PORT_DATA_SET
sending: GET TIME_STATUS_NP
    bc0fa7.fffe.c48254-1 seq 0 RESPONSE MANAGEMENT PARENT_DATA_SET
        parentPortIdentity ac1f6b.fffe.1db84e-2
        parentStats 0
        observedParentOffsetScaledLogVariance 0xffff
        observedParentClockPhaseChangeRate 0x7fffffff
        grandmasterPriority1 128
        gm.ClockClass 6
        gm.ClockAccuracy 0x21
        gm.OffsetScaledLogVariance 0x4e5d
        grandmasterPriority2 128
        grandmasterIdentity 001747.fffe.700038
    bc0fa7.fffe.c48254-1 seq 1 RESPONSE MANAGEMENT CURRENT_DATA_SET
        stepsRemoved 2
        offsetFromMaster 61355.0
        meanPathDelay 117977.0
    bc0fa7.fffe.c48254-1 seq 2 RESPONSE MANAGEMENT PORT_DATA_SET
        portIdentity bc0fa7.fffe.c48254-1
        portState SLAVE
        logMinDelayReqInterval 0
        peerMeanPathDelay 0
        logAnnounceInterval 1
        announceReceiptTimeout 3
        logSyncInterval 0
        delayMechanism 1
        logMinPdelayReqInterval 0
        versionNumber 2
    bc0fa7.fffe.c48254-1 seq 3 RESPONSE MANAGEMENT TIME_STATUS_NP
        master_offset 61355
        ingress_time 0
        cumulativeScaledRateOffset +0.000000000
        scaledLastGmPhaseChange 0
        gmTimeBaseIndicator 0
        lastGmPhaseChange 0x0000'0000000000000000.0000
        gmPresent true
        gmIdentity 001747.fffe.700038

Tested Grandmaster Clocks

  • Trimble Thunderbolt PTP GM100 Grandmaster Clock
    • Firmware version: 20161111-0.1.4.0, November 11 2016 15:58:25
    • PTP configuration:
> get ptp eth0
Enabled : Yes
Clock ID : 001747.fffe.700038-1
Profile : 1588
Domain number : 0
Transport protocol : IPV4
IP Mode : Multicast
Delay Mechanism : E2E
Sync Mode : Two-Step
Clock Class : 6
Priority 1 : 128
Priority 2 : 128
Multicast TTL : 0
Sync interval : 0
Del Req interval : 0
Ann. interval : 1
Ann. receipt timeout : 3
  • Ubuntu 18.04 + Linux PTP as a master clock
    • Intel i210 Ethernet interface
    • PCI hardware identifiers: 8086:1533 (rev 03)
    • Ubuntu 18.04 kernel package: linux-image-4.18.0-16-generic
    • Ubuntu 18.04 linuxptp package: linuxptp-1.8-1

Analyzing Linux Networking Issues

Note: Follow this section only when experiencing intermittent packet drops or reordering. First verify udp_dest settings—this section does not apply if you are receiving zero data. For zero-data issues, contact our Field Application Team.

This section covers tools and procedures for debugging networking issues in systems with a PC/Workstation, L2 switch, and one or more Ouster sensors. Examples use Linux, but the concepts apply equally to Windows.

Debugging the Workstation Data Path

Diagnose packet loss by examining per-layer statistics in the network stack. Always start at the lowest layer—L1 issues cascade upward and can appear as higher-layer problems.

ethtool

ethtool queries NIC statistics and configuration. While Linux offers more generic kernel filesystem interfaces, ethtool is the most complete tool for NIC debug. Note that output and options are vendor-specific and vary by NIC.

Line Interface Statistics

Start link-layer debugging with ethtool -S <ethX>, where ethX is the NIC identifier from ifconfig. If the system has multiple NICs and you are unsure which is receiving sensor traffic, run traffic and observe which NIC shows incrementing stats.

Note: The output of ethtool -S <ethX> is entirely vendor-specific and differs considerably from one NIC vendor to another.

Example output of ethtool -S:

NIC statistics:
rx_packets: 0
tx_packets: 0
rx_bytes: 0
tx_bytes: 0
rx_broadcast: 0
tx_broadcast: 0
rx_multicast: 0
tx_multicast: 0
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 0
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 52
tx_restart_queue: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_csum_offload_good: 0
rx_csum_offload_errors: 0
rx_header_split: 0
alloc_rx_buff_failed: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
rx_dma_failed: 0
tx_dma_failed: 0
rx_hwtstamp_cleared: 0
uncorr_ecc_errors: 0
corr_ecc_errors: 0
tx_hwtstamp_timeouts: 0
tx_hwtstamp_skipped: 0

MAC Errors

Focus on rx (receive) statistics. Any stat labeled rx.*error may indicate a problem. These MAC-level errors typically point to L1 issues: loose connectors, faulty transceivers, or out-of-spec cables. They can also indicate a link-partner MAC problem.

Internal System Errors

Stats like rx_dma_failed and rx_no_buffer_count lack an “error” suffix but indicate real failures in the NIC driver handoff.
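
Scanning a long stats dump by eye is error-prone, so the two categories above can be filtered with a short script. The sketch below parses name: value lines and reports non-zero counters that match rx.*error, plus the two internal counters named in this section. Counter names vary by vendor, so the names used here are only the ones from the example output.

```python
import re

# MAC-level receive errors: any counter whose name matches rx.*error.
RX_ERROR = re.compile(r"^rx.*error")
# Internal failures named in this section that lack an "error" suffix.
INTERNAL_COUNTERS = {"rx_dma_failed", "rx_no_buffer_count"}


def parse_nic_stats(text: str) -> dict:
    """Parse 'name: value' lines from ethtool -S output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_rx_problems(stats: dict):
    """Return (mac_errors, internal_errors) dicts holding only non-zero counters."""
    mac = {k: v for k, v in stats.items() if RX_ERROR.match(k) and v}
    internal = {k: v for k, v in stats.items() if k in INTERNAL_COUNTERS and v}
    return mac, internal


if __name__ == "__main__":
    sample = "NIC statistics:\nrx_packets: 10\nrx_crc_errors: 3\nrx_dma_failed: 1\n"
    print(nonzero_rx_problems(parse_nic_stats(sample)))
```

Running it twice a few seconds apart and comparing the results shows which counters are still incrementing rather than left over from an earlier event.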

Solving MAC Errors

MAC errors most likely indicate a cabling issue. Replace the cable first. If errors persist, test against a different node using iPerf/iPerf3 (see below). As a final step, swap out the sensor.

Solving Internal System Errors

These errors are often counterintuitive—the MAC receives all traffic, yet frames are still dropped. The root cause is typically processor overload at peak rate. Even if average throughput is modest, all NIC traffic arrives at line rate. On a 10G NIC, this means bursts of back-to-back frames that the CPU cannot process in time.

How many frames arrive depends on the behavior of the sensors. An Ouster sensor attempts to transmit an entire LiDAR frame at once. Assuming a 40 KB (on the wire) LiDAR frame and 10 sensors, the worst-case load is 40 KB x 10 = 400 KB arriving at 10 Gb/s, since the combined peak transmit rate of the sensors is 1 Gb/s x 10 = 10 Gb/s. That is a lot of 10 Gb/s data to process at once, and without hardware buffering frames will be dropped.
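
The worst-case arithmetic above can be written out as a small calculation. The sketch assumes that 40K means 40,000 bytes and that every sensor transmits at 1 Gb/s simultaneously; both the function name and those unit choices are assumptions for illustration.

```python
def worst_case_burst(frame_bytes: int, sensors: int, sensor_rate_bps: float = 1e9):
    """Return (burst size in bytes, aggregate rate in bit/s, burst duration in s)."""
    burst_bytes = frame_bytes * sensors
    peak_rate_bps = sensor_rate_bps * sensors
    duration_s = burst_bytes * 8 / peak_rate_bps
    return burst_bytes, peak_rate_bps, duration_s


if __name__ == "__main__":
    # Assumption: a 40,000-byte LiDAR frame and 10 sensors, as in the text.
    burst, peak, duration = worst_case_burst(40_000, 10)
    print(f"{burst} bytes arrive at {peak / 1e9:g} Gb/s in {duration * 1e6:.0f} us")
```

The burst duration does not depend on the sensor count, because the burst size and the aggregate rate both scale with it. What grows is the amount of data the host must buffer within that same window.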

The NIC maintains a hardware ring-buffer or, on advanced hardware, potentially multiple ring-buffers. The entries in the ring-buffer are pointers into kernel packet-buffer structures. This mechanism enables the NIC to deliver packets to the kernel efficiently at line rate. For this use case, the default size of the ring-buffer may be too small.

Use ethtool to view and update this value:

  • ethtool -g <ethX> — display current ring-buffer settings and device limits
  • ethtool -G <ethX> rx <value> — update the rx ring-buffer size

Example: on the system below, the rx ring-buffer has 256 entries by default:

$ ethtool -g enp0s31f6
Ring parameters for enp0s31f6:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

To find out how much buffer is sufficient we can apply the burst-tolerance equation:

fill_rate = NIC_line_speed - max_measured_throughput
fill_time = rx_buffer_size * 1518 * 8 / fill_rate
MBS = fill_time * NIC_line_speed

Note: It is not always easy to obtain max_measured_throughput, and in a busy workstation it can be subject to variable delay.

As a rule of thumb, the buffer must accommodate at least one maximum burst (one LiDAR packet) from the sensor. Assuming a 40 KB LiDAR packet, that is 40 KB / 1518 B ≈ 27 frames, so 256 entries should be more than adequate.
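
The burst-tolerance equation and the rule of thumb above translate directly into code. The sketch keeps the variable names from the equation; the 1 Gb/s line speed and 500 Mb/s measured throughput in the example are made-up inputs, and 40 KB is taken as 40,000 bytes.

```python
import math

FRAME_BYTES = 1518  # maximum Ethernet frame size used in the equation


def max_burst_size_bits(rx_buffer_size: int, nic_line_speed: float,
                        max_measured_throughput: float) -> float:
    """Apply the burst-tolerance equation; speeds are in bit/s, result in bits."""
    fill_rate = nic_line_speed - max_measured_throughput
    if fill_rate <= 0:
        raise ValueError("measured throughput must be below the line speed")
    fill_time = rx_buffer_size * FRAME_BYTES * 8 / fill_rate
    return fill_time * nic_line_speed


def frames_per_burst(burst_bytes: int) -> int:
    """Number of full-size frames needed to carry one burst."""
    return math.ceil(burst_bytes / FRAME_BYTES)


if __name__ == "__main__":
    # Assumed inputs: 256-entry ring, 1 Gb/s NIC, 500 Mb/s drained by the host.
    mbs = max_burst_size_bits(256, 1e9, 0.5e9)
    print(f"tolerated burst: {mbs / 8:.0f} bytes")
    print("frames for a 40,000-byte packet:", frames_per_burst(40_000))
```

The closer the measured throughput gets to the line speed, the longer the buffer takes to fill and the larger the burst it tolerates. A host that drains slowly tolerates much less.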

However, even with the default buffer of 256 entries, packet loss due to DMA errors can still occur. The workstation is not a real-time system, and its servicing delay can be quite variable. When the system gets very busy, Linux uses a technique called interrupt coalescence that determines how often it services the driver.

Interrupt coalescence is controlled by the following kernel filesystem key, which defaults to 8000 µs:

/proc/sys/net/core/netdev_budget_usecs

If the problem is not resolved by increasing the buffer size, it’s possible to reduce netdev_budget_usecs in order to favor moving data over other activities that the system could be doing. It’s also possible to increase the maximum number of frames the OS is willing to process when the line interface does get serviced which is controlled by:

/proc/sys/net/core/netdev_budget

Note: On some systems the rx ring-buffer must be made quite large, or interrupt coalescence must be disabled altogether.
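
Before changing either key, it helps to record the current values. The sketch below reads them from the kernel filesystem paths named above. The helper name is hypothetical, and the root directory is a parameter only so the function can be exercised against a scratch directory.

```python
from pathlib import Path

NET_CORE = Path("/proc/sys/net/core")


def read_net_core(key: str, root: Path = NET_CORE) -> int:
    """Read one integer tunable, such as netdev_budget, from the given directory."""
    return int((Path(root) / key).read_text().strip())


if __name__ == "__main__":
    for key in ("netdev_budget_usecs", "netdev_budget"):
        try:
            print(f"{key} = {read_net_core(key)}")
        except OSError as exc:
            # The key may be missing on non-Linux hosts or older kernels.
            print(f"{key}: not readable ({exc})")
```

Writing new values requires root privileges and is usually done with sysctl rather than by editing these files from a script.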

The NIC also delays hardware interrupts. Use ethtool -c to view the coalescing settings. The AQC107 defaults are:

$ ethtool -c enp4s0
Coalesce parameters for enp4s0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 112
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 510
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frames-low: 0
tx-usecs-low: 0
tx-frames-low: 0
rx-usecs-high: 0
rx-frames-high: 0
tx-usecs-high: 0
tx-frames-high: 0

Another useful parameter is /proc/sys/net/core/netdev_max_backlog. The backlog queue is a FIFO on the other side of the NIC ring-buffer. Increasing the backlog buffer is one more way to add capacity earlier in the data-path. It is difficult to determine when to increase netdev_max_backlog rather than the rx ring-buffer, but the ring-buffer is the only place where added capacity can absorb traffic bursts at line rate.

Troubleshooting Advanced NICs

Advanced hardware interfaces have multiple ring-buffers that are typically mapped to different CPU cores, a technique known as receive-side scaling (RSS). Each NIC has its own proprietary scheme for mapping input traffic flows to ring-buffers, and sometimes a NIC incorrectly splits a single traffic flow across multiple FIFOs. When that happens, the NIC itself reorders frames in a way that badly disrupts the IP stack above it. The AQC107 is one such NIC. To identify the problem, look at ethtool -S <ethX>: the NIC lists stats for each FIFO, so after sending a single large traffic flow you can see whether the device errantly split the flow across the different FIFOs. Below you can see that this NIC has stats labeled Queue[0] … Queue[7].

Example:

$$ ethtool -S enp4s0
$NIC statistics:
$InPackets: 350287807
$ InUCast: 350048688
$ InMCast: 231724
$ InBCast: 7395
$ InErrors: 0
$ OutPackets: 363162007
$ OutUCast: 363160208
$ OutMCast: 1306
$ OutBCast: 493
$ InUCastOctets: 525223100117
$ OutUCastOctets: 545214487081
$ InMCastOctets: 16440320
$ OutMCastOctets: 206101
$ InBCastOctets: 1316312
$ OutBCastOctets: 58497
$ InOctets: 525240856749
$ OutOctets: 545214751679
$ InPacketsDma: 23207849
$ OutPacketsDma: 22064728
$ InOctetsDma: 34568308793
$ OutOctetsDma: 33164524696
$ InDroppedDma: 2002075
$ Queue[0] InPackets: 23087183
$ Queue[0] InJumboPackets: 0
$ Queue[0] InLroPackets: 0
$ Queue[0] InErrors: 0
$ Queue[0] AllocFails: 0
$ Queue[0] SkbAllocFails: 0
$ Queue[0] Polls: 7373190
$ Queue[0] OutPackets: 649028
$ Queue[0] Restarts: 0
$ Queue[1] InPackets: 80
$ Queue[1] InJumboPackets: 0
$ Queue[1] InLroPackets: 0
$ Queue[1] InErrors: 0
$ Queue[1] AllocFails: 0
$ Queue[1] SkbAllocFails: 0
$ Queue[1] Polls: 14672
$ Queue[1] OutPackets: 1651541
$ Queue[1] Restarts: 0
$ Queue[2] InPackets: 103
$ Queue[2] InJumboPackets: 0
$ Queue[2] InLroPackets: 0
$ Queue[2] InErrors: 0
$ Queue[2] AllocFails: 0
$ Queue[2] SkbAllocFails: 0
$ Queue[2] Polls: 215484
$ Queue[2] OutPackets: 3815296
$ Queue[2] Restarts: 0
$ Queue[3] InPackets: 269
$ Queue[3] InJumboPackets: 0
$ Queue[3] InLroPackets: 0
$ Queue[3] InErrors: 0
$ Queue[3] AllocFails: 0
$ Queue[3] SkbAllocFails: 0
$ Queue[3] Polls: 14469
$ Queue[3] OutPackets: 1580307
$ Queue[3] Restarts: 0
$ Queue[4] InPackets: 119681
$ Queue[4] InJumboPackets: 0
$ Queue[4] InLroPackets: 0
$ Queue[4] InErrors: 0
$ Queue[4] AllocFails: 0
$ Queue[4] SkbAllocFails: 0
$ Queue[4] Polls: 157920
$ Queue[4] OutPackets: 3670607
$ Queue[4] Restarts: 0
$ Queue[5] InPackets: 83
$ Queue[5] InJumboPackets: 0
$ Queue[5] InLroPackets: 0
$ Queue[5] InErrors: 0
$ Queue[5] AllocFails: 0
$ Queue[5] SkbAllocFails: 0
$ Queue[5] Polls: 9006
$ Queue[5] OutPackets: 931971
$ Queue[5] Restarts: 0
$ Queue[6] InPackets: 407
$ Queue[6] InJumboPackets: 0
$ Queue[6] InLroPackets: 0
$ Queue[6] InErrors: 0
$ Queue[6] AllocFails: 0
$ Queue[6] SkbAllocFails: 0
$ Queue[6] Polls: 15387
$ Queue[6] OutPackets: 1636793
$ Queue[6] Restarts: 0
$ Queue[7] InPackets: 43
$ Queue[7] InJumboPackets: 0
$ Queue[7] InLroPackets: 0
$ Queue[7] InErrors: 0
$ Queue[7] AllocFails: 0
$ Queue[7] SkbAllocFails: 0
$ Queue[7] Polls: 11584
$ Queue[7] OutPackets: 343508
$ Queue[7] Restarts: 0
$ PTP Queue[16] InPackets: 0
$ PTP Queue[16] InJumboPackets: 0
$ PTP Queue[16] InLroPackets: 0
$ PTP Queue[16] InErrors: 0
$ PTP Queue[16] AllocFails: 0
$ PTP Queue[16] SkbAllocFails: 0
$ PTP Queue[16] Polls: 0
$ PTP Queue[16] OutPackets: 0
$ PTP Queue[16] Restarts: 0
$ PTP Queue[31] InPackets: 0
$ PTP Queue[31] InJumboPackets: 0
$ PTP Queue[31] InLroPackets: 0
$ PTP Queue[31] InErrors: 0
$ PTP Queue[31] AllocFails: 0
$ PTP Queue[31] SkbAllocFails: 0
$ PTP Queue[31] Polls: 0
$ MACSec InCtlPackets: 0
$ MACSec InTaggedMissPackets: 0
$ MACSec InUntaggedMissPackets: 23252064
$ MACSec InNotagPackets: 23252064
$ MACSec InUntaggedPackets: 0
$ MACSec InBadTagPackets: 0
$ MACSec InNoSciPackets: 0
$ MACSec InUnknownSciPackets: 0
$ MACSec InCtrlPortPassPackets: 0
$ MACSec InUnctrlPortPassPackets: 23252064
$ MACSec InCtrlPortFailPackets: 0
$ MACSec InUnctrlPortFailPackets: 0
$ MACSec InTooLongPackets: 0
$ MACSec InIgpocCtlPackets: 0
$ MACSec InEccErrorPackets: 0
$ MACSec InUnctrlHitDropRedir: 0
$ MACSec OutCtlPackets: 1
$ MACSec OutUnknownSaPackets: 22064727
$ MACSec OutUntaggedPackets: 0
$ MACSec OutTooLong: 0
$ MACSec OutEccErrorPackets: 0
$ MACSec OutUnctrlHitDropRedir: 0

The vendor provided a workaround in their README:

Note: RSS for UDP

Currently, NIC does not support RSS for fragmented IP packets, which leads to an incorrect handling of RSS for fragmented UDP traffic. To disable RSS for UDP one can use the following RX Flow L3/L4 rule:

ethtool -N eth0 flow-type udp4 action 0 loc 32

When Stats Fail

Sometimes a NIC will drop frames without any error stat incrementing. To detect this, insert a managed L2 switch between the sensor and the workstation. The managed switch reports its own receive and transmit stats, which can be correlated against the rx stats of the NIC. If the switch transmitted more frames to the workstation than the NIC counted as received or dropped, the NIC has dropped frames without incrementing any stat.
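The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the switch counter has to be read from the switch's own management interface, and the sketch assumes the NIC exposes the standard Linux counters under /sys/class/net/<iface>/statistics/. How a given driver accounts for dropped frames in rx_packets varies, so treat the result as a hint rather than a proof.

```python
from pathlib import Path


def read_nic_counter(iface, name, root="/sys/class/net"):
    """Read one counter from <root>/<iface>/statistics/<name> (standard Linux sysfs layout)."""
    return int((Path(root) / iface / "statistics" / name).read_text())


def silent_drops(switch_tx_delta, nic_rx_delta, nic_reported_drops=0):
    """Frames the switch sent that the NIC neither counted as received nor as dropped.

    All three arguments are deltas over the same measurement window.
    switch_tx_delta comes from the managed switch (hypothetical input,
    read by hand or through the switch's management interface).
    """
    return max(0, switch_tx_delta - nic_rx_delta - nic_reported_drops)
```

A typical use is to record the switch's transmit count and read_nic_counter("eth0", "rx_packets") before and after a capture, then pass the differences to silent_drops. A result that stays above zero across several runs suggests the NIC is losing frames that none of its stats explain.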

IP Statistics

After the link layer, the next layer up is IP. IP errors can be identified with the netstat tool:

netstat -s

This tool outputs a lot of information; focus on the IP section.

$$ netstat -s
$Ip:
$ Forwarding: 2
$ 349183315 total packets received
$ 10794 with invalid addresses
$ 0 forwarded
$ 0 incoming packets discarded
$ 9552227 incoming packets delivered
$ 9078834 requests sent out
$ 20 outgoing packets dropped
$ 133594 dropped because of missing route
$ 3041 fragments dropped after timeout
$ 346297473 reassemblies required
$ 6748472 packets reassembled ok
$ 69242181 packet reassemblies failed
$ 8504728 fragments received ok
$ 355166721 fragments created

In this report you can see that there are a few different error categories:

10794 with invalid addresses
133594 dropped because of missing route
3041 fragments dropped after timeout
69242181 packet reassemblies failed

Let’s look at each class of error and consider its implications:

  • Packets received with invalid addresses means that they were sent to our MAC, but with an incorrect destination IP.
  • Packets dropped because of missing route indicates that the packet was sent to the correct IP address but no client program was listening on the destination port.
  • Fragments dropped after timeout means that we received some data but subsequent data didn’t arrive in time to be processed.
  • Packet reassemblies failed means that some data was missing, because an Ethernet frame was either aborted by the stack or lost in transit, and the IP layer was not able to reassemble a complete datagram.
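To watch these four counters over time without reading the full netstat report, the raw values can be taken from /proc/net/snmp, which is where netstat -s gets its Ip section on Linux. The sketch below is a minimal example, not part of any Ouster tooling. The function names are hypothetical, and the mapping from the netstat wording to the counter names (InAddrErrors, OutNoRoutes, ReasmTimeout, ReasmFails) is an assumption based on the net-tools implementation that should be checked against your system.

```python
# Counter names assumed to correspond to the four netstat lines discussed above.
WATCHED = ("InAddrErrors", "OutNoRoutes", "ReasmTimeout", "ReasmFails")


def parse_ip_counters(snmp_text):
    """Parse the 'Ip:' header and value lines from the text of /proc/net/snmp."""
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Ip:")]
    if len(rows) < 2:
        raise ValueError("expected an 'Ip:' header line followed by an 'Ip:' value line")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def ip_error_deltas(before, after):
    """Return how much each watched counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in WATCHED}
```

Reading /proc/net/snmp once before and once after a sensor capture and passing both parsed snapshots to ip_error_deltas shows which error class grows while the sensor is streaming.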

Debugging a Layer 3 Issue

The best way to debug issues in the IP layer is to find them in the link layer, because generally speaking layer-3 issues are caused by layer-2 bugs, but this is not always the case.

For instance, packets received with invalid addresses are probably indicative of stale ARP table entries or some other external network bug or transient state that will most likely clear up on its own. This sort of problem is probably not worth debugging unless it's persistent. Packets dropped because of missing route is more indicative of an issue at the application layer (the client or server simply wasn’t listening when the packets arrived).

If a problem is detectable at L3 but not at L2, then it's most likely a problem in the NIC itself, and the NIC isn’t providing a FIFO or DMA stat that explains it. One possibility is packet reordering by the NIC. This can be detected by modifying:

/proc/sys/net/ipv4/ipfrag_max_dist

This kernel attribute determines the system's tolerance to receiving out-of-order IPv4 fragments. Nominally, L2 networks do not reorder packets, so you should be able to configure a value of 1 and not observe a change in behavior. However, if setting a low threshold exacerbates the issue, or setting a high value makes the problem less severe, then the NIC is most likely to blame.
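The experiment above amounts to writing a new value, observing the reassembly counters, and restoring the old value. A minimal sketch follows. The helper names are hypothetical, writing to /proc/sys requires root, and the change is not persistent across reboots.

```python
from pathlib import Path

IPFRAG_MAX_DIST = Path("/proc/sys/net/ipv4/ipfrag_max_dist")


def get_max_dist(path=IPFRAG_MAX_DIST):
    """Return the current ipfrag_max_dist value as an integer."""
    return int(Path(path).read_text().strip())


def set_max_dist(value, path=IPFRAG_MAX_DIST):
    """Write a new value and return the previous one so it can be restored later."""
    if value < 0:
        raise ValueError("ipfrag_max_dist must be non-negative")
    previous = get_max_dist(path)
    Path(path).write_text(f"{value}\n")
    return previous
```

For example, call previous = set_max_dist(1), stream sensor data for a while and check whether the packet reassemblies failed counter grows faster than before, then call set_max_dist(previous) to put the original setting back.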

Useful network debugging tools

iPerf

iPerf validates whether a system can sustain a target throughput. Configure it to mimic the sensor’s expected data rate. See the iPerf documentation for details.

Use iPerf to rule out sensor-side failures and reproduce errors under high network load. Run it from two machines:

  • Server (receiving data)
  • Client (sending data)

Both sides report packet counts and loss percentage.

Example: Test sender sending 300 Mbps of UDP packets with 20 KB blocks to receiver:

Receiver (server side):

$receiver$ iperf3 --server --port 5300
$------------------------------------------------------------
$Server listening on 5300
$------------------------------------------------------------
$Accepted connection from 192.168.88.251, port 44824
$[ 5] local 192.168.88.248 port 5300 connected to 192.168.88.251 port 40426
$[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
$[ 5] 0.00-1.00 sec 35.7 MBytes 300 Mbits/sec 0.016 ms 0/1830 (0%)
$[ 5] 1.00-2.00 sec 35.8 MBytes 300 Mbits/sec 0.018 ms 0/1831 (0%)
$[ 5] 2.00-3.00 sec 35.8 MBytes 300 Mbits/sec 0.015 ms 0/1831 (0%)
$[ 5] 3.00-4.00 sec 35.8 MBytes 300 Mbits/sec 0.015 ms 0/1831 (0%)
$[ 5] 4.00-5.00 sec 35.8 MBytes 300 Mbits/sec 0.015 ms 0/1831 (0%)
$[ 5] 5.00-6.00 sec 35.8 MBytes 300 Mbits/sec 0.017 ms 0/1831 (0%)
$[ 5] 6.00-7.00 sec 35.8 MBytes 300 Mbits/sec 0.017 ms 0/1831 (0%)
$[ 5] 7.00-8.00 sec 35.8 MBytes 300 Mbits/sec 0.020 ms 0/1831 (0%)
$[ 5] 8.00-9.00 sec 35.8 MBytes 300 Mbits/sec 0.023 ms 0/1831 (0%)
$[ 5] 9.00-10.00 sec 35.8 MBytes 300 Mbits/sec 0.016 ms 0/1831 (0%)
$- - - - - - - - - - - - - - - - - - - - - - - - -
$[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
$[ 5] 0.00-10.00 sec 358 MBytes 300 Mbits/sec 0.016 ms 0/18309 (0%) receiver

Receiver arguments:

  • --server — Required. Indicates this machine receives data.
  • --port 5300 — Port to listen on. Useful for testing with multiple sources.

Sender (client side):

$sender$ iperf3 --client 192.168.88.248 --port 5300 --udp --bitrate 300M --length 20K
$warning: UDP block size 20480 exceeds TCP MSS 1448, may result in fragmentation / drops
$Connecting to host 192.168.88.248, port 5300
$[ 5] local 192.168.88.251 port 40426 connected to 192.168.88.248 port 5300
$[ ID] Interval Transfer Bitrate Total Datagrams
$[ 5] 0.00-1.00 sec 35.7 MBytes 300 Mbits/sec 1830
$[ 5] 1.00-2.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 2.00-3.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 3.00-4.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 4.00-5.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 5.00-6.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 6.00-7.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 7.00-8.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 8.00-9.00 sec 35.8 MBytes 300 Mbits/sec 1831
$[ 5] 9.00-10.00 sec 35.8 MBytes 300 Mbits/sec 1831
$- - - - - - - - - - - - - - - - - - - - - - - - -
$[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
$[ 5] 0.00-10.00 sec 358 MBytes 300 Mbits/sec 0.000 ms 0/18309 (0%) sender
$[ 5] 0.00-10.00 sec 358 MBytes 300 Mbits/sec 0.016 ms 0/18309 (0%) receiver
$
$iperf Done.

Sender arguments:

  • --client 192.168.88.248 — IP address of the receiver.
  • --port 5300 — Port to send to. Must match --port on the receiver.
  • --udp — Sends UDP traffic. Without this flag, TCP is used.
  • --bitrate 300M — Send rate in bits/sec. Supports K, M, G suffixes.
  • --length 20K — UDP datagram size (20 KB).
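The figures in the iPerf output can be cross-checked with simple arithmetic, which also shows why the sender warns about fragmentation. The sketch below assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header without options, and an 8-byte UDP header. None of these values appear in the output above, so adjust them to your network. The function names are illustrative only.

```python
import math


def datagrams_per_second(bitrate_bps, datagram_bytes):
    """Datagrams needed each second to sustain bitrate_bps with the given payload size."""
    return bitrate_bps / (datagram_bytes * 8)


def fragments_per_datagram(datagram_bytes, mtu=1500, ip_header=20, udp_header=8):
    """IPv4 fragments needed to carry one UDP datagram over a link with the given MTU."""
    # Every fragment except the last carries a payload that is a multiple of 8 bytes.
    payload_per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((datagram_bytes + udp_header) / payload_per_fragment)


rate = datagrams_per_second(300e6, 20 * 1024)  # --bitrate 300M, --length 20K (20480 bytes)
frags = fragments_per_datagram(20 * 1024)
print(f"{rate:.1f} datagrams/s, {frags} fragments per datagram")
```

Under these assumptions the rate works out to roughly 1831 datagrams per second, which matches the per-second totals in the output above, and each 20 KB datagram is split into 14 IP fragments. Losing any single fragment causes the whole datagram to fail reassembly, which is why this kind of test is useful for reproducing the reassembly failures described in the IP Statistics section.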