Alerting - Log files, SNMP, syslog, email, Windows event log

TimeKeeper provides several methods of delivering notifications of significant events. These notifications are called ‘alerts’. These are always logged in the main TimeKeeper log file (/var/log/timekeeper on UNIX systems and timekeeper.log on Windows). On Linux they can optionally be delivered via email, syslog, SNMP, or a combination of all three. On Windows TimeKeeper sends event information to the Windows Event log and optionally via SNMP. Below is an overview of the different types of alerts and each type of delivery mechanism.

Alert messages

The following sections list the specific messages that TimeKeeper emits via log files, event logs, SNMP, email, or syslog. Variable components of each message are shown below as the string VALUE; the actual text depends on the value at the time of the alert. Alerts are grouped by function, and each group is followed by a brief explanation of the causes.

The messages below are grouped by SNMP trap type purely to organize them into functional categories. Every message is sent through all alerting mechanisms that TimeKeeper supports, not just SNMP traps.
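Because each alert has fixed wording with VALUE placeholders, a monitoring script can turn a documented template into a pattern and pull the variable parts out of a received message. The Python sketch below shows one way to do this. The helper names and the sample offset figures are illustrative assumptions and are not part of TimeKeeper.

```python
import re
from typing import Optional, Tuple


def template_to_regex(template: str):
    """Compile a documented alert template into a regex.

    Literal text is escaped; every VALUE placeholder becomes a capture group.
    """
    literal_parts = [re.escape(part) for part in template.split("VALUE")]
    return re.compile("(.+?)".join(literal_parts) + "$")


def extract_values(template: str, message: str) -> Optional[Tuple[str, ...]]:
    """Return the VALUE fields found in message, or None if it does not match."""
    match = template_to_regex(template).search(message)
    return match.groups() if match else None


# Template copied from the sourceQualityTrap list; the numbers are made up.
TEMPLATE = "Sync quality error on source VALUE, absolute value of offset VALUE > VALUE"

if __name__ == "__main__":
    sample = "Sync quality error on source 0, absolute value of offset 0.000150 > 0.000100"
    print(extract_values(TEMPLATE, sample))
```

The pattern is applied with search rather than a full match so that any prefix a log file or syslog daemon adds in front of the message is tolerated.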

clientQualityTrap

  1. Client sync quality error on PTP server VALUE, client VALUE, source VALUE: absolute value of offset VALUE > VALUE
  2. Client sync quality error on NTP server 0, client VALUE: absolute value of offset VALUE > VALUE
  3. Client VALUE Source VALUE: inactive for VALUE seconds > VALUE seconds
  4. Client VALUE Source VALUE: is now active again for VALUE seconds

TimeKeeper grandmasters collect client sync accuracy information from clients - and will emit the above messages when one of the clients exceeds the server-specified threshold. (See the per-PTP-server configuration variable SYNCERRORTHRESHOLD, the global NTP configuration variable NTPSYNCERRORTHRESHOLD and the INACTIVE_CLIENT_THRESHOLD global configuration variable.)

Messages 1 and 2 can be throttled by setting the SYNC_ERROR_THRESHOLD_THROTTLE variable. These messages can be limited to the primary source only by setting the SYNC_ERROR_THRESHOLD_ALERT_ONLY_PRIMARY variable.

Message 3 will only be sent if TimeKeeper has detected that a client has become inactive for the period of time specified by the INACTIVE_CLIENT_THRESHOLD configuration option.

Message 4 is a clearing trap for Message 3 - if TimeKeeper previously detected that a client was inactive and has since become active again, this message provides notification of the change.

When configured to send SNMP traps, this event will be delivered using the clientQualityTrap.

licenseStateTrap

  1. License problems detected at startup. Please contact support@fsmlabs.com.
  2. Unable to get license information for TimeKeeper VALUE, continuing for VALUE seconds. (status VALUE)
  3. TimeKeeper VALUE: Had license status issues, but repeated checks now indicate correct state. Continuing.
  4. Warning: TimeKeeper VALUE license pending expiry in VALUE days.
  5. TimeKeeper VALUE license expired, in grace period, continuing for VALUE days.
  6. Unable to get valid license information, due to expiry or repeated failed checks. TimeKeeper shutting down.

The above messages are sent when TimeKeeper needs to alert the user to licensing issues.

Message 1 is only sent on startup if TimeKeeper cannot get an initial license to use. There is no clearing trap for message 1 as TimeKeeper is not starting up.

Message 2 indicates TimeKeeper is unable to contact a license server (if in use) to get license data, and will continue for the specified time. Message 3 is a clearing trap for message 2. This will only ever occur if TimeKeeper gets its license from a remote site, was unable to reach that site for a period of time, and then regains connectivity.

Messages 4 and 5 indicate that either a license is going to expire, or has expired and is in a grace period, so a new license can be put in place before it becomes an issue. There are no clearing traps for these messages. Once a new license file is in place and active they won’t be triggered.

Message 6 is sent when license checks have failed continuously and TimeKeeper is unable to continue. There is no clearing trap for this message.

When configured to send SNMP traps, this event will be delivered using the licenseStateTrap.

sourceStateTrap

  1. Source VALUE: source appears to be valid again - re-enabling
  2. Source VALUE declared invalid/insecure
  3. Source VALUE: TimeKeeper GPS/Oscillator detected critical jamming level
  4. Source VALUE: TimeKeeper GPS device reports overheating, Temp: VALUEC/VALUEF
  5. Source VALUE: TimeKeeper GPS/Oscillator detected VALUE% jamming/noise
  6. Source VALUE: u-blox GNSS receiver detected critical jamming level
  7. Source VALUE: u-blox GNSS receiver detected VALUE% jamming/noise
  8. Source VALUE: inactive for VALUE seconds > VALUE seconds
  9. Source VALUE: PTP reported GM UTC/TAI offset change that appears incorrect VALUE -> VALUE
  10. Source VALUE: post startup offset error > VALUE, setting to reference time
  11. Source VALUE: TimeKeeper GNSS/Oscillator detecting sustained jamming, this may result in degraded holdover accuracy

If TimeKeeper is crosschecking sources with the SOURCECHECK feature, some sources may be found to be invalid and rejected for a time. If a source has been rejected but is now found to be valid again, message 1 above will be sent. This is a clearing trap for the remainder of the list.

Similarly, TimeKeeper will send message 2 above when a source is rejected.

Messages 3-5 indicate there is an issue with the GPS device provided in TimeKeeper Grandmasters. These messages do not have clearing equivalents, and may indicate ‘GNSS’ rather than ‘GPS’ specifically.

Messages 6 and 7 indicate that there is an issue with a u-blox GNSS receiver.

Message 8 indicates that the source has stopped responding for a time that exceeds the outage time limit. The current outage limit is 180 seconds. This message is enabled by setting the ENABLE_INACTIVE_SOURCE_ALERT variable.

Message 9 can occur when the GM reports a change in the UTC/TAI offset that appears to be wrong or not possible.

Message 10 is provided when ALLOW_SET_TIME_AFTER_STARTUP is set and TimeKeeper is forced to set the time to correct for a large jump in time, like when recovering time from a suspended VM. This provides context when matching events to large offsets reported in TimeKeeper Compliance audits.

Message 11 indicates that TimeKeeper has determined jamming to be an ongoing issue. Because of this, oscillator training may not be complete. If the GNSS signal is lost in this state and the oscillator is used for holdover, the oscillator accuracy may not be ideal. If jamming is expected to continue, it is recommended that the HOLDOVER_LIMIT value for the source be reduced. This way, if the GNSS signal is lost, comparisons against backup sources will begin sooner. When applicable, this alert is emitted roughly once per day.

When configured to send SNMP traps, this event will be delivered using the sourceStateTrap.

changedSourceTrap

  1. Time source change VALUE -> VALUE (source VALUE -> source VALUE)
  2. Unable to find any valid time source.
  3. Previously unable to find any valid time source, now using source VALUE.
  4. Source VALUE: Holdover limit expired (VALUE seconds). Starting to cross-check time

Message 1 above will be sent whenever TimeKeeper changes source for any reason - such as when one source stops providing timing data for more than 3 minutes. This will also occur in situations where the source TimeKeeper is tracking changes where it is getting time from, such as when an NTP server switches to another upstream source.

Message 2 will only be sent if TimeKeeper has failed over to the last configured source and even that has stopped responding. At this point TimeKeeper is unable to track any sources and is driving the clock based on its best known clock rate.

Message 3 is a clearing trap for message 2 - if TimeKeeper previously lost all sources but one was found to be usable again, this message provides notification of the change.

TimeKeeper sends message 4 when the GNSS receiver has been in holdover for the amount of time specified by the HOLDOVER_LIMIT option. TimeKeeper may or may not subsequently change sources.

When configured to send SNMP traps, this event will be delivered using the changedSourceTrap.

sourceQualityTrap

  1. Sync quality error on source VALUE, absolute value of offset VALUE > VALUE
  2. Sync quality restored on source VALUE, absolute value of offset VALUE < VALUE
  3. Leap second will be inserted today

When a client has the SYNCERRORTHRESHOLD configuration option defined on a source, it will emit message 1 above whenever the sync offset exceeds that threshold. When the sync comes back into the SYNCERRORTHRESHOLD range, message 2 will be sent as a means of clearing state.
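The raise-and-clear behavior of messages 1 and 2 can be illustrated with a small Python sketch. This is not TimeKeeper's implementation. It assumes that an alert is produced only when the offset crosses the threshold (not on every sample that is out of range), and the function name and sample offsets are hypothetical.

```python
from typing import Iterable, Iterator


def quality_alerts(source: int, offsets: Iterable[float], threshold: float) -> Iterator[str]:
    """Yield message 1 when |offset| rises above threshold, message 2 when it falls back."""
    in_error = False  # True while the source is outside the threshold
    for offset in offsets:
        magnitude = abs(offset)
        if not in_error and magnitude > threshold:
            in_error = True
            yield (f"Sync quality error on source {source}, "
                   f"absolute value of offset {magnitude:.6f} > {threshold:.6f}")
        elif in_error and magnitude < threshold:
            in_error = False
            yield (f"Sync quality restored on source {source}, "
                   f"absolute value of offset {magnitude:.6f} < {threshold:.6f}")


if __name__ == "__main__":
    # Hypothetical offsets in seconds against a 100 microsecond threshold.
    for alert in quality_alerts(0, [0.00001, -0.0002, 0.0003, 0.00005], 0.0001):
        print(alert)
```

Tracking a single in_error flag per source is what lets the second message act as a clearing event: it is only ever produced after a matching error has been raised.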

When configured to send SNMP traps, this event will be delivered using the sourceQualityTrap.

For more information on leap second handling, see the “Leap seconds” section.

grandMasterFaultTrap

  1. Grandmaster unable to collect system chassis version information.*
  2. Grandmaster unable to collect system chassis information from pipe.*
  3. Grandmaster power supply 1 failure
  4. Grandmaster power supply 2 failure
  5. RAID fault detected.*
  6. Unable to confirm RAID status
  7. Inlet temperature failure
  8. Health check failed. *
  9. S.M.A.R.T. attribute * has reached its pre-fail threshold. Possible imminent * drive failure.

The asterisks (*) above indicate that further information may be provided in the message, such as which disk is failing in the case of the RAID events.

These messages are specific to TimeKeeper Grandmasters, and indicate issues with the appliance. Contact support@fsmlabs.com for assistance. These messages do not have clearing equivalents.

When configured to send SNMP traps, this event will be delivered using the grandMasterFaultTrap.

startupFaultTrap

  1. Address VALUE specified for WEB_MANAGEMENT_IP but was not matched to a device. Ignoring parameter.
  2. Cannot open socket: VALUE
  3. Source VALUE: Cannot open socket: VALUE
  4. Source VALUE: Unable to prepare server socket descriptors for PTP
  5. Unable to prepare server socket descriptors for PTP
  6. Invalid MAJORTIME reference specified for source
  7. Invalid MAJORTIME SOURCE value specified on source VALUE
  8. No time sources found. Make sure at least one time source is declared in /etc/timekeeper.conf
  9. Invalid CPU value specified: VALUE
  10. Error, LOGDIR specified but invalid: VALUE
  11. Initial license check failed
  12. Error processing PTP server argument VALUE, server VALUE
  13. Error processing source argument VALUE, source VALUE
  14. Unable to create Compliance directories
  15. Unable to create Compliance directory VALUE
  16. Unable to chown Compliance directory VALUE
  17. Unable to look up nobody user
  18. SNMPTRAPPASSPHRASE decode failed, ignoring parameter
  19. SNMPTRAPPASSPHRASE was less than 8 characters long, ignoring parameter
  20. WARNING! OS reports clock resolution (VALUE seconds) below what is required for accurate operation.
  21. TYPE VALUE: Cannot open socket: IPv6 not supported.
  22. Install is invalid. Please contact support@fsmlabs.com.
  23. Source VALUE: Invalid interface specified: ‘VALUE’
  24. HTTPS_KEY_PASSPHRASE decode failed, ignoring parameter
  25. External environment failed consistency checks, exiting. VALUE
  26. Warning, invalid TLS_VERSION VALUE specified (must be VALUE). Ignoring parameter.
  27. Cannot specify both IPV6 and PTP_LAYER2.

On startup, if the WEB_MANAGEMENT_IP parameter is used to limit access to the web tools, TimeKeeper verifies that the address matches a configured interface. If it does not, message 1 will be sent out and TimeKeeper will allow the web interface to be used on all interfaces so the issue can be resolved.

If any source or PTP server has IPv6 enabled and IPv6 is not supported, message 21 will be sent out. ‘TYPE’ indicates the configuration type of ‘Source’ or PTP ‘Server’.

The remaining messages are a subset of the possible traps sent on startup when TimeKeeper is unable to prepare internal resources required by the current configuration, or is unable to start for other reasons, such as an invalid configuration. TimeKeeper will be available for reconfiguration via the web interface (if enabled). TimeKeeper Grandmaster appliances always run the web interface so that reconfiguration is always possible.

When configured to send SNMP traps, this event will be delivered using the startupFaultTrap. Regardless of the specific message, a startupFaultTrap indicates TimeKeeper is not operating normally and requires investigation - generally this is due to a misconfiguration.

systemAlertTrap

  1. Realtime clock changed by something other than TimeKeeper: expected ticks VALUE but was VALUE (adjtime VALUE)
  2. Realtime clock changed by something other than TimeKeeper: expected freq VALUE but was VALUE (adjtime VALUE)
  3. Realtime clock changed by something other than TimeKeeper: expected constant VALUE but was VALUE
  4. Realtime clock changed by something other than TimeKeeper: expected offset 0 but was VALUE
  5. Clock changed by something other than TimeKeeper: expected freq VALUE but was VALUE
  6. Clock changed by something other than TimeKeeper: expected tick VALUE but was VALUE
  7. Clock changed by something other than TimeKeeper: expected offset 0 but was VALUE
  8. Clock changed by something other than TimeKeeper: expected adjustment VALUE but was VALUE
  9. Active slave for VALUE changed from VALUE to VALUE

Messages 1-8 are sent when TimeKeeper suspects that something else is driving the system clock (see the “Clock adjustment and steering” section for details).

TimeKeeper sends message 9 when a different slave in a bond becomes active, suggesting that the previously active slave failed.

When configured to send SNMP traps, this event will be delivered using the systemAlertTrap.
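Several of the messages above come in raise/clear pairs, while others have no clearing equivalent. A receiver that wants to know which conditions are still outstanding can pair them up, as in the following Python sketch. The pairings are taken from the descriptions above. The regular expressions assume that VALUE fields contain no whitespace, and the names used are illustrative only.

```python
import re
from typing import Iterable, Set, Tuple

# name -> (pattern that raises the condition, pattern that clears it)
CLEARING_PAIRS = {
    "client inactive": (
        re.compile(r"Client (\S+) Source (\S+): inactive for "),
        re.compile(r"Client (\S+) Source (\S+): is now active again"),
    ),
    "license check": (
        re.compile(r"Unable to get license information for TimeKeeper (\S+),"),
        re.compile(r"TimeKeeper (\S+): Had license status issues"),
    ),
    "source rejected": (
        re.compile(r"Source (\S+) declared invalid/insecure"),
        re.compile(r"Source (\S+): source appears to be valid again"),
    ),
    "no valid time source": (
        re.compile(r"Unable to find any valid time source"),
        re.compile(r"Previously unable to find any valid time source"),
    ),
    "sync quality": (
        re.compile(r"Sync quality error on source (\S+),"),
        re.compile(r"Sync quality restored on source (\S+),"),
    ),
}


def outstanding_alerts(messages: Iterable[str]) -> Set[Tuple[str, Tuple[str, ...]]]:
    """Replay messages in order and return the conditions that were never cleared."""
    active = set()
    for message in messages:
        for name, (raised, cleared) in CLEARING_PAIRS.items():
            match = cleared.search(message)
            if match:
                active.discard((name, match.groups()))
                break
            match = raised.search(message)
            if match:
                active.add((name, match.groups()))
                break
    return active
```

Messages without a clearing equivalent, such as the grandMasterFaultTrap events, are deliberately left out of the table and would need to be acknowledged by an operator instead.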

Alerting Mechanisms

This table shows which alerting mechanisms are available on which platforms:

                    Grandmaster  Linux  Solaris  Windows
TimeKeeper log      Y            Y      Y        Y
SNMP                Y            Y      Y        Y
email               Y            Y      Y
syslog              Y            Y      Y
Windows event log                                Y

SNMP

TimeKeeper can generate SNMP traps if configured in timekeeper.conf on Linux and Windows. TimeKeeper uses the snmptrap program from the Net-SNMP utilities (packaged as net-snmp-utils on many Linux distributions) to send traps. This program is not included with TimeKeeper on Linux but is commonly included with Linux distributions. On Windows a version of snmptrap is included.

To configure TimeKeeper to generate SNMP traps and deliver them to the host host1 add the following line to the timekeeper.conf file:

SNMPTRAPHOST=host1

host1 may be a hostname or an IPv4 dotted notation address. TimeKeeper supports multiple SNMP trap destinations, which can be listed as comma-separated values:

SNMPTRAPHOST=host1,host2,host3

If only SNMPTRAPHOST is specified, TimeKeeper will use SNMPv2.

TimeKeeper also supports SNMPv3 User-Based Security Model and currently uses MD5 for authentication with DES for privacy. To configure TimeKeeper to use SNMPv3 traps, with authentication and privacy (authPriv security level), add the following lines to the timekeeper.conf file:

SNMPTRAPUSERNAME=DemoName
SNMPTRAPPASSPHRASE='encryptedpassphrase'

Note SNMPTRAPPASSPHRASE is used for both the authentication and privacy passphrases. Because it is stored in encrypted form, the value must be set using the TimeKeeper GUI, or by using the encodepassphrase utility provided with TimeKeeper.

If the parameter SNMPTRAPEID is set in timekeeper.conf, TimeKeeper will use this parameter to generate the engineID. If the parameter is not set, TimeKeeper will use the MAC address of the default network interface to generate the engineID.

If the SNMPTRAPOID parameter is set in timekeeper.conf, TimeKeeper will emit all traps to that OID. This is to retain legacy TimeKeeper behavior where all traps were by default sent to OID 1.3.6.1.2.1.16.0.1. Unless you need all traps sent to one OID, it is recommended you leave SNMPTRAPOID undefined in your configuration.
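The way these parameters combine can be summarized with a short Python sketch that reads configuration text and reports what trap delivery it implies. This is only an illustration of the rules described above, not TimeKeeper's own parser. It assumes simple KEY=VALUE lines with optional quotes and # comments, and it flags a partially specified SNMPv3 setup rather than guessing how TimeKeeper treats it.

```python
from typing import Dict, Optional


def parse_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed syntax)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def snmp_trap_plan(settings: Dict[str, str]) -> Optional[dict]:
    """Summarize trap delivery implied by the SNMPTRAP* parameters."""
    hosts = [h.strip() for h in settings.get("SNMPTRAPHOST", "").split(",") if h.strip()]
    if not hosts:
        return None  # no trap destination configured
    has_user = "SNMPTRAPUSERNAME" in settings
    has_pass = "SNMPTRAPPASSPHRASE" in settings
    if has_user and has_pass:
        version = "SNMPv3 (authPriv)"
    elif has_user or has_pass:
        version = "incomplete SNMPv3 settings"
    else:
        version = "SNMPv2"
    return {
        "hosts": hosts,
        "version": version,
        # When set, all traps go to this single OID (legacy behavior).
        "single_oid": settings.get("SNMPTRAPOID"),
    }


if __name__ == "__main__":
    print(snmp_trap_plan(parse_conf("SNMPTRAPHOST=host1,host2,host3\n")))
```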

SNMP MIB

By default, TimeKeeper delivers traps according to the provided MIB file, which can be found at:

/opt/timekeeper/management/snmp/timekeeper_mib.txt

on Linux, and at:

C:\Program Files\timekeeper\management\timekeeper_mib.txt

on Windows.

TimeKeeper Grandmasters also allow users to walk the SNMP tree on the appliance. To enable this feature, click Enable SNMP queries in the web interface, under Configuration, subtab Service & System Management. The SNMP tree can be walked remotely with standard tools - for example, assuming the Grandmaster is at IP 10.1.0.3, with SNMP community string ‘public’:

snmpwalk -v2c -c public 10.1.0.3 .1.3.6.1.4.1.42733.0

The SNMP values are cached over a short period in order to speed up SNMP queries in all conditions. Do not depend on querying SNMP at high rates to detect timing errors such as when a time source exceeds a named threshold. Instead, rely on TimeKeeper’s ability to send SNMP traps to your management host as the event occurs. This provides immediate notification of errors and allows you to benefit from all of the different scenarios that TimeKeeper can detect.

The community string can be changed via the TimeKeeper web interface on the Grandmaster over HTTPS.

On TimeKeeper Grandmasters, in addition to the TimeKeeper-specific objects, you can query system objects via the following OIDs.

.1.3.6.1.4.1.2021.4 UCD-SNMP-MIB::memory
.1.3.6.1.4.1.2021.10 UCD-SNMP-MIB::laTable
.1.3.6.1.2.1.25.1 HOST-RESOURCES-MIB::hrSystem
.1.3.6.1.2.1.25.2 HOST-RESOURCES-MIB::hrStorage
.1.3.6.1.2.1.2 IF-MIB::interfaces
.1.3.6.1.2.1.31.1.1 IF-MIB::ifXTable

Walking SNMP tree on non-grandmasters

TimeKeeper does allow the SNMP tree to be walked on non-Grandmaster configurations, such as clients, servers, and boundary clocks. However, the specific configuration may vary depending on the distribution in use. A recipe that will work on many installations is to add these two lines to your snmpd.conf:

view systemview included .1.3.6.1.4.1.42733.0
pass .1.3.6.1.4.1.42733.0 /opt/timekeeper/management/snmp/snmphandler

An additional two lines are needed for SNMPv3 support:

createUser DemoName MD5 DemoPassphrase DES
rwuser DemoName priv
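Once snmpd has been restarted with these lines, the tree can be walked with the standard Net-SNMP snmpwalk tool. The Python sketch below only builds the command lines, so they can be reviewed before being run. The host name is hypothetical, and the SNMPv3 form assumes that the privacy passphrase is the same as the authentication passphrase, which is how Net-SNMP interprets a createUser line that gives only one passphrase.

```python
import shlex
from typing import List

TIMEKEEPER_OID = ".1.3.6.1.4.1.42733.0"


def snmpwalk_v2c(host: str, community: str = "public", oid: str = TIMEKEEPER_OID) -> List[str]:
    """Argument list for an SNMPv2c walk of the TimeKeeper subtree."""
    return ["snmpwalk", "-v2c", "-c", community, host, oid]


def snmpwalk_v3(host: str, user: str, passphrase: str, oid: str = TIMEKEEPER_OID) -> List[str]:
    """Argument list for an SNMPv3 authPriv walk using MD5 and DES."""
    return [
        "snmpwalk", "-v3", "-l", "authPriv",
        "-u", user,
        "-a", "MD5", "-A", passphrase,   # authentication protocol and passphrase
        "-x", "DES", "-X", passphrase,   # privacy protocol and passphrase
        host, oid,
    ]


if __name__ == "__main__":
    # client1.example.com is a placeholder for a host running snmpd.
    print(shlex.join(snmpwalk_v2c("client1.example.com")))
    print(shlex.join(snmpwalk_v3("client1.example.com", "DemoName", "DemoPassphrase")))
```

The resulting lists can be passed to subprocess.run on a machine that has the Net-SNMP tools installed.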

Please contact support@fsmlabs.com for more details if needed.

Syslog

Events that are sent over email or SNMP can also be delivered via syslog. TimeKeeper automatically emits syslog messages, so TimeKeeper clients can be configured to send data to a syslog server just like any other application, using whichever syslog daemon is in use on the client.

With TimeKeeper Grandmasters, the ability to configure where syslog data is sent is available in the web GUI. To change the configuration, log into the web interface, navigate to Configuration and then Service & System Management. Under Manage Communication, select Configure Syslog.

From this interface, 3 separate syslog servers can be configured. Once applied, the change is immediate and syslog messages will be delivered to those hosts without having to restart the system or TimeKeeper.

Email

TimeKeeper can generate emails for events if configured in /etc/timekeeper.conf with the EMAILNOTIFICATION parameter. An example of this is:

EMAILNOTIFICATION=test@example.com

or, if multiple addresses should receive notification, use a comma-separated list:

EMAILNOTIFICATION=test@example.com,another@example.com

TimeKeeper uses the mail program to send the notifications. As with the snmptrap application, TimeKeeper does not provide the application but relies on the host configuration and network connectivity to deliver mail.

Emails may be throttled and bundled together using the EMAILNOTIFICATION_THROTTLE option, detailed in the “Global options” section.

Windows event log

As detailed above, on Windows TimeKeeper will automatically send information to the Windows event log (and optionally via SNMP). Any event that would be logged via SNMP, email, or syslog on Linux will be sent to the event log, so if sync error thresholds are configured and the threshold reached, that event will be logged locally at a minimum.

On the host itself, the details of these events can be viewed with the normal Event Viewer tool.

Logging configuration changes and events

On both TimeKeeper Grandmasters and software installations of TimeKeeper, notable updates made through the web interface (and tkctl on the Grandmasters) are logged to syslog on Linux and the event log on Windows. For environments that require change tracking this allows for centralized recording of changes made to all types of TimeKeeper installations.

Changes are reported as system/event log updates of the form:

TimeKeeper set successful: gm.config.timekeeper by user admin

This notes the specific change (TimeKeeper configuration change in this case), whether it was made (successful) or not (attempted), and the user who made the change. The event/system log notes the time and day it was done.

Changes that are logged are ‘set’ operations or change actions as defined by the tkctl tool. The possible named changes/actions are listed in the “tkctl options” section. Some options that apply to non-grandmaster installations, like updating the TimeKeeper configuration file, are also logged. For non-grandmaster installations the name will be prefixed with ‘tk.’ rather than ‘gm.’ as in the example above.

Events that are significant will be sent in the same way as changes, with the name ‘gm.event.eventname’ or ‘tk.event.eventname’ depending on whether it’s on a grandmaster or a software client. For example, logins are recorded for success or failure like this:

TimeKeeper event successful: gm.event.login by user admin

on a TimeKeeper Grandmaster. On a software TimeKeeper installation the event name would be ‘tk.event.login’, and in both cases if the login failed ‘successful’ would be replaced with ‘attempted.’
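For change tracking, these log entries can be parsed into structured records. The following Python sketch is based only on the two example lines shown above. The record type, field names, and the assumption that names and user names contain no whitespace are illustrative.

```python
import re
from typing import NamedTuple, Optional


class ChangeRecord(NamedTuple):
    action: str      # e.g. 'set' or 'event'
    succeeded: bool  # True for 'successful', False for 'attempted'
    platform: str    # 'gm' (Grandmaster) or 'tk' (software installation)
    name: str        # e.g. 'config.timekeeper' or 'event.login'
    user: str


LINE_RE = re.compile(
    r"TimeKeeper (?P<action>\S+) (?P<result>successful|attempted): "
    r"(?P<platform>gm|tk)\.(?P<name>\S+) by user (?P<user>\S+)"
)


def parse_change(line: str) -> Optional[ChangeRecord]:
    """Return a ChangeRecord for a change/event log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return ChangeRecord(
        action=match["action"],
        succeeded=match["result"] == "successful",
        platform=match["platform"],
        name=match["name"],
        user=match["user"],
    )


if __name__ == "__main__":
    print(parse_change("TimeKeeper set successful: gm.config.timekeeper by user admin"))
```

Because search is used, the timestamp and host fields that syslog or the event log place before the message do not need to be stripped first.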

The following events are recorded in addition to the configuration changes reported above.