Compliance database interaction

This section covers the database used by Compliance, tools to query it, and how to use those tools. The examples here are intended to be starting points only, not an exhaustive list of all fields and query methods. The schema is designed to be simple to use out of the box without requiring users to be database experts. The general schema details below will get you connected with the data you need so that you can build the queries you want as quickly as possible.

Note this is for users interested in querying the database directly for their data. This same data is automatically provided in Compliance audits, PDFs, Excel spreadsheet outputs, etc., so users don’t have to go to the database directly for data. Direct access to the database is provided and documented here for those users that want to build their own queries and audit summaries.

Compliance database - SQLite

Compliance uses SQLite to store generated audit data. This is a very lightweight and fast database with easy access as a user. There are no other services to run or manage, Compliance writes directly to sqlite .db files in the specified log directory, /var/log by default:

${LOGDIR}/timekeepercompliance/db

Note that as mentioned previously, backups should be made of this directory for protection in the event of system failures.

Within this directory will be one SQLite database file per year Compliance has stored data for (2018.db for example). In addition, there will be logrotated copies, as many as are configured to be retained in TimeKeeper’s logrotate file.

Each .db file is where TimeKeeper Compliance stores results for that year and can be queried directly. Users can open and query that file according to normal Linux permissions. Because these files are self-contained, they can be copied elsewhere if needed, for offline querying or integration.

In many cases command line queries with the tool sqlite3 is used as we’ll see below. However there are many bindings for SQLite and this document isn’t meant to cover them all. Users can query the database with whatever tools make sense for their environment. As a precaution copies of the database may be made and queried against elsewhere to minimize interference/risk to the production .db files that are being updated with new data.

Compliance schema

TimeKeeper Compliance uses a contained schema that separates data by individual report/audit. This simplifies management and speeds database interaction. To understand the Compliance schema, it’s helpful to keep a couple terms and ideas in mind:

Reports are what get configured in Compliance to identify a group of hosts with a set of specific thresholds and parameters to meet. (For example, COMPLIANCEREPORT0 in your timekeeper.conf file might identify MiFID II hosts and applicable thresholds in London.)
Audits are instances of that configured report in your Compliance database. If COMPLIANCEREPORT0 above is a configured report to collect data on hosts in London, the result of what was reported on a specific day/week/etc is an audit.

Compliance stores these audits, each created from a configured report, in individual tables within the database for the year the audit data covers (such as 2018.db in the example above). Each audit table is specific to that report number and to that day/week/month/year.

As we’ll see, there is a top level table with details about individual audits. Below that are the individual audit tables with details about the clients in the audit period. Depending on the type of audit there may be additional lower level tables that provide specific additional details about aspects of the audit. We’ll start at the top level and work down from there.

Once an audit is complete it’s listed in that year’s .db file in a top level audit_names table. This table is the entry point about a specific audit and period of time. It documents the parameters used at the time of audit generation even if the configured report parameters have changed since. Data includes the following fields and types that may be relevant to users:

name (VARCHAR) - Name of the table that contains actual audit data
type (VARCHAR) - Type of audit - daily, weekly, monthly, yearly
title (VARCHAR) - Title of generating report as it was at time of audit creation
clientset (VARCHAR) - Clientset of generating report as it was at time of audit creation
warningthreshold (REAL) - Warning threshold as it was at the time of audit creation
minwarninglength (REAL) - Minimum warning length as it was at the time of audit creation
alertthreshold (REAL) - Alert threshold as it was at the time of audit creation
minalertlength (REAL) - Minimum alert length as it was at the time of audit creation
mingaplength (REAL) - Minimum gap length as it was at the time of audit creation
endtoendaccuracy (REAL) - End to end accuracy option as it was at the time of audit creation
onlyreportprimary (REAL) - Only report primary option as it was at the time of audit creation
report (INT) - Report number this audit was for
start (INT) - Start time of audit (in epoch seconds)
end (INT) - End time of audit (in epoch seconds)

Each audit named in audit_names will have its own separate table in the database with details about that particular audit for whatever time period the audit covers. The name of the relevant table will be the name field above. This table has the details about the clients for the audit in question. All audit tables will have the following fields, regardless of type:

ip (VARCHAR) - IP address of a particular client
hostname (VARCHAR) - Hostname of that client as of audit generation
source (INT) - Client’s source number reporting the offset data. TimeKeeper clients can provide many different sources
max (REAL) - Worst offset value in the audit period (for this source)
max_time (REAL) - Time of worst value in the audit period
warning_time (REAL) - Time spent in warning in seconds
alert_time (REAL) - Time spent in alert in seconds
gap_time (REAL) - Time spent in gap in seconds

Daily audit tables have the additional fields beyond those above:

start (INT) - Start of data for this day (in epoch seconds)
end (INT) - End of data for this day (in epoch seconds)
time_source (VARCHAR) - Source of time at the point of the worst recorded value
sample_count (INT) - Number of samples for this client during this period

In the case of daily audits, the individual details about specific threshold violations and gaps are also stored in related tables. Alert thresholds are stored in a table of the same name as the daily audit with _alerts at the end of the name. Similarly, warning threshold violations are also stored in a table matching the name of the daily audit, but with _warnings at the end of the name. Gaps noted in the audit are also stored in a separate table with the same name, with _gaps in the name. Details on these tables are below.

Whether a violation and its details are recorded depends on the configured alert and warning thresholds. The threshold that was used in generating this audit is recorded in the audit_names table for this audit, in the warningthreshold and alertthreshold columns. The alerts and warnings violations tables contain the following fields, detailing the specific alert and warning violations (depending on the table) found for each client:

ip (VARCHAR) - IP address of the client with the issue
source (INT) - Source number that exceeded the set threshold
start (REAL) - Start of violation (in epoch seconds)
end (REAL) - End of violation (in epoch seconds)
max (REAL) - Worst value of excursion
max_time (REAL) - Time of worst value above in max
time_source (VARCHAR) - Source being used for time when the worst value was reached
uac (REAL) - Upstream/end to end source accuracy at max_time, if known

Gaps may be recorded depending on whether the client’s missing data exceeded the gap length in the configured report that generated the audit (default is 180 seconds). The value used will be recorded above for this specific audit in the mingaplength field in the audit_names entry for this audit. The gaps table contains the following fields:

ip (VARCHAR) - IP address of the client with the gap
source (INT) - Source number that experienced the gap
start (REAL) - Start of gap (in epoch seconds)
end (REAL) - End of gap (in epoch seconds)
time_source (VARCHAR)- Source being used for time at the start of the gap

Sample queries

The detail above shows the data structure to enable querying but an example generally makes things more clear. Let’s go through the process of getting details on a specific client in a particular day’s audit from a report for a specific day, from configuration to individual threshold violation data. In this case let’s look for alert threshold violation data for 10.13.153.5 (hostname ukslomfd2trd13) related to configured report 0 on June 11 2018.

First let’s start with the configured report in /etc/timekeeper.conf that will generate the audit. Using a pretty minimal set of parameters the report might look like this:

COMPLIANCEREPORT0 () { TITLE='Demo report'; CLIENTSET='ukslo*'; WARNINGTHRESHOLD='.000020'; ALERTTHRESHOLD='.001'; }

This will trigger Compliance to build daily audits for any host matching the name ‘ukslo*’, and record detail like gaps exceeding the default gap length, alerts when clients exceed 1ms, and also report warnings when the clients exceed 20us. With this configured report in place, Compliance will generate audits that can be listed with the compliance_query tool, shown here with shortened output:

/opt/timekeeper/release64/compliance_query --list 
...
2018    daily       0  1528761600  1528848000  (Tue, 12 Jun 2018 00:00:00 - Wed, 13 Jun 2018...
2018    daily       0  1528675200  1528761600  (Mon, 11 Jun 2018 00:00:00 - Tue, 12 Jun 2018...
2018    daily       0  1528588800  1528675200  (Sun, 10 Jun 2018 00:00:00 - Mon, 11 Jun 2018...
2018   weekly       0  1527984000  1528588800  (Sun, 03 Jun 2018 00:00:00 - Sun, 10 Jun 2018...
...

Let’s get data on the daily audit for report 0 starting on Monday June 11 2018 by querying the top level audit_names table described above. sqlite3 can give you any of the fields about this audit’s configuration and summary:

$ sqlite3 /var/log/timekeepercompliance/db/2018.db
sqlite> .headers on
sqlite> select name, type, clientset, warningthreshold, alertthreshold from audit_names where start = 1528675200 and report = 0 and type = 'daily';
name|type|clientset|warningthreshold|alertthreshold
daily_1528675200_0|daily|ukslo*|2.0e-05|0.001
sqlite>

Again, you can use other tools to query the SQLite database, sqlite3 is used here as an example. Likewise, you can query the audit_names table directly to get a list of audits and all their parameters at once and use that to create later queries, we’re just starting with compliance_query’s results as an example.

From the above we can see some of the summary data including the thresholds used in creating the audit (20us warning, 1ms alert), the clientset (‘ukslo*'), and so on. The name field is the name of the table containing the individual client results for this audit. We can now query that for specifics about the hosts flagged in the audit:

sqlite> select ip, hostname, source, max, max_time, alert_time, warning_time from daily_1528675200_0;
ip|hostname|source|max|max_time|warning_time
10.13.153.5|ukslomfd2trd13|0|0.001344818|1528718579.4396|8033.743599914|8033.743599914
10.13.154.23|ukslomfd2trd17|2|0.001115007|1528682525.7795|8010.124199621|8010.124199621
10.12.17.6|ukslomfg2trd77|1|0.001296703|1528704209.9894|8202.531400248|8202.531400248
...
sqlite>

The list of hosts in this audit is truncated but 3 entries are shown with their IP address, hostname, the source that had the issue, the maximum offset recorded, the time of that offset, and the amount of time spent in alert/warning for this day.

This is still a high level view - it shows which clients had issues, for how long and what their worst case value was. Other data is available if needed like the source for time when the issue occurred and when exactly the worst case offset happened. You may need to get details about individual violations beyond the alert threshold, like when they started and ended for a specific host. To get that, we go further to the alerts table for this day, which is named the same as the daily table but with _alerts at the end.

To get details about the alerts for 10.13.153.5, that query would look like:

sqlite> select start, end, max, max_time, time_source from daily_1528675200_0_alerts where ip = '10.13.153.5';
start|end|max|max_time|time_source
1528682400.9916|1528682750.2848|0.001344647|1528682677.419| NTP(10.13.18.4:10.13.10.66)
1528678800.4958|1528679149.789|0.001343727|1528679144.006| NTP(10.13.18.4:10.13.10.66)
1528686001.4874|1528686350.7806|0.001338525|1528686131.0266| NTP(10.13.18.4:10.13.10.66)
1528693202.479|1528693551.7722|0.00134383|1528693310.0428| NTP(10.13.18.4:10.13.10.66)
1528689601.9832|1528689951.2764|0.001343271|1528689916.5784| NTP(10.13.18.4:10.13.10.66)
1528696802.9748|1528697152.268|0.001344696|1528696815.6974| NTP(10.13.18.4:10.13.10.66)
...
sqlite>

This list is truncated but we can see when each alert occurred in time, from the start to the end, what the worst case offset was, when that occurred, and what time source it was using at that point. This can be helpful not only for providing clear reports, but also for identifying common timing issues. Here all of the alerts were triggered when the client was getting NTP from 10.13.18.4, and that may indicate an issue with that NTP server. TimeKeeper’s timing map, covered in more detail in the “Timing map and network view” section, can provide insight about that source and identify possible issues.

Likewise, the client’s warnings and gaps are in similarly named tables based off of the daily table we started with above.

Again, this interface is provided for those who want to access the data directly to build their own queries and collect their own data. For many users the automatically generated audits available via the web interface, PDF, and Excel output are sufficient but direct SQL access is also provided.

Managing data in the database

Direct modification of the database is not supported and will affect Compliance’s ability to provide valid audit data. If audit data is incorrect/incomplete and a specific audit should be rebuilt, please use the compliance_cli tool documented in the section, “compliance_cli,” in order to regenerate that audit.

Archiving older audit data

Compliance retains all databases and PDFs indefinitely, leaving archiving policy up to the user to define. If you would like to archive older PDFs and databases to an alternate location for long term storage or backup, the process is simple:

Select the year(s) of database files to archive
For those same years, archive the same set of PDF files
Use compliance_cli to rescan for databases, or restart TimeKeeper/Compliance entirely.

For example, if your policy requires archiving 2018 data to offline storage in the path /remote/archive this process would look something like (as root):

# cd $LOGDIR/timekeepercompliance
# mkdir -p /remote/archive/compliance/2018/db
# mv db/2018* /remote/archive/compliance/2018/db/
# mkdir -p /remote/archive/compliance/2018/pdf
# mv pdf/2018/ /remote/archive/compliance/2018/pdf/
# /opt/timekeeper/release64/compliance_cli --rescan

Once this is complete Compliance will not provide audits with data from 2018 either directly from the database or via the PDF reports. Later if these are required to be served from Compliance directly the process can be reversed.

If you have any questions about how to manage or query the Compliance data you have, please contact support@fsmlabs.com.