NTP (Network Time Protocol) - Pitfalls and Annoyances

Klas Mattsson
Apr 25, 2024

Updated: Jul 25, 2024

We use the Network Time Protocol (NTP) to keep time synchronized across our network. This matters because it ensures that all our devices agree on the time, which is crucial for coordination and consistency in our operations. While there are different implementations and protocols for time synchronization, they all address the same fundamental need: maintaining accurate time on our clients.

How can time be an issue?

Time can be quite perplexing. Leap years, time zones, and leap seconds are confusing enough on their own, and when different sources disagree about the current time, a myriad of problems can follow.

I won't assume you've watched it, but here's a fun video that sheds light on these complexities: https://www.youtube.com/watch?v=-5wpm-gesOY

What is the fundamental thing NTP does?

NTP does not simply overwrite the client's clock. Instead, the client requests the time from the server, estimates how far off its own clock is, and gradually adjusts its timekeeping over time. Although you can instead hard-set the time manually at intervals, this approach can lead to problems, particularly for time-sensitive applications (as explained below).
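To make the "request, then adjust gradually" idea concrete, here is a minimal sketch of the standard NTP offset and round-trip delay arithmetic based on four timestamps. The function names and the sample timestamps are made up for illustration, and the 500 ppm slew ceiling is an assumption (it is a commonly cited limit, but check your own implementation).

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay, all values in seconds.

    t0: client sends request   (client clock)
    t1: server receives it     (server clock)
    t2: server sends reply     (server clock)
    t3: client receives reply  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)         # network round trip, minus server processing
    return offset, delay


def seconds_to_slew(offset, max_slew_ppm=500.0):
    """Time needed to remove `offset` gradually at an assumed maximum slew rate."""
    return abs(offset) / (max_slew_ppm * 1e-6)


# Hypothetical exchange: the client clock is roughly 50 ms behind the server.
offset, delay = ntp_offset_and_delay(t0=100.000, t1=100.060, t2=100.061, t3=100.021)
print(f"offset={offset:.3f}s delay={delay:.3f}s slew_time={seconds_to_slew(offset):.0f}s")
```

Under these assumptions, a 50 ms error takes on the order of 100 seconds to correct by slewing, which is why the adjustment happens "over time" instead of as a jump.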

Critical use cases

Encryption

A lot of encryption relies on time, for example through certificate validity periods and time-limited tickets or tokens. This means both parties need to agree on the time to a certain extent (though we're talking about differences in minutes here).
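As a rough illustration of why a few minutes of disagreement is usually tolerated but larger errors are not, here is a sketch of a validity-window check with a skew allowance. The function name, the five-minute allowance, and the sample times are assumptions for the example, not taken from any specific protocol or library.

```python
from datetime import datetime, timedelta, timezone


def is_currently_valid(now, not_before, not_after, allowed_skew=timedelta(minutes=5)):
    """Accept a credential if `now` falls inside its window, widened by `allowed_skew`."""
    return not_before - allowed_skew <= now <= not_after + allowed_skew


# Hypothetical credential issued at 12:00 UTC and valid for ten minutes.
issued = datetime(2024, 4, 25, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(minutes=10)

# A verifier whose clock is 3 minutes slow still accepts it...
print(is_currently_valid(issued - timedelta(minutes=3), issued, expires))
# ...but one whose clock is 20 minutes slow rejects it as "not yet valid".
print(is_currently_valid(issued - timedelta(minutes=20), issued, expires))
```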

Inconsistent logs

When servers have differing times, forensic analysis of log streams becomes unnecessarily complicated. Generally, agreement within 0.01 seconds is optimal (short of specialized solutions). Anything better than 0.1 seconds is usually acceptable, and better than one second is assumed by most systems.
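A small sketch of how clock error scrambles a merged log stream: two events that happen 0.2 seconds apart on different servers are stamped with each server's own (possibly wrong) clock and then sorted by timestamp. The server names, messages, and error values are hypothetical.

```python
def merged_order(events, clock_error):
    """Return messages in the order they appear after sorting by recorded timestamp.

    events: list of (true_time_seconds, server, message)
    clock_error: dict mapping server -> seconds its clock is ahead (+) or behind (-)
    """
    stamped = [(true_time + clock_error[server], server, message)
               for true_time, server, message in events]
    return [message for _, _, message in sorted(stamped)]


events = [
    (10.0, "a", "request received"),   # happens first
    (10.2, "b", "request forwarded"),  # happens 0.2 s later on another server
]

# Server b is half a second behind: the merged log shows cause after effect.
print(merged_order(events, {"a": 0.0, "b": -0.5}))
# Server b is only 10 ms behind: the order survives.
print(merged_order(events, {"a": 0.0, "b": -0.01}))
```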

Time-sensitive applications

Many applications, such as cluster databases, are extremely sensitive to time and demand consistency.

Hands-on tips with NTP

Number of servers

When it comes to NTP, the number of servers to communicate with is crucial. There are essentially only two viable options:

One server
Four servers

One server

This use case is actually quite common. The main reason behind it is that all servers within an environment need to agree with one another on the time. Since a client's clock typically remains stable for a few minutes without contact with its time source, employing just one server often suffices for most setups. Maintenance can then be carried out almost anytime without significant disruptions. Typically, this server itself synchronizes with other NTP servers, usually four in total.

Four servers

This scenario can be a bit tricky, since we tend to think in terms of quorum, which involves an odd number of votes. However, the issue with quorum is that it presumes a single truth being voted upon. When one time server indicates 10:05 and another shows 09:45, it's challenging to determine the correct choice: is it the middle ground, or just one of them? Both options have their drawbacks. That's why having three time servers is essential in such cases.

When two servers report 10:05 and one shows 09:45, it's straightforward to dismiss 09:45 as an outlier (though the client should still gradually adjust its clock to regain time, rather than hard-syncing). Hence, three servers are the minimum necessary to obtain reliable data. Adding a fourth server provides redundancy: if one goes down, you still have three operational servers. Therefore, four servers is the smallest set that offers redundancy, and it is evidently superior to having just one. You can explore more about managing redundancy in clusters with our Kubernetes Consultancy services.
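The reasoning above can be sketched as a simple majority check. This is a deliberately simplified illustration, not the selection algorithm real NTP implementations use; the function name, the one-second tolerance, and the sample offsets are assumptions.

```python
from statistics import median


def select_offset(offsets, tolerance=1.0):
    """Pick a trusted offset (seconds) from several servers, or None if no majority agrees.

    Servers whose offsets lie within `tolerance` of a candidate form a group;
    the largest group wins only if it is a strict majority.
    """
    best = []
    for candidate in offsets:
        group = [o for o in offsets if abs(o - candidate) <= tolerance]
        if len(group) > len(best):
            best = group
    if len(best) * 2 <= len(offsets):
        return None  # a tie or no agreement: nothing to trust
    return median(best)


# Two servers, 20 minutes apart (10:05 vs 09:45): no way to choose.
print(select_offset([0.0, -1200.0]))
# Three servers: the 09:45 server is outvoted and ignored.
print(select_offset([0.0, 0.2, -1200.0]))
# Four servers configured, one unreachable: the remaining three still give an answer.
print(select_offset([0.1, 0.2, -1200.0]))
```

With four servers configured, losing any single one still leaves a set of three that can outvote one bad clock, which is the redundancy argument in code form.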

Time on different machines

Virtual machines

Virtual machines often struggle with timekeeping for various reasons. When migrating, timers can become inaccurate, as these machines lack direct access to the host clock and rely on software for timekeeping. Additionally, they may lose time during reboots and encounter other issues. Some address this by letting the host dictate time, assuming it's trustworthy (though in such cases, also running NTP inside the guest is ill-advised, since the two mechanisms can end up fighting over the clock). Alternatively, many opt for NTP, which typically resolves these issues. However, there are occasional edge cases that can still cause problems.

To sum up, it's crucial to understand that running NTP servers on virtual machines is never recommended due to these inherent challenges.

Baremetal

Time on baremetal essentially relies on a clock running on the motherboard, often powered by a small non-rechargeable battery. The operating system (OS) can then synchronize time with this hardware clock. This setup functions adequately as long as the clock maintains reasonable precision. In some cases, though, clocks may drift significantly: one server I encountered drifted about 9 seconds a day.
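To put a drift like that into perspective, here is a quick back-of-the-envelope calculation. The helper names are made up for this example, and the 0.1 second threshold is borrowed from the logging guideline earlier in this post.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(drift_seconds_per_day):
    """Express a daily drift as parts per million of elapsed time."""
    return drift_seconds_per_day / SECONDS_PER_DAY * 1e6


def seconds_until_error(threshold_seconds, drift_seconds_per_day):
    """How long an uncorrected clock takes to accumulate `threshold_seconds` of error."""
    return threshold_seconds * SECONDS_PER_DAY / drift_seconds_per_day


print(f"{drift_ppm(9):.0f} ppm")                           # about 104 ppm
print(f"{seconds_until_error(0.1, 9) / 60:.0f} minutes")   # about 16 minutes to reach 0.1 s
```

In other words, a clock that bad would leave the "usually acceptable" 0.1 second range roughly a quarter of an hour after losing its time source.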

It's crucial for the OS to synchronize time with the hardware clock, as failing to do so can lead to significant issues during system boot-up, potentially causing discrepancies that disrupt time-sensitive applications.

Containers

Containers typically rely on the host for timekeeping, which makes it important that the host's time can be trusted. If you ever contemplate running NTP servers in Pods, you might want to reconsider that idea.
