A few months ago, an engineer in a data center in Norway encountered some perplexing errors that caused a Windows server to suddenly reset its system clock to 55 days in the future. The engineer relied on the server to maintain a routing table that tracked cell phone numbers in real time as they moved from one carrier to another. A jump of roughly eight weeks had dire consequences because it caused numbers that had yet to be transferred to be listed as having already been moved and numbers that had already been transferred to be reported as pending.
“With these updated routing tables, a lot of people were unable to make calls, as we didn’t have a correct state!” the engineer, who asked to be identified only by his first name, Simen, wrote in an email. “We would route incoming and outgoing calls to the wrong operators! This meant, e.g., children could not reach their parents and vice versa.”
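To see why a clock jump scrambles this kind of table, consider a minimal Python sketch. It assumes each number has a scheduled transfer time and that routing simply compares that time with the server clock; the function `current_carrier` and the carrier names are hypothetical and are not drawn from Simen's system.

```python
from datetime import datetime, timedelta, timezone


def current_carrier(old_carrier: str, new_carrier: str,
                    port_time: datetime, now: datetime) -> str:
    """Return the carrier a call should be routed to at time `now`.

    Hypothetical rule: the number belongs to the new carrier once the
    scheduled transfer time has passed according to the server clock.
    """
    return new_carrier if now >= port_time else old_carrier


# A transfer scheduled 30 days after the real current time.
real_now = datetime(2023, 1, 1, tzinfo=timezone.utc)
port_time = real_now + timedelta(days=30)

# With a correct clock, calls still go to the old carrier.
correct = current_carrier("CarrierA", "CarrierB", port_time, real_now)

# With the clock 55 days ahead, the pending transfer looks complete.
skewed_now = real_now + timedelta(days=55)
skewed = current_carrier("CarrierA", "CarrierB", port_time, skewed_now)
```

In this sketch, the skewed clock sends calls to a carrier that does not yet hold the number, which is the kind of misrouting Simen describes.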
A show-stopping issue
Simen had experienced a similar error last August when a machine running Windows Server 2019 reset its clock to January 2023 and then changed it back a short time later. Troubleshooting the cause of that mysterious reset was hampered because the engineers didn’t discover it until after event logs had been purged. The newer jump of 55 days, on a machine running Windows Server 2016, prompted him to once again search for a cause, and this time, he found it.
The culprit was a little-known feature in Windows known as Secure Time Seeding. Microsoft introduced the time-keeping feature in 2016 as a way to ensure that system clocks were accurate. Windows systems with clocks set to the wrong time can cause disastrous errors when they can’t properly parse timestamps in digital certificates or they execute jobs too early, too late, or out of the prescribed order. Secure Time Seeding, Microsoft said, was a hedge against failures in the battery-powered onboard devices designed to keep accurate time even when the machine is powered down.
“You may ask—why doesn’t the device ask the nearest time server for the current time over the network?” Microsoft engineers wrote. “Since the device is not in a state to communicate securely over the network, it cannot obtain time securely over the network as well, unless you choose to ignore network security or at least punch some holes into it by making exceptions.”
To avoid making security exceptions, Secure Time Seeding sets the time based on data inside an SSL handshake the machine makes with remote servers. These handshakes occur whenever two devices connect using the Secure Sockets Layer protocol, the mechanism that provides encrypted HTTPS sessions (its modern successor is known as Transport Layer Security). Because Secure Time Seeding (abbreviated as STS for the rest of this article) used SSL certificates Windows already stored locally, it could ensure that the machine was securely connected to the remote server. The mechanism, Microsoft engineers wrote, “helped us to break the cyclical dependency between client system time and security keys, including SSL certificates.”
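For a concrete picture of how a timestamp might be read out of handshake data, here is a minimal Python sketch. It assumes the data in question is the 32-byte ServerHello random value, whose first four bytes TLS 1.2 (RFC 5246) defines as `gmt_unix_time`; the passage above does not say which field STS reads. The function name `server_random_time` is hypothetical, and this is an illustration rather than Microsoft's implementation.

```python
import struct
from datetime import datetime, timezone


def server_random_time(server_random: bytes) -> datetime:
    """Interpret the first 4 bytes of a ServerHello random as a Unix time.

    TLS 1.2 lays out the random value as a 32-bit big-endian
    gmt_unix_time followed by 28 random bytes.
    """
    if len(server_random) != 32:
        raise ValueError("ServerHello random must be exactly 32 bytes")
    (seconds,) = struct.unpack("!I", server_random[:4])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Example: a random value whose timestamp field encodes 1700000000
# (a moment in November 2023), padded with 28 zero bytes for brevity.
example_random = struct.pack("!I", 1_700_000_000) + bytes(28)
example_time = server_random_time(example_random)
```

A client that trusts this field can only be as accurate as the value the server chose to place there.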
Simen wasn’t the only person encountering wild and spontaneous fluctuations in Windows system clocks used in mission-critical environments. Sometime last year, a separate engineer named Ken began seeing similar time jumps. At first, they were limited to two or three servers and occurred every few months. Sometimes, the clocks jumped by a matter of weeks. Other times, they leapt as far ahead as the year 2159.
“It has exponentially grown to be more and more servers that are affected by this,” Ken wrote in an email. “In total, we have around 20 servers (VMs) that have experienced this, out of 5,000. So it’s not a huge amount, but it is considerable, especially considering the damage this does. It usually happens to database servers. When a database server jumps in time, it wreaks havoc, and the backup won’t run, either, as long as the server has such a huge offset in time. For our customers, this is crucial.”
Simen and Ken, who both asked to be identified only by their first names because they weren’t authorized by their employers to speak on the record, soon found that engineers and administrators had been reporting the same time resets since 2016.