NTPD issue - syncs then slowly loses ground
- by ethrbunny
RHEL 5 workstation. Has been running smoothly for years. I did a 'pup' recently and followed with a nice, cleansing reboot. Afterwards the system had some startup issues: namely MySQL refused to start. It just went "...." for 5-10 minutes before I did another boot and skipped that step (using 'interactive'). This was the only service that didn't wan't to start normally.
So now that the system is booted I've found that it doesn't want to stay in sync with the NTP master and after 48 hours is refusing any SSH other than root.
NTPD: this service starts normally and gets a lock on 4 servers. Almost immediately it starts to lose ground and now (after 3 days) is almost 40 hours behind. If I stop/start the service it gets the lock, resets the system clock and starts losing ground again. The 'hwclock' is set properly and maintains its time.
Login: when I (re)start the ntp server I am able to login normally. I assume this problem is due to losing sync with LDAP. This appears to be verified by LDAP errors in /var/log/messages.
Suggestions on where to look?
ADDENDA: Tried deleting the 'drift' file. After a bit it gets recreated with 0.000.
from /var/log/messages:
Jan 17 06:54:01 aeolus ntpdate[5084]: step time server 129.95.96.10 offset 30.139216 sec
Jan 17 06:54:01 aeolus ntpd[5086]: ntpd [email protected] Tue Oct 25 12:54:17 UTC 2011 (1)
Jan 17 06:54:01 aeolus ntpd[5087]: precision = 1.000 usec
Jan 17 06:54:01 aeolus ntpd[5087]: Listening on interface wildcard, 0.0.0.0#123 Disabled
Jan 17 06:54:01 aeolus ntpd[5087]: Listening on interface wildcard, ::#123 Disabled
Jan 17 06:54:01 aeolus ntpd[5087]: Listening on interface lo, ::1#123 Enabled
Jan 17 06:54:01 aeolus ntpd[5087]: Listening on interface eth0, fe80::213:72ff:fe20:4080#123 Enabled
Jan 17 06:54:01 aeolus ntpd[5087]: Listening on interface lo, 127.0.0.1#123 Enabled
Jan 17 06:54:01 aeolus ntpd[5087]: Listening on interface eth0, 10.127.24.81#123 Enabled
Jan 17 06:54:01 aeolus ntpd[5087]: kernel time sync status 0040
Jan 17 06:54:02 aeolus ntpd[5087]: frequency initialized 0.000 PPM from /var/lib/ntp/drift
Jan 17 06:54:02 aeolus ntpd[5087]: system event 'event_restart' (0x01) status 'sync_alarm, sync_unspec, 1 event, event_unspec' (0xc010)
You can see the 30 second offset. This was after about one minute of operation.