I’ve started trying to use the mircvs://contrib/code/heartbeat/ stuff to monitor NTP time deltas between my boxen and a reference box (some random Stratum 2 pool server that none of the boxen use as a server; otherwise I might have used the PTB servers). The idea is to add rrdtool and rrdgraph output. Maybe also send mail when the boxen are down, until we have company monitoring set up?
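For reference, the measurement itself boils down to a single (S)NTP request per box. The following is a minimal sketch, not the heartbeat code: it assumes a plain SNTP client query over IPv4 to UDP port 123, uses the standard NTP offset formula ((t2 - t1) + (t3 - t4)) / 2, and emits an argument suitable for `rrdtool update file.rrd N:<value>`, assuming an RRD with a single GAUGE data source. The function names `query_offset`, `clock_offset` and `rrd_update_arg` are hypothetical.

```python
import socket
import struct
import sys
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP clock offset: ((t2 - t1) + (t3 - t4)) / 2.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in Unix seconds).
    A positive result means the local clock is behind the server.
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server: str, timeout: float = 2.0) -> float:
    """Send one SNTP client request (IPv4 only) and return the clock offset."""
    # First byte 0x23: LI=0, VN=4, Mode=3 (client); the other 47 bytes are zero.
    request = b"\x23" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        data, _ = sock.recvfrom(48)
        t4 = time.time()
    if len(data) < 48:
        raise ValueError("short NTP reply")
    # Receive timestamp is at bytes 32..39, transmit timestamp at 40..47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    return clock_offset(t1, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t4)


def rrd_update_arg(offset: float) -> str:
    """Format an `rrdtool update` value argument; 'N' means 'now'."""
    return f"N:{offset:.6f}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: script.py <reference server>; prints e.g. "N:0.001234".
    print(rrd_update_arg(query_offset(sys.argv[1])))
```

Run from cron on each box, the printed value can be fed straight into `rrdtool update`; a timeout (the script raising instead of printing) is then also a cheap "box or network is down" signal for the mail part.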
I wonder whether I should do it the “right” way instead of the “little effort” way, including cleaning up the age-old code, and then commit it. Is there any interest?
On a side note, we need a monitoring and management system, either a single tool or a few integrated ones. It should have a command-line interface and a WUI, with separate web pages for admins and for (read-only) users, who can check the general system state there before complaining. We also need configuration management. A few keywords that have been thrown into the room: nagios, cacti, puppet, cfengine. Does anyone have a complete solution, possibly including VM management (how much does OpenCRM do?)? For the latter we currently use a homegrown Jabber bot (don’t ask…) which does the template cloning (zfs, iSCSI) and other setup. Other suggestions, tools to avoid, success stories, links, and documentation are welcome.