We are excited to announce the release of CHT Watchdog, a new and powerful open-source solution for monitoring CHT deployments and receiving push alerts when attention is needed from a CHT administrator (see system monitoring and notification). This comprehensive monitoring tool, developed by Medic, is now available for installation on any instance above CHT-Core version 3.12. Monitoring – usually tied to an alerting system – notifies key support people if any critical incidents occur. This real-time reporting allows administrators to quickly remedy the situation and avoid further issues. This post dives into the software that powers CHT Watchdog and the benefits of running it.

Open-Source Excellence at its Core
When developing CHT Watchdog, Medic was careful not to use proprietary software that may be hard to set up or carry large licensing fees. Instead we selected Grafana for the front end and Prometheus for the back end. Both are entirely open source software (OSS): they are not only free to deploy, but also come with long-term support guarantees, even if the companies behind them cease operations. In addition to being OSS, they have long been the go-to solutions for system administrators who need to easily configure dashboards and push out actionable alerts.
While interviewing administrators of existing CHT deployments, we found that more than one was already running Grafana and Prometheus. Hearing that expert administrators had already selected these tools validated our choice of software. These interviews also let us validate, and course correct, many of our assumptions about how the best CHT monitoring and alerting system should work out of the box.
Being “best in class” means that a solution doesn’t just work for one use case. CHT Watchdog can be extended beyond the CHT itself to also monitor Docker containers, a great addition as they power every self-hosted CHT instance. This extensibility does not stop at the CHT and Docker: it covers any of the hundreds of solutions that Prometheus and Grafana work with.
Easy to Set Up, Works Out of the Box
Given that every self-hosted CHT instance, on both versions 3.x and 4.x, is already running Docker, CHT Watchdog uses Docker to give administrators a seamless installation experience: admins can get started with a single command. Just as important as using Docker, every CHT instance after 3.12 also has the monitoring API built in.
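To make the idea of a built-in monitoring API more concrete, here is a minimal Python sketch of reading it and turning the response into flat metric names. The path `/api/v2/monitoring`, the `base_url` argument, and the field names in the sample payload are assumptions for illustration; check the CHT documentation for the exact endpoint and schema of your version. This is not how Watchdog itself is implemented.

```python
import json
import urllib.request

# Assumption: the monitoring API is served at this path on your CHT version.
MONITORING_PATH = "/api/v2/monitoring"


def fetch_monitoring(base_url: str, timeout: float = 10.0) -> dict:
    """Fetch the monitoring JSON from a CHT instance (base_url is hypothetical)."""
    url = base_url.rstrip("/") + MONITORING_PATH
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def flatten(payload: dict, prefix: str = "") -> dict:
    """Flatten nested JSON into dotted metric names, keeping only numeric leaves."""
    flat = {}
    for key, value in payload.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat


# Illustrative payload only; the field names are assumptions, not the documented schema.
sample = {
    "sentinel": {"backlog": 12},
    "outbound_push": {"backlog": 0},
    "version": {"app": "4.2.0"},
}
metrics = flatten(sample)
```

Against a live instance, `flatten(fetch_monitoring("https://cht.example.org"))` would produce the same kind of flat dictionary, where the hostname is a placeholder.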
After deploying CHT Watchdog, all the dashboards in Grafana are already configured to ensure every single metric published on the monitoring API is captured. Sensible alerts are also preconfigured to cover all the critical areas where CHT Core may need the attention of an admin. These include built-in alerts for:
- couch2pg Backlog – How far behind the synchronization of CHT data from CouchDB to Postgres is. Postgres is often the source of MoH-facing health dashboards.
- Outbound Push Backlog – Outbound push sends data to external systems for interoperability and other integrations.
- Sentinel Backlog – Sentinel is the server-side process that handles data transitions and updates.
- Server Time Accurate – If the server time is off, it affects the timestamps recorded for care given, Sentinel processing, outbound push and all other server-side activity.
- Users Over Replication Limit – It’s important to ensure users don’t have too many documents on their device which may adversely affect performance.
- API Server Down – This means the CHT is down: it cannot process any data and no CHWs can synchronize.
- Client Feedback/Error Rate – CHWs can manually submit an error or feedback, or the CHT Android app may programmatically generate feedback docs in case of an error.
- DB Conflicts Rate – If a single document gets updated by two CHT users at the same time, a conflict occurs which will need manual intervention to fix.
- Message Delivery Rate – SMS message status, including successes, failures and pending messages.
Any additional alerts that a deployment needs can easily be added, so that each unique CHT setup can be kept healthy.
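The logic behind threshold alerts like the ones listed above can be sketched in a few lines of Python. In Watchdog the alerts are configured in Grafana rather than in code like this; the metric keys and threshold values below are hypothetical and are not Watchdog defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str         # alert name, taken from the list above
    metric: str       # hypothetical metric key
    threshold: float  # illustrative value, not a Watchdog default


RULES = [
    Rule("Sentinel Backlog", "sentinel_backlog", 1000),
    Rule("Outbound Push Backlog", "outbound_push_backlog", 500),
    Rule("Users Over Replication Limit", "users_over_replication_limit", 0),
    Rule("Server Time Accurate", "clock_drift_seconds", 5),
]


def firing(metrics: dict, rules=RULES) -> list:
    """Return the names of rules whose metric exceeds its threshold.

    A metric missing from the sample is treated as 0, so it never fires here.
    """
    return [r.name for r in rules if metrics.get(r.metric, 0) > r.threshold]


# Illustrative sample: only the Sentinel backlog is above its threshold.
sample = {"sentinel_backlog": 4200, "outbound_push_backlog": 3, "clock_drift_seconds": 0.4}
alerts = firing(sample)
```

Adding a deployment-specific alert in this sketch is one more `Rule` entry, which mirrors the idea that extra alerts can be layered on top of the defaults.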
Actionable Alerts, Tested in the Field
The dashboard below demonstrates a real-world scenario from a production instance of the CHT hosted by Medic. Note the significant increase in the Sentinel Backlog as well as some gaps when Watchdog was unable to capture any data from the CHT Core instance. Sentinel is a CHT Core service that handles server-side processing which keeps data in the CHT up to date. A backlog here will prevent current data from reaching health workers:

The example above is just one of many real-world situations in which Medic has demonstrated CHT Watchdog to be successful at not only detecting a potential outage, but catching it before health workers notice, which is the primary reason Watchdog was created. By getting timely, actionable alerts, CHT administrators can remediate issues and minimize downtime.
Watchdog is already on version 1.8.2 despite being less than five months old. This continuous improvement is based on real-world insights gathered from monitoring numerous CHT instances.
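The two signals described in the production scenario, a growing Sentinel backlog and gaps where no data could be captured, can be illustrated with a small Python sketch. The sample values, the 60-second interval and the function names are all hypothetical; Watchdog relies on Prometheus and Grafana for this rather than code like the following.

```python
def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are further apart than expected."""
    limit = expected_interval * tolerance
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > limit]


def backlog_growth(values):
    """Return the change in backlog from the first to the last sample."""
    return values[-1] - values[0] if values else 0


# Illustrative samples: seconds since start, collected every 60 s, with one missed stretch.
times = [0, 60, 120, 360, 420]
backlog = [10, 40, 200, 2500, 3100]

gaps = find_gaps(times, expected_interval=60)
growth = backlog_growth(backlog)
```

Here the jump from 120 to 360 seconds is reported as a gap, and the backlog grows by 3090 over the window, which is the kind of pattern an administrator would want to be alerted about.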
Get Started
To get started with CHT Watchdog, visit our documentation site where you can find extensive guides on setup, production deployment, and customization. We offer detailed written documentation as well as technical videos that cover the monitoring API in the CHT Core and the initial setup process for running the CHT Watchdog in a production environment. Additionally, you can reach out to us on the CHT Forum if you have any questions or require assistance with your deployment.