Ncmonitor

Location: eqiad
Status: Active

ncmonitor keeps our registered non-canonical domains consistent with the services that depend on them, such as acme-chief and ncredir. Over the years the manual work needed to keep them aligned has been neglected, causing drift between the registrations and the service configuration. ncmonitor keeps everything in sync by automatically detecting drift and proposing patches. Ultimately, this means that only MarkMonitor needs to be maintained by hand.

Automation of registration is beyond the scope of this utility.

Configuration

Configuring and running the utility is documented in the project's manpages.

Our deployment configuration logic lives in the usual Puppet module and profile manifests.

Reviewing patches

Figure: dependency graph. Patches need to be merged in the proper order.

ncmonitor submits all of its patches to Gerrit at the same time; however, care must be taken to merge them in the right order. The services depend on each other in a linear fashion: ncredir requires acme-chief to have issued a certificate; acme-chief requires the domain to be configured in our DNS repository; the DNS repository requires properly configured MarkMonitor domains.
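The merge order described above can be derived mechanically from the dependency chain. The following is only an illustrative sketch, not part of ncmonitor: the labels in DEPENDS_ON and the merge_order helper are hypothetical names chosen for this example, and MarkMonitor appears as a prerequisite rather than as a Gerrit patch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical labels: each key depends on every item in its value set.
DEPENDS_ON = {
    "ncredir": {"acme-chief"},   # needs an issued certificate
    "acme-chief": {"dns"},       # needs the domain in the DNS repository
    "dns": {"markmonitor"},      # needs a properly configured MarkMonitor domain
}


def merge_order(depends_on):
    """Return the items ordered so that every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(merge_order(DEPENDS_ON)))
```

Because the chain is linear there is exactly one valid order, so reviewers can merge in that sequence once the MarkMonitor configuration is confirmed.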

ncmonitor will not submit patches for improperly-configured MarkMonitor domains.

Running

ncmonitor runs on its own Ganeti VM (ncmonitor1001.eqiad.wmnet), triggered by a systemd timer. The service is entirely stateless: the process runs with temporary directories that are cleaned up after each run.

ncmonitor can either print the required actions or automatically submit patches to Gerrit for human approval. The routine service execution is configured to submit the patches automatically.

Wikimedia's MarkMonitor API access is limited to the production IP range, so this utility must run in the production cluster.

Implementation details

ncredir

ncredir's nc_redirects.dat file is kept alphanumerically sorted. Humans aren't expected to edit the file very often, and because entries are added automatically, any hand-maintained grouping could not be preserved; sorting gives the file a consistent layout instead.
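As a rough illustration of why sorting suits automated edits, the sketch below merges new entries into an existing list and re-sorts the result. It makes several assumptions: merge_sorted is a hypothetical helper, the example strings are placeholders rather than the real nc_redirects.dat line syntax, comments and headers are not modelled, and Python's plain lexicographic ordering stands in for whatever ordering ncmonitor actually applies.

```python
def merge_sorted(existing, additions):
    """Merge new entries into existing ones and return a sorted, de-duplicated list."""
    # Strip whitespace so that otherwise identical lines compare equal.
    entries = {line.strip() for line in (*existing, *additions)}
    entries.discard("")  # ignore blank lines
    return sorted(entries)


if __name__ == "__main__":
    current = ["b.example", "a.example"]
    proposed = ["c.example", "a.example"]
    for entry in merge_sorted(current, proposed):
        print(entry)
```

Since the output depends only on the set of entries, repeated runs over the same input produce the same file, which keeps the generated patches small.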

acme-chief

acme-chief's configuration is split out of Puppet's usual hieradata/common.yaml file and resides in its own file. This is because PyYAML mangles formatting and drops comments when it rewrites a file, so it is easier to let PyYAML take over the formatting of a dedicated file.

acme-chief's hieradata (the certificates::acme_chief blocks) is grouped into a maximum of 40 domains per certificate. For more information on why, see acme-chief.
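The grouping rule amounts to splitting the domain list into chunks of at most 40. A minimal sketch follows, assuming a hypothetical group_domains helper and placeholder domain names; how ncmonitor names the resulting certificates or orders the domains is not covered here.

```python
MAX_DOMAINS_PER_CERT = 40  # limit stated above


def group_domains(domains, size=MAX_DOMAINS_PER_CERT):
    """Split domains into consecutive groups of at most `size`, keeping input order."""
    domains = list(domains)
    return [domains[i:i + size] for i in range(0, len(domains), size)]


if __name__ == "__main__":
    # Placeholder names, not real registered domains.
    sample = [f"domain{i}.example" for i in range(85)]
    for index, group in enumerate(group_domains(sample)):
        print(index, len(group))
```

With 85 domains this yields three groups: two full groups of 40 and a final group of 5.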
