Incidents/20160925-ores

Summary

On September 25, 2016, the ORES service had an elevated timeout ratio (~14%) for about six hours because it ran out of disk space due to overly verbose logging.

Timeline

  • Sept 25, 2016, 10:34:40 UTC: Icinga test on ORES failed due to a timeout.
  • 14:13 UTC: phab:T146581 is created.
  • 16:03: The fix is deployed in labs.
  • 16:26: The fix is deployed in prod.

Conclusions

We should have better monitoring of disk space and be careful about the verbosity of production service logs.
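
As an illustration of the disk space monitoring mentioned above, here is a minimal sketch of a threshold check. It is not the check that was deployed for ORES: the function names, the default path and the 80%/90% thresholds are all assumptions chosen for the example, and the status strings only mimic the usual OK/WARNING/CRITICAL convention of monitoring checks.

```python
import shutil


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a disk usage percentage to a status string.

    The thresholds are hypothetical defaults, not values from this incident.
    """
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"


def disk_usage_status(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (status, used_pct) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_pct = 100.0 * usage.used / usage.total
    return classify(used_pct, warn_pct, crit_pct), used_pct


if __name__ == "__main__":
    status, used_pct = disk_usage_status("/")
    print(f"DISK {status} - {used_pct:.1f}% used on /")
```

Alerting at a warning threshold well below 100% would leave time to reduce log verbosity or free space before requests start timing out.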

Actionables
