Notice: The WebPlatform project, supported by various stewards between 2012 and 2015, has been discontinued. This site is now available on GitHub.

Logging aggregation and analytics

Summary

To get as much information about the system as possible, we need to aggregate log events from all nodes in one place.

Currently, log data is transferred over UDP, which is unreliable by design: the protocol allows packets to be dropped. That is not acceptable for log messages.
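As an illustration of the alternative, here is a minimal sketch of an application sending its log records to a syslog collector over TCP instead of UDP, using only the Python standard library. The function name, logger name, host and port are hypothetical and not part of our setup. Note that TCP only retransmits lost packets while the connection is up, so it does not replace a buffer or queue on the sending side.

```python
import logging
import logging.handlers
import socket


def make_tcp_syslog_logger(host: str, port: int, name: str = "webplatform") -> logging.Logger:
    """Return a logger that ships records to a syslog collector over TCP.

    host, port and name are placeholders (assumptions), to be replaced
    with the address of the real collector.
    """
    # SOCK_STREAM selects TCP; SysLogHandler's default is SOCK_DGRAM (UDP).
    # With TCP, the handler connects when it is constructed, so the
    # collector has to be reachable at this point.
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        socktype=socket.SOCK_STREAM,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

A node would then call something like make_tcp_syslog_logger("logs.example.internal", 514).warning("disk almost full"), where the hostname is again only an example.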

See this blog post about centralized logging by Jason Wilder to understand the idea behind it.

Also see this presentation: Logstash and other things, by Jordan Sissel of DreamHost.

Related tasks

Overview

An ideal system should:

  1. Accept messages from all the nodes and their services
  2. Use a FIFO buffer or message queue so that we neither lose messages nor flood the internal network
  3. Provide a web-based interface to search events
  4. Index all log messages, parse known elements such as date formats, and categorize them by type of service
  5. Be open-source, and hosted within our own infrastructure
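To make the parsing and categorization requirement more concrete, here is a rough sketch of what the indexing side would do with one incoming line. It assumes the classic BSD syslog layout ("Mon DD HH:MM:SS host tag[pid]: message"); the SERVICE_TYPES mapping and the parse_line function are made up for illustration, and the candidate tools do this with their own configuration rather than with code like this.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed layout: "Mon DD HH:MM:SS host tag[pid]: message" (BSD syslog style)
SYSLOG_LINE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

# Hypothetical mapping from program tag to type of service
SERVICE_TYPES = {
    "apache2": "web",
    "nginx": "web",
    "salt-minion": "config-management",
}


def parse_line(line: str, year: int) -> Optional[dict]:
    """Split one syslog line into indexable fields, or return None if it does not match."""
    m = SYSLOG_LINE.match(line)
    if m is None:
        return None
    # BSD syslog timestamps carry no year, so the caller has to supply it.
    timestamp = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return {
        "timestamp": timestamp,
        "host": m["host"],
        "service": m["tag"],
        "pid": int(m["pid"]) if m["pid"] else None,
        "type": SERVICE_TYPES.get(m["tag"], "other"),
        "message": m["message"],
    }
```

Lines that do not match are returned as None instead of being guessed at, so they can be indexed as raw text and reviewed later.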

Candidates found:

  1. Logstash
  2. Graylog2
  3. Scribe

Data sources

  • Salt Stack minion: log_file parameter
  • Apache2: in every vhost, ErrorLog syslog:local and php_flag log_errors on
  • NGINX: in every vhost
  • Local syslog service: forward messages, configure a message queue
  • Add hooks in some web apps
    1. MediaWiki hooks [1]
    2. BugGenie
    3. WordPress

Reference

Articles and tutorials