
March 2015 Infrastructure change log since 2013

A list of the high-level infrastructure changes we’ve worked on over the last two years.

Should cover:

  • Which individual software we run, the pieces involved, and what needs each piece fulfills
  • Software we got rid of (e.g. web based IRC client, GlusterFS, etc)
  • New components
  • We weren’t assured of long-term free hosting from HP; thankfully DreamHost stepped up to host us
  • Automation improvements
  • Overview of lessons learned, and wins

On a related note, there are also notes in the 201410 report published last year and in the 2014 Improvement plan page.

Software we currently use

### Web applications we host

| Resource | Software | Deployment repository | Location |
| --- | --- | --- | --- |
| Home page | [DocPad](https://docpad.org/) | [repository](https://github.com/webplatform/www.webplatform.org) | [webplatform.org/](http://webplatform.org/) |
| wiki ([version](/Special:Version)) | [MediaWiki](https://www.mediawiki.org/wiki/MediaWiki), using Wikimedia Foundation ("wmf/1.24wmfX") continuous release branches | [repository](https://github.com/webplatform/mediawiki) | [docs.webplatform.org/wiki/](/docs/) |
| IRC logger | [Lumberjack (now called Pierc)](http://classam.github.io/pierc/) | TODO | [www.webplatform.org/talk/chatlogs/](/talk/chatlogs/) |
| Analytics | [Piwik](http://piwik.org/) | TODO | [stats.webplatform.org/](https://stats.webplatform.org/) |
| Blog | [WordPress](http://wordpress.org/) | [repository](https://github.com/webplatform/blog-service) | [blog.webplatform.org/](/blog/) |
| Code sandbox | [Dabblet](http://dabblet.com/) | [repository](https://github.com/webplatform/dabblet) | [code.webplatform.org/](http://code.webplatform.org/) |
| Project management | [The Bug Genie](http://www.thebuggenie.com/) | [repository](https://github.com/webplatform/thebuggenie) | [project.webplatform.org/](https://project.webplatform.org/) |
| Accounts | [Firefox Accounts](https://wiki.mozilla.org/Identity/Firefox_Accounts) | see [SSO Project page](/WPD/Projects/SSO) | [accounts.webplatform.org/](https://accounts.webplatform.org/) |
| Hypothes.is | [Hypothes.is](https://www.hypothes.is/) | TODO | [notes.webplatform.org/](https://notes.webplatform.org/) |
| Discuss | [Discourse](http://www.discourse.org/) | TODO | [discuss.webplatform.org/](https://discuss.webplatform.org/) |
### Web applications we rely on
| Resource | Software | Location | Usage |
| --- | --- | --- | --- |
| Operations issue tracker | GitHub *webplatform/ops* project and [Huboard](https://huboard.com/) | [*KanBan* dashboard](https://huboard.com/webplatform/ops/#/), [webplatform.github.io/ops](https://webplatform.github.io/ops) | A dashboard utility to visualize in columns what’s next, what’s in progress, and what’s done. |
| WebPlatformDocs GitHub account | GitHub | [gist.github.com/WebPlatformDocs](https://gist.github.com/WebPlatformDocs), [github.com/WebPlatformDocs](https://github.com/WebPlatformDocs) | Stores all *code.webplatform.org* gists on GitHub. This account was created before GitHub introduced "organizations". |
| WebPlatform GitHub organization | GitHub | [github.com/WebPlatform](https://github.com/WebPlatform) | The organization in which we store all our repositories. |
### Misc.
Ubuntu
We were originally running two different Ubuntu versions, 10.04 and 12.04. While both are “Long Term Support” releases, we weren’t running the same version on every server and had no automatic installation of security updates. Since mid-2014 all servers use the same version, Ubuntu 14.04 LTS, with automatic security updates enabled.
Memcached
A “key store” system that many web applications rely on to keep the HTML they generate, speeding up page rendering.
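
For illustration, pointing MediaWiki’s caches at Memcached looks roughly like the snippet below; a minimal sketch with a placeholder server address, not our actual values:

```php
<?php
// LocalSettings.php (sketch): keep MediaWiki's main object cache and parser
// cache in Memcached so rendered HTML fragments are reused between requests.
$wgMainCacheType    = CACHE_MEMCACHED;
$wgParserCacheType  = CACHE_MEMCACHED;
$wgMemCachedServers = [ '127.0.0.1:11211' ]; // placeholder address
```
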
Redis (new since 2014)
Another “key store” system similar to Memcached, but used for storing session data so we can balance web application backend load across multiple web servers. The benefit of Redis over Memcached is that we can easily make Redis calls through SSL/TLS and require clients to authenticate. We can also configure MediaWiki to store async jobs in Redis instead of a MySQL table, although MediaWiki’s operations team lead said that MySQL tables are fine for heavy load, unless we get a wikipedia.org type of load.
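
As a rough sketch of what that looks like in MediaWiki configuration (host and password values below are placeholders, not our real ones):

```php
<?php
// LocalSettings.php (sketch): register a Redis-backed object cache, keep PHP
// sessions in it so any web server can handle a logged-in request, and queue
// async jobs in Redis instead of a MySQL table.
$wgObjectCaches['redis'] = [
    'class'    => 'RedisBagOStuff',
    'servers'  => [ '10.10.10.20:6379' ],         // placeholder private IP
    'password' => 'replace-me',                    // placeholder secret
];
$wgSessionCacheType = 'redis';

$wgJobTypeConf['default'] = [
    'class'       => 'JobQueueRedis',
    'redisServer' => '10.10.10.20:6379',           // placeholder private IP
    'redisConfig' => [ 'password' => 'replace-me' ],
];
```
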
ElasticSearch (new since 2014)
A “REST” web service in which we can index documents and which we use as a search engine. We store annotations in it, and we could also configure MediaWiki to use it to improve search capabilities.
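
One possible way to wire that up would be the CirrusSearch extension, which delegates MediaWiki search to ElasticSearch; we haven’t committed to it, so treat the following as a sketch with placeholder values:

```php
<?php
// LocalSettings.php (sketch): delegate wiki search to our ElasticSearch
// cluster via the Elastica and CirrusSearch extensions (assumed approach).
require_once "$IP/extensions/Elastica/Elastica.php";
require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
$wgCirrusSearchServers = [ '10.10.10.30' ];   // placeholder ElasticSearch node
$wgSearchType = 'CirrusSearch';
```
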
MariaDB (new since 2014)
A drop-in replacement for MySQL, in use by most of the infrastructure. We could also configure it to use Galera so we could send writes to any node in the cluster; that might be a next step.
Monit (new since 2014)
A monitoring system made to help ensure vital services stay up.
NGINX (new since 2014)
Web server and proxy software. We will eventually use NGINX instead of Apache.
Apache
The original web server software. Only MediaWiki has a hard requirement on it, until we change our configuration or the Semantic MediaWiki (“SMW”) requirement changes. At this time, SMW requires PHP to run through mpm-prefork which, in contrast to multi-threaded servers, keeps a pool of processes up at all times to answer HTTP requests. According to the SMW mailing list, this hard requirement might change soon.
php-fpm (new since 2014)
A PHP execution environment that NGINX connects to to serve dynamic pages. Currently in use with Piwik; other PHP web applications (except MediaWiki) could be migrated to it soon, unless we can run everything that currently runs in php5-fpm on HHVM (see below) instead.

Software we are currently evaluating

Nutcracker
A “key store” proxy system that we could install on each app server, keeping a local copy and balancing the load across both Redis and Memcached.
HHVM
A complete rewrite of the PHP execution environment with known improvements compared to php-fpm. We might migrate all compatible PHP web applications to this runtime environment.
LogStash
A centralized log manager. It harmonizes, archives, and processes any log messages we send to it, and serves as an easy-to-use log search engine.
Phabricator
A web application made to help organize software projects. It features an IRC bot with which we could monitor more than one IRC chat room (our current chat bot is not maintained and doesn’t scale well). We could also use Phabricator as a place to store sensitive documents, mirror Git/SVN/Mercurial repositories we rely on, host temporary Git/Mercurial stashes so we don’t risk losing them, host a code “pastebin” so we don’t need to rely on GitHub gists, and so on. See the Phabricator applications list for more details.
Docker
A system that allows us to create “executable” packages called “containers” out of any software or web application. At first look it resembles a VM, but it is a very thin one that does only one job; it removes the need to upgrade operating system packages and allows rolling back to any previous build.
Deis
An “orchestration” system that automates the cycle of building Docker containers. It handles subsystems such as load balancing and archiving, and removes the need to hardcode deployment scripts for each web application. With this, we could build, on push, anything that has a "Dockerfile" at the root of the project.
CoreOS
A thin Linux distribution that ships with Docker preinstalled, along with a few other orchestration utilities that provide auto-discovery and automatic scaling.

Conventions in place

The idea is that every service uses a default configuration as if everything were local, while an equivalent local service delegates/proxies to a specialized set of servers.

What’s common to all VMs
Refer to architecture documentation at Base configuration of a VM
email
Each server uses localhost as its email gateway, but the local email server then relays through a specialized VM dedicated to sending emails.
memcached (still in evaluation)
Each application server (i.e. a server that runs a web application backend technology such as PHP or Python) acts as if Memcached were local, but in fact a service called Nutcracker (a.k.a. Twemproxy) is configured to talk to more than one Memcached server, keeping a local copy of the data.
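
In practice the application configuration stays as simple as if Memcached were truly local; for example, a sketch assuming the PHP Memcached extension and Nutcracker listening on the default port:

```php
<?php
// Sketch: the application talks to 127.0.0.1:11211 as if it were Memcached,
// but Nutcracker (Twemproxy) listening there forwards requests to the real
// Memcached servers defined in its own configuration.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);   // Nutcracker, not Memcached itself
$cache->set('greeting', 'hello', 300);   // behaves exactly like plain Memcached
var_dump($cache->get('greeting'));
```
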
secrets, passwords (“accounts” pillars)
Are stored in a separate set of (salt) pillars in /srv/private/pillar/accounts/production.sls and (will be) hosted in a private Git repository at W3C. In there, we keep all API secrets, tokens, and passwords we need. Every configuration file and service pulls information from there. To adjust, edit and commit the file, then run a "highstate". (Staging is at /srv/private/pillar/accounts/staging.sls.)
Deployment-specific variables (“infra” pillars)
Are stored in a centralized (salt) pillar in /srv/pillar/infra/production.sls. In there, we list the private IP addresses of database servers, ElasticSearch, etc. Every configuration file and state relies on it. To adjust, edit and commit the file, then run a "highstate". (Staging is at /srv/pillar/infra/staging.sls.)
ssh access
The only way to work on any VM is to pass through the salt master as a "jump box"; see Accessing a VM using SSH.
Which level is a VM in?
The “level” grain tells which configuration file to use, in both /srv/pillar/infra/$level.sls AND /srv/private/pillar/accounts/$level.sls. To find out a VM’s level, run salt \* grains.get level from the terminal. Refer to the architecture documentation at Roles and Environment level.

Lessons learned

Varnish
Refer to the Caveats in Things to consider when we expose services via Fastly and Varnish.
Use IP addresses instead of names in web application configuration files
Name resolution can be costly. I learned that it speeds up page render time when I explicitly set the Database, Redis, Memcached, and ElasticSearch IPs instead of names that we’d put in /etc/hosts. Since late 2014 we manage every configuration file through salt, so we can now update this information easily and automatically.
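
Concretely, the difference is as simple as the following MediaWiki-style sketch (the addresses are placeholders; the real values come from the salt pillars):

```php
<?php
// LocalSettings.php (sketch): explicit private IPs instead of hostnames, so no
// DNS or /etc/hosts lookup happens on every page render. Salt rewrites these
// values whenever the underlying servers change.
$wgDBserver         = '10.10.10.10';            // instead of e.g. a "db1" hostname
$wgMemCachedServers = [ '10.10.10.21:11211' ];
```
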
To support multiple runtime backends …
It’s much easier to configure a public front-end server to talk directly to a private-access-only backend that decides when to serve static files it already hosts and which requests should pass through FastCGI, etc. Front-end nodes then don’t need to duplicate that logic; they are only responsible for balancing the load and taking care of both static files and the in-memory page cache, much like Varnish does.

Work done per year

2013

  • Started working on WebPlatform in July 2013
  • Setup analytics solution w/ Piwik, at stats.webplatform.org
  • Only one set of Virtual Machines (VMs) exposed the live site, leaving no room to work on improvements without risking the live site
  • Deployment scripts assumed exactly one deployment, making gradual rollouts hard
  • IP addresses were scattered across configuration files, and I had to search around to apply changes so that the servers would get updated information
  • VMs were running with two different OS versions: Ubuntu 10.04 and 12.04, no automatic installation of security patches
  • Setup of a private OpenStack cluster at DreamHost ("DHO"; DreamHost OpenStack) from a 4-blade server lent by DreamHost
  • Server migration from HP Cloud into DHO, see 2013 Migrating to a new Cloud Provider
  • Work on MySQL cluster so we could have off-site hot backup (i.e. database replication on a remote site, with logs transferred through SSL)
  • Multiple outages due to a bug in MediaWiki+Semantic MediaWiki affecting the rest of the infrastructure
  • Complete rework of the MediaWiki installation, from a clone with bits and pieces pasted without source control to a scripted setup based exclusively on source-controlled repositories
  • Work with Doug to create a new Compatibility data JSON schema
  • Sprint on extracting compatibility data from MDN into new Compatibility JSON schema
  • Sprint on Compatibility tables extension
  • Removed the requirement of shared storage across VMs (GlusterFS) and switched to external DreamObjects (Swift) storage at DreamHost
  • Set in place image storage pulling files directly from DreamObjects

2014

  • Upgraded all VMs to use only Ubuntu 14.04
  • Pages are now served under SSL
  • Most of the work mentioned below until 2015 is also noted in 2014 Improvements plan
  • Refactor of the homepage
    • Moved from a set of PHP files to DocPad and other NodeJS-based scripts, making the homepage completely static
    • see the Projects/Homepage page
  • Setup of our own email exit relay so that every server uses it instead of a paid external provider
  • Deployed notes.webplatform.org w/ Hypothes.is
  • Worked on implementing “SSO” based on a shared session token key; a “profile” server would be used as the source of truth, and the client web app (i.e. MediaWiki) would either create a user based on the details or start a session
  • Upgraded the MySQL server version and migrated to MariaDB (an open source fork from the original author of MySQL)
  • Creation of an account management service
    • Based on Mozilla Firefox Accounts (“FxA”)
    • Created our own fork, changed branding
    • Implemented Proof of concept of SSO using FxA for MediaWiki
    • see SSO Project page
  • Rework of how we deploy: deployment now reads from a specifically crafted Git repository, pulls any plugins/extensions, and applies configuration automatically. This covers:
    • Piwik (stats.webplatform.org)
    • WordPress (blog.webplatform.org)
    • MediaWiki (docs.webplatform.org)
    • The IRC bot
    • BugGenie (project.webplatform.org)
    • Annotation service (notes.webplatform.org)
    • Accounts service (accounts.webplatform.org); many repos, refer to the notes at the SSO Project page
    • Homepage (www.webplatform.org)
  • Purchase of an EV SSL certificate mentioning “World Wide Web Consortium” to give a hint about the site maintainers
  • Purchase of an alternate domain name to replicate the server setup in full (!!), allowing us to test every component of the site in isolation
  • Reviewed every blog post and imported any images we were linking from outside our site
  • “Inventory” system that keeps in memory the internal IP addresses of each infrastructure services: MySQL, Redis, Memcache, etc.
  • Automatically generates configuration files with credentials based on the servers that are up at that moment: IP addresses, passwords, private keys, etc.
  • Capability to update passwords/private keys across all web applications from one “private” configuration file
  • Setup of a “private” configuration system stored in a git repo, see WebPlatform GitHub operations issue tracker, at webplatform/ops#145
  • We will eventually publish all our deployment scripts to the public, except the “private” data files. Ref WebPlatform GitHub operations issue tracker, at webplatform/ops#48
  • Set up an NFS mount point so that ElasticSearch instances can do backups. Revisited the idea of not using inter-instance storage, or at least limiting it to backups only, until we can store ElasticSearch snapshots through Swift/DreamObjects too; see the WebPlatform GitHub operations issue tracker, at webplatform/ops#120

2015

Soon?

Some notes have been gathered but haven’t been tried yet.