Problems of scale and complexity

Stanford Web Services, Stanford University
Aug 2016 to present

How do we manage so many sites, with so many variations and so many interrelationships among their codebases?

By the time I arrived, my team had already written over 1,600 tests. We maintained over 2,500 sites that relied on myriad libraries, APIs, custom modules, and other integrations, so the chance of breaking something when upgrading anything was extremely high. These tests were incredibly valuable; they just took a very long time to run, and they required a precarious setup on our local development machines to run successfully. The first time I ran all variations of our test suites, it took close to a week. That Friday, Chrome pushed an update that made it impossible to run any of them.

We knew we needed a system for running these tests automatically, but we didn’t know when or where they should run. Before getting started, I wrote a project brief and began interviewing members of the team who could tell me which errors clients cared about most. The answer was simple: anything that makes a site “look broken.” It could be text running unintelligibly over an image, or a button that didn’t work. With these priorities in mind, I looked at our development practices.

For the most part, development occurred in well-contained modules. Once new development reached a certain level of stability, we would add it to our products, and only after rigorous testing would new code deploy to production. So there were a number of opportunities to run subsets of our test suite specific to the code being developed; we didn’t have to run all 1,600 tests every time. And if we could run tests earlier in the development process, the results could support the peer review process that is integral to our team’s development practices.
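To illustrate the idea of scoping a run to the code under change (this is only a sketch, not the actual scripts linked under Resources), here is a minimal Python example that maps changed paths to the test suites that cover them; the directory names, suite labels, and runner command are all hypothetical placeholders.

#!/usr/bin/env python3
"""Sketch: run only the test suites that cover the code that changed.

All paths, suite names, and the runner command are illustrative assumptions,
not the team's real configuration.
"""
import subprocess
import sys

# Hypothetical mapping from code directories to the suites that cover them.
SUITE_FOR_PATH = {
    "modules/example_events": "events",
    "modules/example_news": "news",
    "themes/example_theme": "theme",
}

def changed_paths(base_branch="master"):
    """List files changed relative to the base branch, via git diff."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base_branch],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

def suites_to_run(paths):
    """Return only the suites whose mapped code actually changed."""
    suites = set()
    for path in paths:
        for prefix, suite in SUITE_FOR_PATH.items():
            if path.startswith(prefix):
                suites.add(suite)
    return sorted(suites)

if __name__ == "__main__":
    suites = suites_to_run(changed_paths())
    if not suites:
        print("No mapped code changed; skipping test run.")
        sys.exit(0)
    for suite in suites:
        # Placeholder command; substitute whatever test runner the project uses.
        subprocess.run(["./run-tests.sh", "--suite", suite], check=True)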

A number of tools exist to facilitate this kind of automation, each with different costs, quality of documentation, and ease of adoption. With an open mind, I explored several and carefully noted where I found friction, where things worked the first time, and where companies tossed in delightful add-ons that showed they really understood how the automation tool would be used in real life. For example, one had a favicon that changed color depending on whether a test run was in progress, had succeeded, or had failed. They wisely presumed I would be working on something else while tests ran in a background tab.

I distilled the lessons learned from this evaluation for my colleagues and demoed several of the options. I wanted to hear from them whether they would find this addition to our development process useful. Anything that threatens to slow down the movement of code to production has to return a correspondingly high value, and I didn’t want anyone getting frustrated and working around it. After all, these tests were our best chance to catch problems before they reached the public. Once I was assured my colleagues believed it would add value, I worked through the last remaining challenges, and we piloted the tool on a small handful of repositories in active development.

The results were humbling. Not only did the tool reliably catch errors, but my colleagues were ecstatic to see the tests run more often with no additional work on our part. We celebrated the first time our system ran the right subset of tests for the 12 different products we managed, a process that previously would have tied up someone’s computer for 12 hours or more. On the occasions the tool has broken since we started using it, my colleagues have begged me to get it back online as soon as possible. Far from being eager to work around it, as I had feared, they have been requesting additional features so they can use it in new contexts.

Resources

Code: https://github.com/su-sws/stanford_travisci_scripts

Presentation at DrupalCon Higher Ed Summit: https://github.com/kbrownell/automated_testing

Kellie