Problems of scale and complexity

Stanford Web Services, Stanford University
Aug 2016 to present

How do we manage so many sites, with so many variations and so many interrelationships among their codebases?

By the time I arrived, my team had already written over 1,600 tests.  We maintained over 2,500 sites that made use of myriad libraries, APIs, custom modules, and other integrations.  The chance of breaking something when upgrading anything was extremely high, so these tests were incredibly valuable; they just took a very long time to run, and they required a precarious setup on our local development machines to run successfully.  For example, the first time I ran all variations of our test suites, it took close to a week.  That Friday, Chrome pushed an update that made it impossible to run any of them.

We knew we needed a system for running these tests automatically, but we didn’t know when and where they should run.  Before getting started, I wrote a project brief and began interviewing members of the team who could tell me which errors clients cared about most.  The answer was simple: anything that makes a site “look broken.”  It could be text running unintelligibly over an image.  It could also be a button that didn’t work.  With these priorities in mind, I looked at our development practices.

For the most part, development occurred in well-contained modules.  Once new development reached a certain level of stability, we would add it to our products, and only after rigorous testing would new code deploy to production.  So there were a number of opportunities to run subsets of our test suite, specific to the code being developed; we didn’t have to run all 1,600 tests every time.  And if we could run tests earlier in the development process, the results could support the peer review process integral to our team’s development practices.
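
To make the idea concrete, here is a minimal sketch of how a subset of tests might be chosen from the files changed in a commit range.  The module paths and test tags below are hypothetical placeholders, not our actual products or configuration; the scripts we actually used are linked under Resources at the end of this section.

# Illustrative sketch only: module paths and test tags are hypothetical
# placeholders, not our real configuration.
import subprocess

MODULE_TEST_TAGS = {
    "modules/stanford_events": ["events"],
    "modules/stanford_news": ["news"],
    "themes/stanford_basic": ["layout", "accessibility"],
}

def changed_paths(base="origin/master", head="HEAD"):
    """List the files changed between two git refs."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base + "..." + head],
        capture_output=True, text=True, check=True,
    )
    return [line for line in diff.stdout.splitlines() if line]

def tags_to_run(paths):
    """Collect only the test tags whose modules were actually touched."""
    tags = set()
    for path in paths:
        for module, module_tags in MODULE_TEST_TAGS.items():
            if path.startswith(module):
                tags.update(module_tags)
    return tags

if __name__ == "__main__":
    tags = sorted(tags_to_run(changed_paths()))
    print("Run tests tagged:", ", ".join(tags) or "none")

The point of a mapping like this is simply that a change touching one module triggers only the tests tagged for that module, which is what made running tests on every change affordable.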

A number of different tools exist to facilitate this kind of automation, each with varying costs, quality of documentation, and ease of adoption.  With an open mind, I explored several and carefully noted where I found friction, where things worked the first time, and where companies tossed in delightful add-ons that showed they really understood how the automation tool would be used in real life.  For example, one had a favicon that changed color depending on whether a test run was in progress, had succeeded, or had failed.  They wisely presumed I would be working on something else while tests ran in a background tab.

I distilled my lessons learned from this evaluation process for my colleagues and demoed several different options.  I wanted to hear from them whether they’d find this addition to our development process useful.  Anything that threatens to slow down the movement of code to production has to return an outsized value, and I didn’t want anyone getting frustrated and working around it.  After all, these tests were our best chance to catch problems before they reached the public.  Assured that my colleagues all believed it would add value, I worked through the last remaining challenges, and we piloted the tool on a small handful of repositories in active development.

The results were humbling.  Not only did the tool reliably catch errors, but my colleagues were ecstatic to see the tests run more often with no more work on our part.  We celebrated the first time our system ran the right subset of tests for the 12 different products we managed, a process that would previously have tied up someone’s computer for 12 hours, if not more.  On the occasions the tool has broken since we started using it, my colleagues have begged me to get it back online as soon as possible.  They are not, as I had feared, eager to work around it; they have been requesting additional features so they can use it in new contexts.

Resources

Code: https://github.com/su-sws/stanford_travisci_scripts

Presentation at DrupalCon Higher Ed Summit: https://github.com/kbrownell/automated_testing

Kellie
Problems of scoping down for shoestring budgets

Giant Rabbit LLC
Jul 2013 to Jun 2016

How do we provide the same quality of service at half or a quarter of the cost?

Dissatisfied with their existing toolset, nonprofit staff often came to truly believe another product would serve them better, and so they embarked on the trying, months-long process of migrating their data and business practices to a new system.  That’s where we came in.  Having migrated dozens of organizations from one system to another, we had become intimately familiar with the common pitfalls, oversights, and challenges of the process.  As the lead developer on many of these projects, I was trusted by my team with the freedom to iterate on our approach every single time.

When I helped the Electronic Frontier Foundation migrate to a new system, we scripted the process.  But that turned out to be extremely expensive, more so than most other organizations could afford.  Next, I tried spreadsheets.  Replicating a relational database structure by manually moving columns into different files and files into different directories was rife with human error.  So I dug a little deeper and found a tool called OpenRefine.  It was specifically designed to handle complex data manipulation, and it saw me through many successful migrations.  But the translation process was never fully replicable from start to finish; manual intervention was still required here and there, which meant clients would be locked out of any system for weeks during a cutover.

Finally, I returned to the possibility of scripting a migration, but on a much smaller budget.  Instead of trying to script the process soup to nuts, I left out the most expensive part: automatically configuring the new system based on an organization’s existing data structure.  Not only was this the most expensive part of a scripted migration, but configuring the new system required reflection.  Automating it would have denied clients the chance to re-examine how they had been using technical tools to support their work and whether their existing ways of working with technology were doing more harm than good.  If their old system was so bad they wanted to leave it, why recreate it?

With guidance from a colleague, I ventured beyond the standard languages used at our firm and into the wonderful world of Python.  A few videos and online tutorials later, I began writing out all the transformations required for this migration.  From the beginning, I hoped this application could be used again by other colleagues on other migration projects, so I took time along the way to refactor, review variable name choices, rework data structures, add tests, update comments, and clarify logic.  We were able to practice the migration many times, and when the client handed us their final export for cutover, I had them back up and running in their new system by the end of the day.
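
For a flavor of what those transformations looked like, here is a minimal sketch in the same spirit.  The field names and CSV layout are hypothetical examples, not the client’s actual export format, and the real scripts handled far more cases.

# Illustrative sketch only: field names and CSV layout are hypothetical,
# not the client's actual export format.
import csv
from datetime import datetime

def transform_contact(row):
    """Map one row of the legacy export onto the new system's fields."""
    first, _, last = row["Full Name"].strip().partition(" ")
    return {
        "first_name": first,
        "last_name": last or first,
        "email": row["Email"].strip().lower(),
        # The legacy export used MM/DD/YYYY; the new system expects ISO 8601.
        "joined_on": datetime.strptime(row["Member Since"], "%m/%d/%Y").date().isoformat(),
    }

def transform_file(src_path, dest_path):
    """Read the legacy export and write a file ready to import."""
    with open(src_path, newline="") as src, open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(
            dest, fieldnames=["first_name", "last_name", "email", "joined_on"]
        )
        writer.writeheader()
        for row in reader:
            writer.writerow(transform_contact(row))

Keeping each transformation a small, pure function like transform_contact is what made it practical to add tests and to rehearse the migration repeatedly before the final cutover.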

Not long after wrapping up this work, the colleague who suggested I try Python asked if he could use this work on another project.  Absolutely!  I was delighted that the strategy proved successful and that the investment in new scripts could benefit other colleagues and clients going through a similar process.

Kellie
Problems of alignment

Electronic Frontier Foundation
Apr 2010 to Jul 2013

How do we make sure our technical tools support, and align ideologically with, the organization's mission?

“We have so many duplicate contact records, if I were to merge each one, the work would take over 80 years,” a colleague reported.  Clearly, that was not going to happen.  Nor would it really have served the organization’s mission: to defend the rights of technology users.  But we did want to respect someone’s deeply personal decision to donate and that meant not misunderstanding or misrepresenting their history with the organization.

I had been working at EFF for over a year by the time my colleague put this problem on my table.  When I arrived, no one on staff could say who our donors really were or what motivated them to give.  Out of curiosity and a desire to thank them personally for supporting our work, I had started inviting community members over to the office and to happy hours, saying hi at conferences, and joining them at court hearings.  They were professionals, personally passionate and deeply engaged in the cause, often working on the technology and sometimes even on the cases that kept EFF busy.  All of them would have forgiven us if our fundraising team accidentally reached out twice in the same day.  But we could do better, and we did.

I started by writing down all the touchpoints that relied on data from our very unreliable donor database.  The list included newsletters, event invitations, and different kinds of fundraising-related communications.  I also made sure the board relied on donation and donor numbers from our far more reliable accounting database, so I could prioritize only those points where the duplicate issue might actually affect someone outside of staff.  I also took time to hear from more technical staff on the usefulness of scripting the dedupe process.  The result of those brainstorms was: no.  It usually requires personal knowledge to tell whether one donor record refers to the same person as another.

Instead of creating lists on the fly, we transitioned to persistent lists, and before sending a communication, we began the arduous process of de-duplicating its records.  But at least we knew we were investing these efforts only in those contacts who still wanted to hear from us.  And we reduced the scope of the problem to one that, in practice, affected only a small portion of the records in our database.  The rest?  Let them sit quietly in the background.  It served the organization better for my team to enjoy a conversation with our community members about how they could help with an upcoming campaign than for us to spend 80 years de-duplicating stale records.

Kellie