Do we trust Excel with too much?
AuthorBad Dinosaur Team
Public Health England’s decision to use Microsoft Excel to process NHS Test and Trace data caused nearly 16,000 COVID-19 cases to go unreported. We take a look at how this error happened, and the drawbacks of relying on Microsoft Excel as a database (when it's not really supposed to be).
This isn't the first Excel gaffe that's hit the headlines (London Whale case, Thomas Piketty case, Reinhart-Rogoff case) and it’s unlikely that it’ll be the last. The story usually starts with outrage, then confusion, a reckoning of some sort, and then a sigh of relief (that it wasn't us). Maybe it’s time to ask ourselves if we trust Excel with too much.
So what happened? The private companies analysing coronavirus swabs gave their results to Public Health England, who pulled the results together into an old format of Excel (normally it can handle a million rows, this one can only do around 65,000 rows) and passed it on to the Test and Trace team. There wasn’t enough space for all the results, so over the course of eight days around 16,000 cases were not reported, and their 50,000 contacts were not immediately alerted. For a more detailed breakdown have a look at this article from the BBC.
The error wasn't caused by an inherent flaw in Excel, it was as a result of the application of Excel. Excel is brilliant, it's really useful and has heaps of functionality, but it's ultimately a spreadsheet. It's very good at analysing lots of data, allowing us to draw comparisons and discover patterns. What it's not designed for is securely storing vast quantities of critical information, but that doesn't stop us trying. And while it is possible to create something akin to a relational database with excel, most users don’t think like software developers. So we end up with a multitude of two dimensional worksheets with very little insight into data correlation.
It's easy to see why Excel reliance is so prevalent in every industry: it's familiar, it's the default, it's accessible, intuitive, and it's not too intimidating. Even if we do have alternative tools with 'proper' database functionality, it's not uncommon to see people extract data from the back-end of those systems, plug it into Excel, and do their analysis there. Excel is the comfort blanket of business software, but here’s eight reasons not to use it as a database:
Size was the shortcoming in the case of Test and Trace. Excel is more than big enough for most use cases - you can have around a million rows and a thousand worksheets in a file. The limitation in processing capability becomes obvious once you start using a multitude of equations, and your laptop sounds like it’s about to take off when you try to save the file. Most modern database solutions are deployed in the cloud, which usually means they are elastic by nature and can be scaled up and down to fit the requirements of traffic and data size. Depending on the type of database used, row limits are usually a problem of the past.
There's no way to trace where your data came from, it's totally un-auditable. With free-typing, your cat could walk over the keyboard while you're putting the kettle on, overtyping your precious formula. You might never know until it's too late. In a typical software application, audit trails are easy to implement and can easily be tied to granular user actions, usually with a handy timestamp for context.
It's not really designed for more than one person to work on at any one time. This functionality is becoming more available but it still doesn't reconcile the issue of tracking changes made by other users. So, we still tend to share files and end up with Frankenstein's Monster: Q4Report_finalFINAL (martin comments)_FINAL_(clean).xlsx. In contrast, most software is designed to allow asynchronous access to the database, which means lots of users can access and update data at the same time.
Short of referencing yourself to oblivion there's no easy way to separate data and presentation. This means we end up with highly stylised sheets that management love (because they are easier to read), but they're not very robust - if you want to add or change anything you need to be really careful. That's not sustainable.
There's no real way to test if your spreadsheet is working. Every formula is hidden neatly behind hundreds of little cells. You can't expect each and every one to be reliable if there's no easy way for you to test them - let alone a way for someone else to test them.
Nearly everyone proclaims that they are "proficient in all Microsoft applications" on their CV, but in reality we fumble along, pick things up from colleagues, consult with Google if needs be, and ultimately fake it 'till we make it. One of the best things about Excel is that anyone can use it, but this also means that anyone might. With purpose-built software you can create different role permissions, giving the appropriate level of access (and support) to the right people.
Excel files can easily grow from being someone's handy personal workbook to critical company record. How many spreadsheets have you come across that were cobbled together years ago by some well-intentioned temp, who failed to log the logic behind it methodically or future-proof it for scale? By the time these spreadsheets outgrow their makers it's either too late, too cumbersome, too risky, or too expensive to pivot (sorry) to another platform. Technical scalability is often bottlenecked by a reliance on spreadsheets and the flexible nature of tailoring software to your business needs offers respite from this. Developing software through an agile methodology helps your digital requirements grow with your business requirements.
Excel files aren't backed up. Enterprise-wide adoption of cloud-based storage is helping us get better at this, but that's not going to stop instances of colleagues updating the OnlyCopy.xlsx and saving it to their desktop, only to delete it, spill coffee on it, or leave it on a train platform. Modern databases are cloud-based, so you don’t have to worry about saving the latest version.
Excel is brilliant, but there's a time and a place (and a user) for it. This unfortunate story has brought the limitations of the system into the spotlight. It's a timely reminder to check in on your data governance practices, check access and permissions rights, check if training is up to date. Crucially, it’s time to check if you are asking too much of it. It might be a good time to investigate whether Excel is the right tool for your use case, and what alternatives are out there.
If you think you need something bespoke to manage your company data, get in touch with us, that's the kind of thing we can do. We pride ourselves on being honest and direct with potential clients, and if there is a better solution somewhere else out there for you then we won’t hesitate to direct you to it.