The mismanagement of data, bad algorithms and poor visualisations have been in the news over the past month – here are the lessons you can take away.
The last month has not been a good one, PR-wise, for UK digital ability. Most notably, it has highlighted what not to do when it comes to three of the core components of the digital finance function model: data analytics, data visualisations and machine learning. Here are some of the lessons we should take away.
Data Analytics: 16,000 missing COVID records
Whether you blame the user (from a well-paid consultancy), the tool or the leadership, this headline-grabbing drama occurred because large amounts of data were transferred into Excel using CSV files, something that the BBC described as an “old file type”. The issue here is that Excel was being used as a database, which it isn’t. The process of transferring data from one place to another was also entirely manual (I assume without checks or controls), which led to 16,000 records being dropped from the import file.
That is a small number given the total number of records, but the impact on decision making was clear. Reported figures in the North of England appeared to be flattening, but once the missing data had been added, it showed that cases were in fact rising fast. Any decisions made while that data was missing would have been ill-informed.
Apply this to a revenue forecast or supply chain forecast that may drive decisions around purchasing (working capital impact) or even headcount (people’s lives impacted), and we can see what happens without the right skills and governance, or the right tools and solutions.
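To make the missing control concrete, here is a minimal sketch of a record-count reconciliation between a source file and what actually got loaded. It is an illustration only, not a description of the process that was used in practice: the function names, the sample data and the simulated row limit are all hypothetical.

```python
import csv
import io


def count_csv_records(text: str) -> int:
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


def load_with_row_limit(text: str, max_rows: int) -> list:
    """Hypothetical loader that silently drops rows beyond max_rows,
    standing in for any tool with a hard row limit."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    rows = [row for row in reader if row]
    return rows[:max_rows]


def reconcile(source_count: int, loaded_count: int) -> None:
    """Raise an error if the loaded record count differs from the source."""
    if source_count != loaded_count:
        missing = source_count - loaded_count
        raise ValueError(
            f"Reconciliation failed: {missing} record(s) missing "
            f"({loaded_count} loaded of {source_count})"
        )


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    source = "id,result\n1,pos\n2,neg\n3,pos\n"
    loaded = load_with_row_limit(source, max_rows=2)
    try:
        reconcile(count_csv_records(source), len(loaded))
    except ValueError as err:
        print(err)
```

The design point is that the check fails loudly. A transfer that loses rows should stop the process, not quietly pass incomplete data on to the people making decisions.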
Machine Learning: the exam results fiasco
We live with machine learning algorithms every day. Whether it’s a Google search or an Amazon recommendation, they are ubiquitous. With the UK exam results prediction algorithm, the stakes were much higher than a bad film or a returned purchase. Inherent biases in that program affected the course of students’ lives.
Two issues drove the backlash against it. Firstly, the performance of the algorithm came into question very quickly, with concerns that downgrading results based on historical data in a particular postcode penalised the best students in those areas, in many cases BAME students. Secondly, the teachers did not buy into the idea of the algorithm making the decision at all. Once challenged, the movement against it grew, fast.
In my humble opinion, one significant step was missed by the examination bodies: getting buy-in! That includes explaining how the ML works, which components are used and which matter most, and giving people the opportunity to intervene on edge cases and outliers. ML and human beings together, not one replacing the other.
In business, if we are to introduce algorithms to make decisions, we need to get much closer to what this new world is and how we embed it into our teams successfully.
Data Visualisation: the “it’s bad, but it’s not as bad as it looks” chart
Some may say data visualisation is not as important as the information and insight you are communicating, but I disagree. After hours of a data science project, or late nights over quarter-end, you don’t want to lose your audience by presenting the data ineffectively. Even more importantly, you don’t want to mislead them.
One of the main charts that the BBC uses in every broadcast to report on the state of the pandemic is the number of confirmed cases by date reported. The chart shows a massive spike in recent weeks, and the viewer may think that what is happening now is worse than the first wave of COVID-19.
What is misleading about this chart is threefold. Firstly, the addition of the lost 16,000 (a week’s worth of data) into two days has skewed the seven-day average, which many people look at to judge if cases are going up or down. Secondly, the number of reported cases depends on the number of tests carried out. So if we test twice as many people in October as in August, there will naturally be more cases. Finally, the chart includes the pre-mass testing data, which could be misleading as this data was collected on a totally different basis.
As a result, it’s not clear what this chart is trying to tell us. The seven-day average is up, perhaps, but we know the data is skewed. If the point is the number of confirmed cases on a given day, just report that as one figure alongside the corrected seven-day average, because the time series as presented makes no sense to the reader.
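The first two distortions can be shown with a few lines of arithmetic. The sketch below uses made-up figures, not real case data, and the function names are hypothetical. It builds a flat two-week series, under-reports one week, adds the backlog to the following two days, and compares the trailing seven-day averages. It also shows why cases per test is a fairer comparison when testing volumes change.

```python
def rolling_mean(values, window=7):
    """Trailing mean over `window` days; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def positivity_rate(cases, tests):
    """Confirmed cases as a share of tests carried out."""
    return cases / tests


# Illustrative figures only: the true picture is a flat 100 cases a day.
true_cases = [100] * 14

# Days 5 to 11 are each under-reported by 20 cases (140 in total),
# and the backlog is then added to days 12 and 13 (70 each).
reported = list(true_cases)
for day in range(5, 12):
    reported[day] -= 20
for day in (12, 13):
    reported[day] += 70

true_avg = rolling_mean(true_cases)
reported_avg = rolling_mean(reported)

if __name__ == "__main__":
    print("True seven-day average on the last day:", true_avg[13])
    print("Reported seven-day average before the backlog was added:", reported_avg[11])
    print("Reported seven-day average on the last day:", round(reported_avg[13], 1))
    # Twice the tests and twice the cases give the same positivity rate.
    print(positivity_rate(100, 1000) == positivity_rate(200, 2000))
```

In this toy series the true average never moves, yet the reported average first dips and then jumps above the true level. That is the pattern a viewer would read as a sudden surge.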
Compare the BBC chart to our month-end packs, with their lines and lines of variance analysis, and you can see how important it is to highlight the key messages and consider the value and action to be taken from each. Hard-to-define messages have no value to your reader.
We can all learn from the above when bringing in digital finance function capabilities. It also shows why it is so important to have a strategy to build a data asset and an effective team.