Data science is something of a buzzword in the world of data. But what does it mean to someone from an accounting background?
When you take your first steps in analytics you may not need to know too much about data science. But as you progress to more advanced or ambitious projects, it helps to have an understanding.
The term data science creates excitement and confusion in roughly equal amounts: excitement because of the potential it promises to unleash, and confusion because hardly any of its terms have standard definitions.
What is a data scientist?
Data scientists may require many skills. A data scientist may be an artful coder. They may be someone who can manage and wrangle lots of data and systems to start using the data they obtain (almost like a hacker!). Then they need to be solid mathematicians and statisticians to use methods to understand the data they have managed to obtain with their technical skills.
It is rare to find statisticians with coding skills, and vice versa. So in reality, the data science side of a project may be a team of people with varying strengths, rather than a single role.
The specialised work that gets done under the heading of data science includes machine learning, which helps fuel predictive analytics, using patterns and exceptions to predict future trends.
Machine learning methods (delivered by the whole data science team) are used to create accurate predictions from the data the team has tamed. They do this by creating and testing algorithms (elaborate statistical models) that produce new data, i.e. forecasts, which tell the information consumer what is likely to happen next.
If the data is managed in a way that allows it to be regularly updated, this becomes a very valuable tool for forecasting and decision-making support.
A caution with machine learning predictions
Despite the power of predictive analytics, we must remember it produces predictions, not facts. Machine learning forecasts are only estimates. Their accuracy depends heavily on the quality of the data and the stability of the algorithm, which needs continuous optimisation.
The data science lifecycle
The data science team will have their own concerns and preoccupations when setting up a new project. Here we will run through a lifecycle from their viewpoint.
The technicians will want to conceptualise what the business problem is and understand the data available for analysis.
When the team have agreed the what, why and how, they build a hypothesis (to be proved or disproved) that will help to answer the business question.
It is important to take a holistic view of the problem and to think about the criteria for success at this very early stage – what does the business want, what would it value the most?
What data do we have?
There is usually a lot of work to do to get the data into shape so that it is usable by the data science team. From a technical point of view, this is when the project really gets going.
Huge amounts of data, often from multiple sources and in various formats, need to be aggregated, transformed and loaded into your analytics platform.
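As a rough illustration of what aggregating and transforming can involve, the sketch below combines two sources that record the same kind of information in different formats. Everything in it is an assumption made for the example: the sample figures, the field names (`cost_centre`, `dept`, `pence`) and the helper functions `from_ledger` and `from_expenses` are all invented, and a real project would normally use a dedicated analytics platform rather than a short script.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Source A (invented sample): a CSV export with ISO dates and amounts in pounds.
ledger_csv = """date,cost_centre,amount
2023-01-15,Sales,1200.50
2023-01-28,Sales,800.00
2023-02-03,Marketing,450.25
"""

# Source B (invented sample): records with UK-style dates and amounts in pence.
expenses = [
    {"posted": "31/01/2023", "dept": "marketing", "pence": 30000},
    {"posted": "14/02/2023", "dept": "sales", "pence": 125075},
]


def from_ledger(text):
    """Transform source A rows into (month, department, pounds)."""
    for row in csv.DictReader(io.StringIO(text)):
        date = datetime.strptime(row["date"], "%Y-%m-%d")
        yield date.strftime("%Y-%m"), row["cost_centre"].title(), float(row["amount"])


def from_expenses(records):
    """Transform source B records into the same (month, department, pounds) shape."""
    for rec in records:
        date = datetime.strptime(rec["posted"], "%d/%m/%Y")
        yield date.strftime("%Y-%m"), rec["dept"].title(), rec["pence"] / 100


# Aggregate: total spend per month and department across both sources.
totals = defaultdict(float)
for month, dept, amount in [*from_ledger(ledger_csv), *from_expenses(expenses)]:
    totals[(month, dept)] += amount

for key in sorted(totals):
    print(key, round(totals[key], 2))
```

Most of the code deals with reconciling date formats, units and naming, not with the totals themselves. That is typical of this stage, and it is why consistent source data saves so much effort.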
It’s worth bearing in mind that good processes lead to good data, so if you are thinking about running projects like these, start by making sure your processes and master data are of the highest quality.
What model do we need?
A data scientist will refer to a model as an algorithm (as there’s a different language in academia and business). An algorithm is a set of constructs and rules used to break down the big question or hypothesis and to create the analytics output.
Once the problem is understood and the data is ready, the team can choose an algorithm. Which type of algorithm you use will depend on the nature of the question you are trying to answer.
This is led by the data scientists, but it is important to include all stakeholders when building the model, so that everyone agrees the methods and techniques that will be used to answer the business problem. Business context and knowledge are very important in changing and improving the algorithm, ensuring it performs well and produces a strong result.
Testing out the model
Once the algorithm is built, it is tested by running it using historical data. This allows the team to see how its predictions compare to past results. The algorithm’s performance is tested by splitting the existing data into two groups: one data set is used by the algorithm to create the predictions, and the second is then used to see how close those predictions would have been.
If the predictions are close, or closer than human forecasting efforts, and the algorithm saves time, it may be considered as a replacement source of forecasts within the planning process.
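The testing approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended forecasting method: the monthly figures and the human forecast are invented, the "algorithm" is simply a straight-line trend fitted by least squares, and the split is assumed to be by time (earlier months to build the model, later months to check it).

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit a least-squares straight line through the points (0, v0), (1, v1), ..."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average size of the gap between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Invented monthly sales history (12 months), for illustration only.
history = [100, 104, 109, 113, 118, 121, 127, 130, 136, 139, 145, 148]

# Split the existing data into two groups: the first 9 months build the model,
# the last 3 months are held back to check its predictions.
train, test = history[:9], history[9:]

slope, intercept = fit_linear_trend(train)
model_forecast = [intercept + slope * x for x in range(len(train), len(history))]

# Hypothetical forecast a planner made for the same 3 months.
human_forecast = [135, 140, 150]

model_error = mean_absolute_error(test, model_forecast)
human_error = mean_absolute_error(test, human_forecast)

print(f"Model error: {model_error:.2f}, human error: {human_error:.2f}")
if model_error < human_error:
    print("The model was closer than the human forecast on the held-back months.")
```

The point of holding back the last three months is that the model never sees them while it is being built, so the comparison reflects how it would have performed on genuinely unknown figures.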
What did we find out? Share and test
After the algorithm is created the new data can be visualised, possibly by the domain expert or a specialised data visualisation developer, and the findings are shared for review.
It is likely that you are the person communicating the results to the business. This is critical not only to the success of the project and to you personally, but also to developing interest in becoming a data-driven company with an analytics approach and team culture.
Rollout – applying findings in the business
Once the performance is approved and the value of using it is agreed, the algorithm can be deployed into the live production system.
The data science project team will need to document the code, technical and functional specifications, data flow diagrams and data architecture models from the prototype environment, and hand them over to the ongoing support team.
Data science is still an emerging capability in business, and a common language is still being established, but the use of new talent and new tools and techniques to support forecasting and better decision making is something we should all be aware of, and use if there is a business need to do so.
This article was written by Chris Argent and was first published in AAT Comment. It forms the fourth article in a series of five. Access them all here:
Data Analytics Series
- Data analytics – 1 – here’s what you need to know to get started
- Data analytics – 2 – your role in an analytics-focused finance team
- Data analytics – 3 – visualisation techniques to bring your data to life
- Data analytics – 4 – how data science and machine learning fit in
- Data analytics – 5 – how to communicate your insights