All the jargon almost induces analysis paralysis, especially when software vendors misuse and twist what it means to do modelling and analytics within the packages they are selling.
The purpose of this blog is really to go a little deeper into predictive analytics and financial modelling.
Now, I might be a financial modelling expert, but I will be the first to say that I am not an expert in predictive analytics. ‘Why not?’ you’re probably asking, and that’s because it’s fundamentally a different skill set.
The software applications and languages/coding used are also, at times, significantly different from each other.
So, to give this blog a bit more credibility and balance, I have decided to co-author it with Igor Panivko. Igor is the CFO of Konica Minolta and sits on the board of both the Ukrainian and Russian entities.
He has, impressively, largely replaced the “human” finance team with automated BI and analytics capabilities, which means he can capitalise on AI and ML to perform advanced predictive analytics.
I have already learnt a lot from Igor; we both sit on an AI/ML committee alongside Larysa Melnychuk and Irina Zhuravskaya (amongst a few others). I hope those reading this will also learn something from this blog, with more of it to come.
For this blog, we are excluding generic data analytics terms and associated systems, like Power BI, Tableau, Qlik, etcetera, as none of these contain any native modelling or more complex predictive capabilities.
Financial modelling (FM)
I believe it’s the overall skill of making better business decisions using well-constructed Excel models by applying business logic, accounting and problem-solving.
At the basic end, this might include revenue or cost analysis and dashboards built with simple formulas. At the more sophisticated end, it could be an integrated three-way model with structured references, an advanced scenario manager and Monte Carlo simulation, coupled with some VBA coding and advanced technology to enhance visual influencing dynamically and live.
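To make the sophisticated end a little more concrete, here is a minimal sketch of a Monte Carlo simulation written in Python rather than Excel/VBA. The revenue model, distributions and all figures below are invented purely for illustration; a real model would use drivers and assumptions agreed with the business.

```python
import random

def simulate_revenue(n_sims=10_000, seed=42):
    """Monte Carlo sketch: revenue = units sold x unit price,
    with both drivers drawn from assumed normal distributions."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    outcomes = []
    for _ in range(n_sims):
        units = max(0.0, rng.gauss(1_000, 150))   # assumed demand driver
        price = max(0.0, rng.gauss(25.0, 2.5))    # assumed price driver
        outcomes.append(units * price)
    outcomes.sort()
    # Read off the 5th, 50th and 95th percentiles as a scenario range
    p5 = outcomes[int(0.05 * n_sims)]
    p50 = outcomes[int(0.50 * n_sims)]
    p95 = outcomes[int(0.95 * n_sims)]
    return p5, p50, p95

p5, p50, p95 = simulate_revenue()
```

The same idea in Excel would typically use RANDBETWEEN or NORM.INV with a data table, which is exactly why modellers lean on VBA once the simulation count grows.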
Predictive analytics (PA)
Igor believes it’s a set of statistical tools designed to find patterns in the behaviour of targeted outcomes by finding dependencies between the predictors in a dataset and the target.
The PA process is based on the search for the optimal statistical algorithm, the one that best fits the prediction of an outcome: either the highest accuracy rate for a classification problem (predicting a discrete outcome) or the minimum error rate for a regression problem (predicting continuous values of an outcome).
Predictive analytics is the same as supervised machine learning and classic statistical learning, and these terms can be used interchangeably.
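The classification-versus-regression distinction above can be sketched in a few lines of plain Python. This is a toy illustration with invented data, not a real PA workflow: the regression is scored by an error measure (RMSE, lower is better) and the classification by an accuracy rate (higher is better).

```python
import math

# Toy supervised-learning data: one predictor x, two kinds of target
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_cont = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]  # continuous target (regression)
y_class = [0, 0, 0, 1, 1, 1]              # discrete target (classification)

# Regression: fit y = a*x + b by ordinary least squares,
# then score the fit with root-mean-square error (lower is better)
n = len(x)
mx, my = sum(x) / n, sum(y_cont) / n
a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y_cont))
     / sum((xi - mx) ** 2 for xi in x))
b = my - a * mx
rmse = math.sqrt(sum((a * xi + b - yi) ** 2
                     for xi, yi in zip(x, y_cont)) / n)

# Classification: a trivial threshold rule on x,
# scored with the accuracy rate (higher is better)
threshold = 3.5
preds = [1 if xi > threshold else 0 for xi in x]
accuracy = sum(p == t for p, t in zip(preds, y_class)) / n
```

A real PA project would swap the hand-rolled least squares for a library algorithm and hold out a test set, but the two scoring ideas are exactly these.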
Differences / Similarities
However, definitions don’t always make things clear, as some words can still be confusing. So, to make this easier to dissect, we have summarised it into key segments, namely:
Understanding of finance and economics of the key value drivers of performance.
Having an intuitive gut feel for whether the projected results appear logically reasonable.
Maintaining a healthy dose of professional scepticism about the projections. Irrespective of which is used, neither is 100% accurate, but some come pretty close.
PA has to deal with cleaned and well-prepared datasets, whereas FM may be applied to tiny pieces of unconnected data, or to no data at all.
PA relies heavily on statistics and mathematical calculations, including complex algorithms and coding, coupled with programming languages like Python and R.
FM may require you to use more advanced features and functions. For example, this might include writing or reading VBA, and perhaps INDEX MATCH versus LOOKUPs, OFFSET, nested IF statements, Goal Seek and RANDBETWEEN, etcetera.
However, this isn’t an explicit requirement, unlike PA, which isn’t possible without complex mathematical algorithms and coding. I think FM relies more on the intuition of the modeller than on patterns derived from data.
Both approaches work with data frames: data arranged in columns and rows.
In FM language, it’s a spreadsheet with either simulated or manually prepared data.
In PA language, it’s a matrix or a data frame, much like a table in Excel, uploaded from another system, a database or another Excel file.
Both rely heavily on essential software techniques such as data transformation, for example combining several tables, or multiplying a whole table or a single column (or row) by either a number or another column, row or even table.
Both use cross-tabulation, like pivoting and unpivoting. All these functions are embedded in Excel and in both the R and Python programming environments.
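As a sketch of those shared techniques on the Python side, here is a short example assuming the widely used pandas library; the two tables and their figures are invented for illustration.

```python
import pandas as pd

# Two small tables, as they might arrive from separate systems
sales = pd.DataFrame({"region": ["North", "South", "North", "South"],
                      "quarter": ["Q1", "Q1", "Q2", "Q2"],
                      "units": [100, 80, 120, 90]})
prices = pd.DataFrame({"region": ["North", "South"],
                       "price": [10.0, 12.0]})

# Combine tables (like a lookup/join), then multiply one column by another
combined = sales.merge(prices, on="region")
combined["revenue"] = combined["units"] * combined["price"]

# Cross-tabulate: pivot regions against quarters...
pivoted = combined.pivot_table(index="region", columns="quarter",
                               values="revenue")

# ...and unpivot back to a long, row-per-observation table
unpivoted = pivoted.reset_index().melt(id_vars="region",
                                       value_name="revenue")
```

In Excel the same three steps would be a VLOOKUP (or INDEX MATCH), a multiplied column, and a PivotTable.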
PA deals with programming environments rich in embedded statistical libraries. For example, the R statistical language has more than 10,000 packages.
FM, for those true diehard fans and experts, will still primarily mean Microsoft Excel. Yes, Excel on Apple, Google Sheets and other knockoffs are just not up to scratch for experts. You certainly won’t see any ModelOff competitors using these other spreadsheet programs.
In those, you also won’t be able to do VBA automation, run Monte Carlo simulations or use other sophisticated FM techniques. You may even struggle to find Goal Seek, which is incredibly useful for modelling.
Not a lot.
Highly complex, statistics-based algorithms are probably the highest form of complexity there is from a purely mathematical perspective. This is why this article is co-authored with an expert like Igor.
The level of complexity is sometimes already beyond human understanding, as there are specific PA techniques which work like black boxes and do not reveal how they reached the highest level of prediction.
Indeed, some formulas can get reasonably complex, like deeply nested IF statements, reverse-engineered Goal Seeks or XIRR calculations, etcetera. For the most part, however, complexity can be avoided by breaking the logic out into clear IF statements which are reasonably simple to write and follow. VBA can also be entirely avoided by simply repeating a process with the F4 key.
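A reverse-engineered Goal Seek of the kind mentioned above is really just a root-finding search, and can be sketched outside Excel too. Here is a minimal bisection version in Python; the five-year growth model and the target figure are invented for illustration.

```python
def revenue_after_5_years(growth_rate, base=100.0):
    """Toy model: base revenue compounded annually for five years."""
    return base * (1 + growth_rate) ** 5

def goal_seek(f, target, lo, hi, tol=1e-9):
    """Find x in [lo, hi] where f(x) hits the target, by bisection.
    Assumes f is increasing on the interval, like Excel's Goal Seek
    assumes a well-behaved cell formula."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if abs(f(mid) - target) < tol:
            return mid
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# What annual growth rate turns 100 into 150 over five years?
rate = goal_seek(revenue_after_5_years, target=150.0, lo=0.0, hi=1.0)
```

The answer lands around 8.4% a year, exactly what Goal Seek would return for the same cell setup.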
Both require accurate data to provide any reasonable insight to support or validate assumptions.
Most important is actual data against which projections/forecasts can be compared to confirm the accuracy of the predictions.
They both, therefore, need to be tweaked, updated and refined based on real, live data to gain increased confidence in the predictive outcomes. The data collected should have a reasonable level of controls to provide assurance of its accuracy: garbage in, garbage out.
PA may work well on both small and big datasets; it depends on the skill of the modeller in choosing the optimal algorithm. Sometimes, algorithms trained on large datasets do not work any better than those trained on smaller, well-prepared ones.
Many PA algorithms can work well on small datasets and show quite reliable results, but none of it is possible without any data at all.
Processing significant volumes of data may be beneficial but is not always necessary. Usually, dataset preparation takes 70-80% of all the research time, and it may shrink the initial raw data ten-fold. Quite often, it is enough to run a test model on a subset of the data before launching a powerful server for enormous calculations.
For analysis, a PA modeller often uses mathematical transformations of the raw data, like rescaling, logarithmic transformation of numbers, turning words into numbers, etcetera.
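Those three transformations can be sketched in a few lines of plain Python. The raw values and labels below are invented for illustration.

```python
import math

raw = [120.0, 45.0, 980.0, 310.0]            # invented raw amounts
labels = ["renewal", "new", "renewal", "upgrade"]  # invented categories

# Rescaling: squeeze values into the [0, 1] range (min-max normalisation)
lo, hi = min(raw), max(raw)
scaled = [(v - lo) / (hi - lo) for v in raw]

# Logarithmic transformation: compress a heavily skewed scale
logged = [math.log(v) for v in raw]

# Turning words into numbers: map each distinct label to an integer code
codes = {label: i for i, label in enumerate(sorted(set(labels)))}
encoded = [codes[label] for label in labels]
```

Real PA libraries ship these as ready-made scalers and encoders, but the underlying arithmetic is no more than this.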
For FM, it is possible and very often the case that almost no data is required to build a financial model purely based on someone’s professional judgement and business idea.
Many startups have great ideas but struggle to pull this succinctly into a story of numbers – sort of like paint by numbers. It is, therefore, quite common to build a financial model forecast with no historical data.
This is where the art vs the science of FM comes into play.
Both run better on higher-performance machines and processors, especially when the model contains multiple workbooks of 5-10 MB each (in xlsb format) that all need to be open at the same time, meaning that sometimes even 8 GB of memory isn’t enough.
It is possible, albeit not advised, to build and run a financial model on a Pentium 1 machine with Windows XP. I kid you not; someone attended one of my courses with this setup. I politely explained that coming to a future-of-financial-modelling course with such an old setup isn’t going to be fun. It’s like turning up to a cricket game with a baseball bat: it’s possible to hit the ball, but I don’t know that it’s the best idea.
It is possible to run most PA algorithms on a single machine, like a standard laptop with 8-12 GB of RAM. A dataset of 1-5 GB can run on most personal laptops. It is quite rare that such a vast volume of data needs to be analysed for research purposes. If the information is clean, the required dataset can be downsized ten-fold and run successfully on a standard laptop.
In some cases, PA might need to process huge volumes (many gigabytes or terabytes) of data; then cloud solutions can be used.
Cloud-based solutions for Excel-based modelling are rare. Some forecasting tools do use the cloud, but they don’t provide the sort of complex financial modelling techniques like Monte Carlo simulation or multi-dimensional scenario management.
Although there are some substantial differences between FM and PA, both help us cope with uncertainties and make better decisions.
We can make financial models without data using our experience and business intuition and be very accurate with our forecasts. We can also make excellent models based on proven predictive algorithms built on various data sources.
It would be nice to have both approaches working together and supplementing each other.
There is no competition between them; rather, there is an opportunity not to rely heavily on only one way of decision-making, but to combine and innovate.