Soleadify Spotlight takes a closer look at the key topics and influencers that shape the world of data and business. Soleadify is a new-generation B2B data company that produces high-performing solutions for financial services, insurance, capital markets and consulting clientele. Our focus is on helping to drive better decisions, improve business processes and deepen insights. We shorten the path from data to value.
In this episode, we had the pleasure of speaking with Roger Vandomme, Chief Data Scientist and Principal at SMC.
Roger’s career has been built on the fundamentals of data analysis, predictive modeling and related decision-making, and has included senior roles at Dun & Bradstreet, Equifax and Lendified. Roger has an outstanding skill set in the field of predictive modeling and has completed numerous studies and research projects on decision heuristics and biases, developing reasoning methods and processes around systemic design and game theory. Roger holds a Master’s degree in Applied Mathematics from Paris University, an MBA from Queen’s University, and a Master’s in Defense Studies from the Royal Military College.
Roger, we are thrilled to have you join the inaugural edition of Soleadify Spotlight, where we connect with people who are shaping the world of big data. Today, we’d love to tap into your many decades of experience in the small business (SMB) lending and credit risk world.
I am delighted to join you!
SMB underwriting is undergoing a transformation: how would you describe the overall state that it’s in now?
SMB underwriting has always had operational challenges. The root cause is the lack of available data about SMBs. When making a decision, loan officers must assess the risk they are taking. They must estimate the probability of being repaid or, at the opposite end of the risk spectrum, the probability that the business will fail to meet its obligations. To do so, they need descriptive and historical data about the business and its owner.
However, the traditional lack of such data logically increases the banks’ conservatism. The fact that private companies are not legally obliged to provide their financial results certainly doesn’t help.
Navigating uncertainty to inform decisions is the favorite playground of mathematical predictive modelling and of its most recent iteration, artificial intelligence (AI).
Since the late 1980s, banks and credit bureaus have developed payment indexes based on past payment experiences. Mathematical modelling then ventured into predictions and probabilities through credit scoring, with some success but many limitations, essentially, once again, because of poor data.
For the last 10 years we have been witnessing a revolution known as “Big Data”. It was mainly a hardware revolution that exponentially increased the capacity to store data in extremely large volumes and process it at very high speed. This technical revolution made practical mathematical computations that had long been known but were only then feasible. That led to the rise of AI.
SMB underwriting is directly benefiting from this revolution. With more data, in volume and variety, and more sophisticated ways to process it, the lending decision makers should feel more and more confident.
The banking industry, conservative as we know it to be, has been slow to adopt those new tools and methods. Younger, smarter, more agile and reactive, and most importantly unregulated players, known as “fintechs”, quickly conquered this space all around the planet. The regulators’ decisions and the banks’ will to adapt will shape the future of SMB lending.
Despite the advancements made in analytics tools and capabilities, what are the most significant gaps or barriers that remain?
The most significant barriers are both cultural and technical.
By cultural, I mean the lack of knowledge, and occasionally total ignorance, of artificial intelligence and data analytics in general. According to the principle that what you don’t know scares you, we are still witnessing a large amount of distrust and pushback, which, most of the time, is unreasonable and irrational. Those barriers will disappear when the generations to come, educated in a digital world, take over.
And by technical, which also has cultural overtones, I mean the reluctance of market regulators to allow the use of neural networks (NNs) in credit decision-making. NNs are complex algorithms still known as the “black box”. We input data and get a decision as output, but we cannot know for sure the “why” of the decision. We know how an NN works. It is just that the variables and weights can be so numerous and complex (sometimes several million) that it is impossible to explain THAT decision. The regulator considers, rightfully I believe, that we ethically cannot allow such an important decision, one that impacts people’s lives, to be made without being able to provide an explanation. Hence, the industry is, for now, deprived of the most sophisticated and efficient mathematical tools.
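To illustrate the contrast drawn above, here is a minimal sketch of the kind of model whose decision can be explained: a small logistic scorecard in which each variable's contribution to the score can be read off directly. The variable names, weights and intercept are hypothetical, chosen only for illustration, and are not taken from any real credit model. A neural network with millions of weights offers no such simple breakdown.

```python
import math

# Hypothetical weights and intercept, for illustration only.
# Positive weights push the default probability up, negative ones pull it down.
WEIGHTS = {
    "years_in_business": -0.30,
    "late_payments_12m": 0.80,
    "debt_to_revenue": 1.50,
}
INTERCEPT = -1.0


def explain_default_probability(applicant):
    """Return (probability of default, per-variable contributions to the logit)."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


if __name__ == "__main__":
    # Made-up applicant, not real data.
    applicant = {"years_in_business": 5, "late_payments_12m": 2, "debt_to_revenue": 0.6}
    probability, contributions = explain_default_probability(applicant)
    print(f"Estimated probability of default: {probability:.2f}")
    # List the variables from the strongest push toward default to the strongest pull away.
    for name, value in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {name}: {value:+.2f}")
```

With a model of this shape, a lender can tell an applicant which variables drove the decision, which is the explanation the regulator asks for.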
If an SMB lender hasn’t fully embraced digital transformation, are there some steps that they can take in 2022 to accelerate forward?
If an SMB lender has not yet fully embraced digital transformation, it is most probably because of a lack of knowledge and distrust. To make a good, educated decision, I would advise less mature organizations to invest in education and seek help from technical advisors. It is not easy to progress on uncertain, possibly even hostile, ground. You need help and guidance.
In the meantime, however, it is possible to save time and get ahead of the curve by working on data in a broad context. Whatever technical and operational solution is implemented in the future, data will be the common denominator that powers it. So it will always be beneficial to start, as early as possible, expanding and improving your internal and external data collection and quality. Nurture and clean the in-house data. Look for alternative sources of data.
You will also need to recognize that digital transformation, and the inclusion of AI and analytically driven processes, will accelerate and perpetuate change within the company. I cannot stress enough the need to plan for change management. Win your people’s hearts and minds. Trying to impose such drastic changes without explanation is a recipe for disaster.
As a data scientist with decades of experience, how do you see the role of data, first and third party, evolving in the near future?
As I said earlier, data is the key, the common denominator, to all analytical approaches, and it is too often neglected. I unfortunately have too many “data horror stories” to tell. Most of the time it is because data is collected without knowing or thinking about how it will be used. Data scientists and analysts are not yet consulted enough in the data collection process. In my experience, the quality of the data, the way it is collected and the way it is maintained are directly correlated with the quality of the decisions inferred from the analysis.
With the Big Data revolution, we witnessed an explosion in data volume and variety, to the delight of data analysts. Companies now realize that not only their internal data, but also many external sources, open or not, can add tremendous quality and value to their decision processes.
I will give you an example. For decades, the size of the database was the main barrier to entry in the credit bureau industry. How could you hope to compete against a database manually and meticulously compiled over more than 150 years? Well, today, a good data scraper could build an equivalent database in a matter of months, just by harvesting web content.
We agree, Roger!
We often see organizations underestimate the value of verifying and validating existing datasets, particularly in SMB underwriting, where freshness and accuracy are so important. How do you view this tradeoff?
I agree that we could sometimes be tempted to give up when confronted with a really nasty database. That temptation, however, is often rooted in ignorance and laziness. It is just that we do not know where to start and the mountain seems too high to climb. However, it is very rare for a database to be hopeless.
Data architects have recently developed a range of automated tools to tackle the cleaning and normalization of large databases. In my experience, there is always something to extract from a database.
Of course, that doesn’t prevent us from investing in new and different sources of data. In SMB underwriting, the value of historical data has long been overlooked. The focus on immediate, fresh data masked the fact that only in historical data can you find signs of trends, and hence the ability to predict. Both are important.
What is the biggest mistake you see underwriters make when it comes to leveraging data: too much data and/or too many sources, not enough data, or not enough quality data?
Things have evolved a lot in the last 10 years. The rise of Big Data and the democratization of AI have increased awareness and sensitivity about data.
Prior to that, I saw a large company erase an entire year of archived data every five years. Erased. Disappeared. Gone forever. The reason was that storage was too expensive. So I went to Best Buy and bought them a 2 terabyte hard disk drive for $100, enough to store more than 10 years of data.
There was no real awareness of what was possible with data. But things changed. Now we collect and keep data more easily. So there is never too much data. The more the better. Always. If you don’t know how to use it today, tomorrow someone will. Variety of sources is not an issue either, as long as we ensure consistency. Not having enough data, however, is a problem, and one encountered too often. Sometimes collecting more is just not possible. So be it. Sometimes it is down to false excuses: too complicated, too expensive, and so on.
And finally, the worst mistake is neglecting data quality. That is a real waste of time and money. Collecting poor data is useless. Top executives should push to create awareness of data quality at all levels of the company. Manual data entry is the largest source of errors. Simple verifications, on a regular basis, hunting for missing data or outliers, would provide considerable improvement. Data quality should be a strategic concern.
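The simple verifications mentioned above can be sketched as follows. This is a minimal example, assuming a single numeric column in which missing entries are stored as None, and using the common 1.5 x IQR rule to flag outliers. The function name, the threshold and the sample revenue figures are illustrative assumptions, not a prescribed method.

```python
from statistics import quantiles


def audit_column(values, k=1.5):
    """Return (indexes of missing entries, indexes of outliers) for one numeric column.

    Assumes missing entries are stored as None. Outliers are values lying more
    than k interquartile ranges below the first quartile or above the third.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    if len(present) < 4:
        # Too few observations to estimate quartiles meaningfully.
        return missing, []
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [
        i for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]
    return missing, outliers


if __name__ == "__main__":
    # Made-up monthly revenue figures with two gaps and one likely keying error.
    revenue = [120, 135, None, 128, 131, 9000, 125, None, 140]
    print(audit_column(revenue))  # ([2, 7], [5])
```

Run on a schedule against each key field, a check like this surfaces gaps and probable data-entry errors before they reach a model or a lending decision.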
We’re seeing significant growth of SMBs across the emerging world (MENA, BRIC). Do you expect business activity in these areas to continue to grow over the next 12-24 months?
Short answer: yes, most certainly and more than ever. An economist would be better placed than I am to explain it.
However, I see many reasons for that trend.
Entrepreneurship is often a way out of unemployment. The pandemic crisis, which the world has endured for more than two years, has been a major economic disruptor. We saw many SMBs forced to cease business, and large corporations becoming even larger and stronger through consolidation. This fracture will probably encourage many people to start their own business.
And, to use one excellent example, I can see the investment that an organization such as the MasterCard Foundation is making in Africa, where lenders are facilitating access to capital through technology. We can therefore only expect this trend to grow outside of the G20 countries.
How has COVID impacted the role of AI in monitoring private companies?
The management of the COVID pandemic has had several consequences for the economy and for businesses.
One that everyone experienced is the shift to remote activities, especially school and office work. That was only possible with the help and support of technology. Tech providers took the opportunity to invest heavily in R&D and offered the public and businesses a wide range of tools intended to help in this new environment. Many of those tools are AI-powered. They automate some admin tasks, streamline paperwork, and optimize calendars. All those tools are generating a gigantic amount of data, more or less protected. Access to this data, sometimes as simply as through smartphone applications, can provide much information about business behaviour, management and choices. Simple BI dashboards and complex AI decision-support systems have flourished in this environment.
In terms of near-term impact on SMB lending (and related business applications and/or use cases), what are the most exciting trends in regard to AI, ML or NLP?
I see two major opportunities in the near future.
The first one is very technical. I alluded earlier to the field of explainable neural networks (XNN). Once research delivers effective solutions that make NNs explainable, financial institutions will be able to use them. Accuracy will increase, and so will lenders’ confidence, to the benefit of SMBs.
The second one is related to data. The scope of available data is expanding exponentially. Websites, social media, and connected objects are growing sources of data. As data becomes ever more closely tied to the capacity to make decisions, and to the quality of those decisions, I expect increased access to data to have a major impact on the economy, and on SMBs’ ability to access capital.
It all sounds good. I would however call everyone’s attention to the ethical issues related to data. Freedom and privacy are at stake, and we should always think twice before moving toward new technical horizons.
Roger, this has been a very insightful discussion! We appreciate your willingness to share expertise on the world of SMB lending and big data. Best wishes for a prosperous and data-driven 2022!
It has been my pleasure!