Expertise
In-house Machine Learning and the journey towards better performance
12 minutes
Are in-house Machine Learning projects better than off-the-shelf ones? The answer seems obvious: as with most things, a custom-made solution beats a generic one. But why exactly? Why is there so much supply of, and demand for, off-the-shelf pipelines and models? What are their pitfalls, and where does in-house development add value?
First, a few clarifications. Machine Learning is a field of Artificial Intelligence that studies and develops methods that learn from data. When I say Machine Learning project, I mean the complete process of defining a problem, selecting the relevant model type, extracting and creating features, training and evaluating the models, and so on. A finished project should result in either a report with insights or a pipeline. A pipeline runs in production and streamlines the process so your models can be used repeatedly and smoothly in your business. The intelligence of a report or pipeline comes from the models. A Machine Learning model trains on data and either extracts patterns or learns parameters (often called weights) that it can use to make predictions on new data. To learn these weights, a model tries to satisfy an objective, which is written as a formula and is often called the loss function.
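To make these terms concrete, here is a minimal sketch of the train/predict cycle using scikit-learn; the data is synthetic and purely illustrative:

```python
# Minimal sketch: a model learns weights by minimising a loss on training data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X_train = np.random.rand(100, 3)                 # features
y_train = X_train @ np.array([2.0, -1.0, 0.5])   # target built from known weights

model = LinearRegression()
model.fit(X_train, y_train)                      # training: fit the weights

print(model.coef_)                               # the learned weights
print(mean_squared_error(y_train, model.predict(X_train)))  # the loss value
```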
The fairy tale of Machine Learning and the untold story
Unless you live under a rock, you will have noticed the current hype around Machine Learning. It's a magical word that makes your business seem more sophisticated and efficient, improving your brand image. At the same time, Machine Learning promises greater speed or better targeting, and thus greater returns.
In response, some businesses see all the freely available material on the internet (pretrained models, easy-to-use libraries) and think they can build a small data science team and quickly get results with minimal effort. In parallel, thousands of companies have cropped up to meet the rest of the demand. They can supply data platforms to help you quickly train and deploy Machine Learning models; offer their services to carry out your project; or give you access to their platform so you can use their algorithms.
But these "solutions" rely on a fairy tale: that Machine Learning can be made "easy". The emphasis is on speed and convenience. They want to convince you (and themselves) that data preprocessing is always the same, that one model architecture fits all problems, and that a model can be trained with minimal human intervention.
And, up to a point, they are right. Those pipelines might give you satisfactory results, but only if your data is already clean and structured, if your signal has a large random component, if the problem you are trying to solve is a classic one, or if you content yourself with "good enough". This is not the case at Lucky cart, nor for any company serious about practising its craft through Machine Learning. We understand that, to find the best outcome and the best pipeline, we must first embark on a journey.
What is an off-the-shelf project? Understanding the map legend
Now, what do we mean by an off-the-shelf project? As I hinted above, an off-the-shelf project is one in which Machine Learning is made "easy". The most extreme examples are the so-called Machine Learning engines: platforms that let you upload your data, perhaps select some preprocessing steps, choose a model from a predetermined list, and train and evaluate it. However, an inexperienced, rushed, or unmotivated data scientist can also produce a very off-the-shelf pipeline.
Furthermore, as with everything, there are degrees. Using a pretrained model is different from retraining a known model architecture, which is in turn different from designing your own. You might also end up with a pipeline that has in-house feature engineering but an off-the-shelf model, or any other combination. Below, I walk through the stages of a Machine Learning project and the pitfalls of off-the-shelf thinking in each of them.
Project definition: the first step that will define the journey
The issues with off-the-shelf models start even before development begins: with the definition of the objective. There are several classic Machine Learning problems and algorithms (clustering, classification, prediction, segmentation). Projects built around off-the-shelf models tend to fit the problem to the model, rather than the other way around. So the question becomes: where can I use a prediction model (for example) in my business? When it should be: this is what my client needs; how can Machine Learning help me get there?
This framing is much more flexible. You might find that several completely different approaches exist, each with its own pros and cons, that a combination or chaining of models would work best, or that you can adapt an algorithm not usually applied to your use case. You will never know whether you chose the right approach if you never even asked the question.
Feature engineering: where in-house is the only viable path
Some would argue that feature engineering is the most critical step in Machine Learning. It transforms raw data into meaningful, comprehensible inputs for our models. A model needs multiple inputs (called features), and together they must contain enough information for the model to identify valuable patterns or predict behaviours. It is better to have an off-the-shelf model with good features than a complex, tailored model fed irrelevant features from which it will learn nothing.
At its worst, off-the-shelf feature engineering consists only of filling missing values with the average, removing highly correlated variables, and normalising the inputs. Already we have a problem, because most raw data can't be used directly as features. This approach only begins to work if your subject is static (text, images, objects) or if someone has already created features for you. Often, the subject evolves through time and your data consists of multiple "snapshots" per subject. These snapshots must be aggregated or grouped to construct meaningful, useful features. For example: one transaction is not enough to characterise a shopper's behaviour.
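As a hedged illustration of this aggregation step, the sketch below turns a handful of hypothetical transaction "snapshots" into per-shopper features with pandas; the column names are assumptions, not an actual schema:

```python
# Sketch: aggregate raw per-transaction rows into per-shopper features.
import pandas as pd

transactions = pd.DataFrame({
    "shopper_id": [1, 1, 2, 2, 2],
    "cart_value": [23.5, 41.0, 8.9, 12.3, 10.1],
    "timestamp": pd.to_datetime(
        ["2023-01-03", "2023-02-10", "2023-01-05", "2023-01-19", "2023-02-02"]),
})

features = transactions.groupby("shopper_id").agg(
    n_trips=("cart_value", "size"),    # purchase frequency
    avg_cart=("cart_value", "mean"),   # typical basket size
    span_days=("timestamp", lambda s: (s.max() - s.min()).days),  # activity window
)
print(features)
```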
Furthermore, outstanding feature engineering demands both business expertise and a good understanding of the project's goal. These tell you which information the model would benefit from and how to combine existing features; they guide you in identifying additional data sources; they help you determine the right granularity to work at; and they tell you whether to normalise, attenuate or highlight certain aspects of the data. For example, if we want to group shoppers by their purchase preferences, it is important to normalise by cart size; otherwise, you might only find the obvious and unhelpful "big" and "small" consumer groups. This process is clearly both industry- and project-dependent, and it cannot be done off the shelf.
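To make the cart-size example concrete, here is an illustrative sketch: normalising each shopper's category spend into shares before clustering, so shoppers with the same preferences but different budgets end up together. The numbers and column layout are invented:

```python
# Sketch: cluster on spend *shares* rather than raw spend amounts.
import numpy as np
from sklearn.cluster import KMeans

spend = np.array([
    [80.0, 15.0, 5.0],    # big shopper, mostly category 1
    [8.0,  1.5,  0.5],    # small shopper, same preferences
    [10.0, 60.0, 30.0],   # different preference profile
])

shares = spend / spend.sum(axis=1, keepdims=True)   # normalise by cart size
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(shares)
print(labels)  # shoppers 0 and 1 now land in the same cluster
```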
Models: using wisdom to pass through the forest
Unlike in the sections above, I'm not going to argue that off-the-shelf models are bad; we all know they are not. There are outstanding trained models online, trained on thousands of examples. If they fit your needs, you are probably better off using them as-is, or doing light retraining, than training them from scratch. There are also superb models with a solid mathematical basis, clean and well-structured implementations of these same models, and specific model architectures that have proven themselves in many fields. These models usually give excellent results after some hyperparameter tuning. Better yet, a data scientist can try multiple well-established models, evaluate their performance, and choose the best one.
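As a small sketch of this try-and-compare workflow, the snippet below cross-validates a few well-established scikit-learn models on synthetic data and reports their mean scores; the candidate list is illustrative:

```python
# Sketch: evaluate several established models and keep the best-scoring one.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in candidates.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold cross-validation
    print(f"{name}: {score:.3f}")
```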
The key consideration, however, is that a good data scientist should never use a model they don't understand. For one, they might try models that are not suited to your data or problem. They will blindly change hyperparameters without understanding what might work or why. They might choose a model that seems to score well without checking why, or whether biases are present (the story of the "excellent" image classifier that distinguished dogs from wolves by the presence of snow is well known). Conversely, they might give up too quickly on a good model when a slight change in the input, or a seldom-used hyperparameter, would have fixed it. And this is without even getting into model evaluation: how it should be tailored to the project, and how an unsuited evaluation method leads to choosing a model that doesn't satisfy your needs.
At the same time, understanding how and why a model fails is imperative for proposing a better approach. That proposal could simply be a hyperparameter change, or another off-the-shelf model. It can also be, if necessary, a modification of the model so it better handles your data distribution. Such changes might take the form of a different loss function, added regularisation, or an adapted optimisation algorithm.
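As one hedged example of such a change, the sketch below trains a small PyTorch model with a Huber loss plus an explicit L2 regularisation term, instead of plain mean squared error; the data and coefficients are synthetic:

```python
# Sketch: swap the loss function (Huber, robust to outliers) and add L2 regularisation.
import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
X = torch.randn(64, 3)
y = X @ torch.tensor([[2.0], [-1.0], [0.5]])

for _ in range(200):
    optimizer.zero_grad()
    pred = model(X)
    loss = torch.nn.functional.huber_loss(pred, y)       # robust loss term
    loss = loss + 1e-3 * sum((p ** 2).sum() for p in model.parameters())  # L2 penalty
    loss.backward()
    optimizer.step()
```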
Until now, I have also been assuming that a classic model serves your project's purpose, and that, at most, we must tune or adapt it to your data distribution. However, the model might simply not address your problem. In that case, an ad-hoc loss function and/or layers must be implemented, creating a genuinely in-house model. To name a few examples we have come across at Lucky cart:
- You might need to add industry-specific knowledge and constraints to your model. For example, if the model predicts no purchase, the predicted amount should also be 0 (the sketch after this list illustrates one way to encode this).
- You might need to isolate and measure the effect of a specific input. Can this purchase be attributed to normal behaviour, or to the promotion? Would a different promotion have had a different effect?
- The end use of your model might impose tighter or looser requirements than the original problem the model was designed for. For one project, we predicted future purchases but were more interested in ranking consumers correctly than in predicting the amounts themselves.
- You might also use a predictive model for the parameters it learns, rather than for its output. This is how we measure elasticity (a measure of an individual's sensitivity to a price change).
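As an illustration of the first bullet, here is a minimal, hypothetical sketch of one way to encode the "no purchase implies zero amount" constraint: a two-head model whose predicted amount is gated by the predicted purchase probability. This is a sketch of the general idea, not Lucky cart's actual architecture:

```python
# Sketch: gate the predicted spend by the purchase probability,
# so a predicted "no purchase" drives the predicted amount to 0.
import torch
import torch.nn as nn

class GatedSpendModel(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.buy_head = nn.Linear(n_features, 1)     # P(purchase)
        self.amount_head = nn.Linear(n_features, 1)  # spend, if a purchase happens

    def forward(self, x):
        p_buy = torch.sigmoid(self.buy_head(x))      # probability in (0, 1)
        amount = torch.relu(self.amount_head(x))     # non-negative spend
        return p_buy * amount                        # expected spend; 0 when p_buy -> 0

model = GatedSpendModel(n_features=8)
print(model(torch.randn(4, 8)))
```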
Even more than before, an in-house model needs an experienced and competent data scientist. An unskilled one might try to adapt a model or methodology without understanding the mathematical principles behind it, and end up with a pipeline that is inconsistent in its hypotheses.
Finally, it's worth noting that pipelines don't always consist of a single model. More complex systems can be built in which different models run in sequence or in parallel, their inputs and outputs interacting with each other. Again, each of these models can be off-the-shelf or in-house.
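As a simple sketch of models running in sequence, the snippet below chains an unsupervised dimensionality-reduction step with a supervised classifier using a scikit-learn Pipeline; the dataset is synthetic:

```python
# Sketch: two models in sequence, the output of the first feeding the second.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

pipeline = Pipeline([
    ("reduce", PCA(n_components=5)),      # first model: compress the inputs
    ("classify", LogisticRegression()),   # second model: predict from them
])
pipeline.fit(X, y)
print(pipeline.score(X, y))
```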
After production: never stop walking
A data scientist's job isn't done once the pipeline is in production. Periodic retraining is important for maintaining good performance. Credit where credit is due: some off-the-shelf platforms offer easy retraining for their models. However, as time passes, other projects may yield new insights, ideas or features that can improve existing pipelines. And as your product evolves, your needs change and the pipelines must be adapted.
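One hedged way to operationalise periodic retraining is a drift check that triggers retraining when live error degrades past a threshold; the function below is a sketch, and the threshold and names are assumptions:

```python
# Sketch: retrain when the error on recent data drifts past a tolerance.
from sklearn.metrics import mean_absolute_error

RETRAIN_THRESHOLD = 1.15  # assumed tolerance: retrain if error grows 15% over baseline

def should_retrain(model, X_recent, y_recent, baseline_error):
    """Compare live error on recent data against the error at deployment time."""
    current_error = mean_absolute_error(y_recent, model.predict(X_recent))
    return current_error > RETRAIN_THRESHOLD * baseline_error
```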
In parallel, it's also important to measure the impact the pipeline has on your business. At Lucky cart, we want to establish whether our personalised promotions perform better than random or uniform ones. Likewise, each time we propose a change to our systems, we first A/B test it to ensure it is actually an improvement. On top of this, data science pipelines do not exist in isolation: they are often part of larger computational processes, such as simulations or financial optimisation, and it is important to understand and monitor the impact a pipeline has on the rest of the system.
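As an illustrative sketch of such an A/B test, the snippet below compares conversion counts between a personalised arm and a uniform arm with a chi-squared test; the counts are made up:

```python
# Sketch: significance test on a 2x2 contingency table of A/B test outcomes.
from scipy.stats import chi2_contingency

#                 converted, not converted
table = [[420, 9580],   # personalised promotions arm
         [350, 9650]]   # uniform promotions arm

chi2, p_value, _, _ = chi2_contingency(table)
print(f"p-value: {p_value:.4f}")  # a small p-value suggests the difference is not chance
```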
In-house Machine Learning: the journey determines the destination
Machine Learning isn't easy, and pretending otherwise is a bad idea. The hype around Machine Learning has led many companies to attempt Machine Learning pipelines, yet not everyone is willing or able to commit to the difficulties of the endeavour. A rushed, off-the-shelf project can quickly give you a functional pipeline, but nothing guarantees that this software is suitable, let alone optimal, for your needs. Rather than reaching for generic solutions, each line of code must be chosen carefully, and these choices must be guided both by a deep understanding of the context (your product's needs, your data, and so on) and by technical knowledge of Machine Learning.
Here at Lucky cart, we are convinced of the power of in-house Machine Learning. This is why the DATA team makes up a significant portion of our company, and why we are constantly recruiting new Data Scientists. We have many active Machine Learning projects, have delivered various reports, and have developed pipelines for, among other things, fraud detection, predicting a shopper's cart contents, and promotion optimisation.
All things considered, why would anyone do business with you if your product were easy to implement and reproduce?